In my last article, I talked about WHY business analysts, data analysts, data scientists, and data engineers are writing code. As we discovered, in their mission to find and extract valuable insights that the business can act on quickly, this cohort spends most of its time building ad-hoc software to process the data, most frequently using the “data pipelines” pattern (sketched at the end of this introduction). Many enterprises are making major investments in developing custom data-pipeline platforms that duplicate what is already available commercially. But is writing redundant software that massages data really the best way for these specialists to create value for the business? If you ask the working individuals in these professions, you will get diverse answers that sound something like this:
- NO. "We deeply know data, math, analysis, statistics, machine learning, AI, and are experts in using math-based software packages to crunch data into quantitative insights. Unfortunately, the professional development of enterprise-grade software at scale is a different world that demands different skill sets and different practices. In the meantime, we copy code from online forums and try our best."
- YES. "We are enterprise software developers who know how to build software systems at scale and have core expertise in many of the cloud and open-source building blocks out there. In contrast, data is easy, and we have picked up enough about data science along the way to create the data assets our business needs. We've got this."
- GOOD ENOUGH. "We know our business's data well and can do enough basic programming in Python and JavaScript. We also hack our enterprise BI tools to run some basic algorithms; they are inefficient but produce datasets that meet our basic needs. We could use some help, though."
While your data team likely includes people with all three of these points of view, what really matters is the position of the leaders, and the pace at which the team is adapting to the real needs of the business. So while the leaders place their bets, let's take a moment to look at the two sources of value in this context: data and code.
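To ground the discussion, here is a minimal sketch of the ad-hoc “data pipelines” pattern mentioned above: extract raw records, transform them into an aggregate, and load the result somewhere a report or dashboard can pick it up. The file and field names (sales.csv, region, revenue) are hypothetical, purely for illustration.

```python
# A minimal sketch of an ad-hoc data pipeline: extract, transform, load.
# File names and column names are hypothetical examples.
import csv
from collections import defaultdict

def extract(path):
    # Extract: stream raw rows from a CSV export of some source system.
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: aggregate revenue per region, skipping malformed rows.
    totals = defaultdict(float)
    for row in rows:
        try:
            totals[row["region"]] += float(row["revenue"])
        except (KeyError, ValueError):
            continue  # ad-hoc pipelines often drop bad records silently
    return totals

def load(totals, path):
    # Load: write the aggregated result for a dashboard or report.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "total_revenue"])
        writer.writerows(sorted(totals.items()))

if __name__ == "__main__":
    load(transform(extract("sales.csv")), "revenue_by_region.csv")
```

Multiply a one-off script like this across every hypothesis a team wants to test, and you get a sense of where the time goes.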
Creating value with data
Most practitioners agree that the value to the business lies in the data, and that the work to be done is to extract actionable insights from it. Let's look at three groups of key requirements for creating value with data. First, operate at speed:
- The speed with which new ideas and hypotheses can appear.
- The speed with which an idea can be turned into a working pipeline.
- The speed with which the data can be turned into insight and action.
Then, focus on good insights:
- Increase the rate at which good insights are detected and built out.
- Increase the speed at which good insights are launched as pipelines.
- Increase the payoff of good pipelines and the actions they lead to.
- Improve the robustness of good pipelines to provide continuity.
Finally, reduce the total cost of ownership (TCO) of data:
- Reduce the cost of acquiring and maintaining new data sources.
- Reduce the cost of detecting and dismissing poor hypotheses.
- Reduce the cost of creating individual pipelines.
- Reduce the cost of holding data.