Picture a data engineering leader as a captain at the helm of a ship, navigating the vast ocean of data. The modern data stack promised a vessel with cutting-edge navigation tools to sail smoothly through this ocean.
But instead, they find themselves on a ship held together by a patchwork of disjointed tools, navigating through treacherous waters, beset by the storms of skills shortages, and weighed down by a seemingly ever-increasing time to value.
Can you relate to this feeling of being adrift in a sea of complexity? It’s time for an upgrade. Enter the post-modern data stack.
In this post, we’ll explain:
- What the post-modern data stack is
- How it differs from the modern data stack
- The potential value it has for today’s data-driven organizations
Why is the modern data stack evolving towards the post-modern data stack?
As we discuss in detail in The Hidden Challenges of the Modern Data Stack, the modern data stack is Stage 3 of the pattern we call “the innovation cycle.” Let’s review it briefly.
Whenever a new technology frontier appears (Stage 1 of the innovation cycle), the “Innovators” (a term popularized by Geoffrey A. Moore in his book Crossing the Chasm) rush to build and launch hundreds of new tools into the market (Stage 2). In the data world, this disruption arrived in the form of cloud computing, with technologies such as Redshift, Snowflake, and Spark. And the innovators were the point solutions created to leverage the cloud’s inherent capacity and scalability.
Stage 3 begins as these early adopters collaborate, formally and informally, to identify and document best practices and patterns in the form of “reference architectures”. In our case, those patterns cover data ingestion, transformation, orchestration, reverse ETL, and observability. Sound familiar? This is the modern data stack as we know it today.
The modern data stack has become disjointed and complex, slowing data engineering teams’ productivity and limiting their ability to provide value to the business. At this stage of the innovation cycle, engineers are in dire need of a solution that will give them broader control of their data pipelines and automate the manual activities holding them back.
That’s where the post-modern data stack — stage 4 of the innovation cycle — comes into play.
What is the post-modern data stack?
The post-modern data stack has at its foundation a single platform for building end-to-end data pipelines that automate the management of change for developers — ultimately eliminating lengthy impact analysis for every change and driving unprecedented efficiency as a result.
And it’s been a long time coming. For too long, organizations have grappled with the limitations of the modern data stack. They’ve been struggling to extract value from data, hiring expensive specialists, increasing costs, and complicating change management in the process.
The data pipeline automation inherent to the post-modern data stack eliminates the obstacles that become ingrained in the piecemeal modern data stack, and makes possible intelligent data pipelines that engineers can build, modify, and revise at breakneck speed.
We see this pattern play out every day. For example, a global media company struggled because they were juggling different tools: Fivetran for bringing in data, dbt for transforming it, Airflow for coordinating everything, Monte Carlo for monitoring data quality, and Hightouch for getting data out to other systems. Building data pipelines across this landscape became a nightmare, as each tool came with its own interface, pricing model, runtime, and cost of ownership. Integrating all these tools compounded the complexity, and major gaps like the lack of end-to-end data lineage became intractable.
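To make the stitching burden concrete, here is a minimal sketch of the kind of glue code such a stack demands, written as an Airflow DAG. The connector IDs, credentials, commands, and REST paths are illustrative placeholders (sketched from the vendors’ public APIs, which may differ), not this company’s actual setup.

```python
# A sketch of the glue code a fragmented stack demands (Airflow 2.4+).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="media_co_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # 1. Ingestion: trigger a Fivetran connector sync via its REST API
    #    (endpoint approximated from Fivetran's public docs).
    ingest = BashOperator(
        task_id="trigger_fivetran_sync",
        bash_command=(
            "curl -sf -X POST -u $FIVETRAN_KEY:$FIVETRAN_SECRET "
            "https://api.fivetran.com/v1/connectors/$CONNECTOR_ID/sync"
        ),
    )

    # 2. Transformation: shell out to dbt, which runs in its own runtime
    #    with its own project, profiles, and pricing.
    transform = BashOperator(
        task_id="run_dbt_models",
        bash_command="cd /opt/dbt_project && dbt run --select marts",
    )

    # 3. Activation: kick off a Hightouch reverse-ETL sync
    #    (path is a placeholder; check the vendor's API reference).
    activate = BashOperator(
        task_id="trigger_hightouch_sync",
        bash_command=(
            'curl -sf -X POST -H "Authorization: Bearer $HIGHTOUCH_TOKEN" '
            "https://api.hightouch.com/api/v1/syncs/$SYNC_ID/trigger"
        ),
    )

    # Monte Carlo observability runs entirely out-of-band; to Airflow,
    # each hop is an opaque task, so lineage across tools is invisible.
    ingest >> transform >> activate
```

Every box in this sketch is a separate vendor, login, and failure mode, which is precisely the integration tax described above.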
By embracing the post-modern data stack, this media company streamlined its data pipeline management with end-to-end automation and a single pane of glass. Now their data engineering team can deliver new pipelines and business results in a matter of days rather than months. This remarkable acceleration not only enhances collaboration with stakeholders, but also drives a substantial impact on efficiency and, in the case of this media company, on revenue generation. The post-modern data stack has transformed the data engineering landscape there, enabling the data teams to extract value from their data assets with less friction and greater agility.
What are the key characteristics of the post-modern data stack?
The post-modern data stack revolutionizes data engineering by focusing on two key characteristics: data stack convergence and repetitive task automation. Let’s get into what this means for you.
Data Stack Convergence
Data stack convergence signifies a paradigm shift in data engineering. It is an evolution from the disjointed, multi-tool approach of the modern data stack towards a unified, single-platform system that manages the entirety of the data pipeline end-to-end.
Let’s be more specific: convergence is different from three patterns that are often confused with it.
- Integration, where in-house teams bridge gaps between tools with bespoke scripts and code. In fact, integration is a hallmark of the modern data stack.
- Consolidation, where vendors merge their offerings, sometimes internalizing the integration between the tools, but never actually merging their functionality into a next-generation platform.
- Bundling, which is usually a commercial arrangement that doesn’t solve the technical gaps between the tools in the bundle at all. In rare cases, integrators do offer bundles of pre-integrated tools as a managed service, but the tools in the bundle remain the same, and retain the pains and problems of the modern data stack.
Picture this analogy: The modern data stack looks like a construction site bustling with numerous workers, each with their own skills, tools, and tasks being managed by foremen dedicated to each trade. A converged platform, however, looks more like a unified production line efficiently and seamlessly handling every step in the manufacturing process from start to finish. A converged, post-modern data stack is a streamlined, efficient engine that includes ingestion, transformation, orchestration, observability, and reverse ETL, all on one platform.
The holistic approach of the post-modern data stack translates into numerous benefits:
- First, it accelerates pinpointing and troubleshooting pipeline hotspots with a single console that observes the entire data pipeline and all its processes. When issues arise, there’s no need to mine and transfer diagnostic metadata or switch between tools and interfaces. All the information you need is at your fingertips, and engineers can spring into action in minutes.
- Second, it enhances governance and security. With a single platform and rich RBAC controls, data stewards and engineers can maintain control over access, monitor usage, and enforce security policies across every aspect of your enterprise data flows. The single platform enforces these controls consistently, acting as the gatekeeper for data workflows across all your data cloud accounts.
- Lastly, it improves efficiency and scalability. By eliminating the need for different tools at each stage of your pipeline workflows, your data engineers can focus on harnessing unparalleled visibility into resource utilization to optimize your pipelines rather than constantly patching interoperability issues.
Repetitive Task Automation
The post-modern data stack dramatically changes the day-to-day work of data engineers. Research has shown that over 85% of data professionals are overwhelmed by time-consuming, manual tasks and are actively seeking automation to free up their schedules (DataAware Pulse Surveys, 2020, 2021, 2022). They yearn for a world where they no longer need to worry about the mundane details of pipeline orchestration, change management, and impact analysis, and can instead engage with business users to deliver real, value-adding work.
At the heart of the post-modern data stack is a new type of smart mechanism that actively manages change on behalf of the developer, in real time. By contrast, in the fractured modern data stack, any modification in the code or in the data necessitates a thorough impact analysis, where data engineers methodically trace the potential implications of the change throughout the entire network of pipelines.
In the post-modern data stack, this complex but highly patterned process is entirely automated. It’s akin to a skilled detective: following the platform’s meticulous, always-on metadata traces, it runs down the logical ripple effects of a change and propagates them into every corner of your enterprise dataflows, so you don’t have to.
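To illustrate the shape of that automation, here is a minimal sketch of lineage-driven impact analysis: a breadth-first walk over a toy dependency map of the kind a platform’s always-on metadata would supply. The dataset names and the structure are hypothetical, not any vendor’s actual model.

```python
# Toy impact analysis: find every downstream dataset affected by a change.
from collections import deque

# dataset -> datasets that consume it (lineage the platform records)
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.churn"],
    "marts.revenue": ["dashboard.exec_kpis"],
}

def impacted(changed: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk of the lineage graph from the changed node."""
    seen: set[str] = set()
    queue = deque(lineage.get(changed, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(lineage.get(node, []))
    return seen

print(impacted("raw.orders", LINEAGE))
# {'staging.orders', 'marts.revenue', 'marts.churn', 'dashboard.exec_kpis'}
```

The walk itself is simple; what makes it tractable is that a converged platform already holds complete, current lineage, so nobody has to reassemble this map by hand across five tools.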
The benefits of this approach are manifold:
- First, it dramatically reduces the risk of human error. With built-in automation handling the intricate details of pipeline orchestration and change management, there’s far less room for mistakes, especially the kind that comes from fatigue associated with rote tasks.
- Second, it dramatically boosts efficiency and productivity. The automation of the post-modern data stack frees up data teams’ time. Freed from the burden of routine tasks, the entire team can shift their focus away from maintenance and concentrate on strategic, high-value tasks.
- Third, it enhances scalability. As data volumes grow, the post-modern data stack natively and effortlessly scales up operations, thanks to its inherently scalable architecture and automation down to the lowest infrastructure level. You no longer need to worry about running up against obscure limitations in the inner workings of tools and integrations that can be caused by increasing workloads and new use case scenarios – the system takes care of them.
What is the business value?
Stepping into the realm of business value, how do the converged capabilities and deep automation of the post-modern data stack translate into tangible advantages for your organization?
Productivity Improvement
Evolving an organization onto the post-modern data stack is like replacing a leaky old VW Bug with a modern VW Golf R: built for barely road-legal speed, low-maintenance efficiency, and unmatched reliability.
In the modern data stack, engineers not only waste time in a tangled web of manual, repetitive tasks; more critically, that toil saps their creative potential, dramatically reducing overall productivity and innovation. The post-modern data stack flips the script. Automation frees engineers to focus on strategic, high-impact projects and business engagement. The result? An increase in productivity, innovation, and overall job satisfaction for all team members.
The effect is far-reaching. For example, in a traditional setup, a data scientist working on developing a new predictive model that will impact revenue spends a significant portion of her time building, managing, and troubleshooting the data pipelines that feed her model (research shows this can take up to 80% of her time). However, in the post-modern data stack, these pipeline-related tasks are automated. As a result, she can devote far more time to tuning and perfecting the predictive model, bringing it to production quicker, and impacting the bottom line faster.
Change Enablement
In the world of data, change is as inevitable as the rising sun. The landscape of data schemas, business logic, data sources, and analytics requirements is constantly evolving, so the stack in which pipelines are built, operated, and maintained must be nimble enough to adapt continuously. The post-modern data stack is not just equipped to deal with extreme levels of constant change; it is designed to embrace it.
The post-modern data stack is like a well-trained assistant who proactively adjusts the data pipelines to evolving technical changes on the fly, and adapts to adjustments in business needs without needing constant supervision. This frees the data team from perpetually reacting to change and firefighting unwelcome side effects, and shifts their entire operating posture toward proactively welcoming change and concentrating on adding new value to the business.
For instance, consider the daily occurrence of adding a new data source to a pipeline, or changing the processing logic of an existing one. In a traditional setup, this might require substantial time and effort for impact analysis, debugging, careful reengineering, and manual testing of the pipelines. You then have to shut down the running pipelines, promote the change into the production data stack, and relaunch it to reprocess the existing data sets. After that, you need to manually clean up the redundant data sets, sorting between what changed and what didn’t, until everything is resolved and you can give the business the all-clear to resume using the data for their reports.
But with the post-modern data stack, these tasks are handled automatically and on the fly: the single platform visualizes changes as they propagate in real time and handles all the cleanup behind the scenes. Backed by this level of automation and assurance, engineers can shift their attention to how the new data adds value and to deriving valuable insights from it.
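One common way to automate this, shown in the toy sketch below, is to fingerprint each transform’s logic together with its inputs: a logic edit changes the fingerprint, which cascades downstream, so the system knows exactly which data sets to recompute and which stale ones to clean up. This illustrates the general technique, not any specific product’s mechanism; all names are hypothetical.

```python
# Toy fingerprint-driven recomputation planner.
import hashlib

def fingerprint(logic_sql: str, input_prints: list[str]) -> str:
    """Hash of a node's code plus the fingerprints of everything it reads."""
    h = hashlib.sha256(logic_sql.encode())
    for p in sorted(input_prints):
        h.update(p.encode())
    return h.hexdigest()

def plan(nodes: dict, stored: dict[str, str]) -> list[str]:
    """Return the datasets to recompute, in dependency order.

    nodes:  name -> {"sql": str, "inputs": [names]}, listed topologically.
    stored: name -> fingerprint recorded after the last successful run.
    """
    prints: dict[str, str] = {}
    to_run = []
    for name, spec in nodes.items():
        prints[name] = fingerprint(spec["sql"], [prints[i] for i in spec["inputs"]])
        if stored.get(name) != prints[name]:
            to_run.append(name)  # changed logic, or a changed upstream
    return to_run

nodes = {
    "staging.orders": {"sql": "select * from raw.orders", "inputs": []},
    "marts.revenue": {"sql": "select sum(amount) from staging.orders",
                      "inputs": ["staging.orders"]},
}
# An edit to staging.orders' SQL invalidates it AND everything downstream.
print(plan(nodes, stored={}))  # ['staging.orders', 'marts.revenue']
```

Because a changed fingerprint propagates through every consumer automatically, the manual shutdown-promote-relaunch-cleanup cycle collapses into a single planned, minimal recomputation.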
Improved Velocity
If the post-modern data stack is the modern Golf R of data engineering, the modern data stack is the leaky old VW Bug. It may have gotten you from point A to point B, but it was slow, cumbersome, and unreliable. You needed a bag of wrenches in the trunk to handle every change, and every bump in the road had ripple effects, causing delays and disruptions across multiple downstream pipelines.
Since the post-modern data stack is a single, integrated platform inherently built to propagate changes automatically, disruptions are minimized and the execution of data projects is dramatically accelerated. The resulting shift in operational posture means data engineers can confidently build and extend existing pipelines, amplifying their value to new business use cases even while increasing their speed of delivery. And with the deep end-to-end monitoring and observability of pipelines in the post-modern data stack, troubleshooting becomes a breeze: when any type of break occurs, such as an unexpected data format or a disruption in cloud infrastructure, the platform instantly pinpoints the root cause, shortening time-to-resolution.
At the business level, it drives faster decision-making, quicker insights, and rapid innovation. By evolving to the post-modern data stack, you’re not just improving velocity; you’re setting a new speed record in the data engineering race.
Reduced Costs
In the modern data stack, attributing resource consumption to specific parts of the pipeline is nearly impossible, making optimization unattainable. The unprecedented end-to-end operational visibility of the post-modern data stack, by contrast, is a game changer. By knowing exactly where and when in the infrastructure each job executed, and by closely metering the compute resources required to complete it, you can quickly gain a granular understanding of resource usage patterns across your pipelines.
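As a minimal illustration, assuming the platform exposes per-job metering records (the field names and rates here are hypothetical), rolling task-level compute spend up to the pipeline level becomes a few lines of code:

```python
# Toy cost rollup from per-job metering records to per-pipeline spend.
from collections import defaultdict

jobs = [
    {"pipeline": "revenue_marts", "task": "ingest",
     "compute_seconds": 420, "rate_per_hour": 4.0},
    {"pipeline": "revenue_marts", "task": "transform",
     "compute_seconds": 1800, "rate_per_hour": 4.0},
    {"pipeline": "churn_model", "task": "feature_build",
     "compute_seconds": 5400, "rate_per_hour": 4.0},
]

cost_by_pipeline: dict[str, float] = defaultdict(float)
for job in jobs:
    cost_by_pipeline[job["pipeline"]] += (
        job["compute_seconds"] / 3600 * job["rate_per_hour"]
    )

# Most expensive pipelines first, so optimization effort goes where it pays.
for name, cost in sorted(cost_by_pipeline.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ${cost:.2f} per run")
# churn_model: $6.00 per run
# revenue_marts: $2.47 per run
```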
Armed with this knowledge, engineers can critically evaluate the value of each pipeline against the cost to run it, and choose where to invest more effort to simplify, remove, or aggregate the pipeline logic to save money without compromising value. On a larger scale, this transparency extends to the executive level where VPs and CxOs gain a deeper understanding of where their money is going, understand how the pipelines are contributing to the bottom line, and can direct resources to decrease costs and increase business value.
Harness the Value of Data Pipeline Automation
The post-modern data stack ushers in a new era of data engineering, replacing the disjointed tools of the modern data stack with a simplified architecture centered on a unified automation platform. By underpinning data pipelines with this transformative level of automation, data engineers and the entire data team can shift their operational posture toward delivering strategic business value, rather than living in an all-consuming world of manual, repetitive tasks.
This shift enhances productivity, enables change, improves velocity, and reduces costs, transforming data engineering from a cost center into a strategic asset. The post-modern data stack is more than just a technological solution—it’s a game-changer for how business is done.
As we navigate this new era, it’s crucial to understand the centrality of data pipeline automation as the engine driving the post-modern data stack. Embracing this technology means investing in the future of efficient, innovative, and business-focused data engineering.