Parsing current data stacks is like playing the longest-lasting game of Jenga.
Datasets and code are centralized into one big monolithic architecture.
Only the most experienced data engineers can extract pieces of it. And every time they do, they risk toppling the whole infrastructure over.
But there’s a better way to play the game: data mesh.
Data mesh decentralizes and democratizes data across the organization. By splitting data into highly standardized and loosely coupled domains, data engineers can work with business stakeholders to build the data products they want and need in a highly shareable, accessible, and useful way.
This article presents data mesh as an alternative to the status quo with:
- A brief explanation of what data mesh is
- An overview of why data mesh is important
- A deep dive into the six major benefits of using a data mesh
What Is a Data Mesh?
Before we explore the many benefits of a data mesh, let’s quickly review what it is.
- Data mesh, first defined by Zhamak Dehghani in 2019, is a data architecture that organizes company data into defined domains (borrowing Eric Evans’ theory of domain-driven design).
- The domains correspond to particular business areas, and the data within each domain is standardized based on input from business subject matter experts (SMEs).
- Data mesh domains enable data engineers to create data products — such as valuable models, analytics, and customer-facing solutions.
- Data mesh makes data products easily shareable and searchable across the organization, empowering business stakeholders to find and use the data products they need.
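To make the idea of shareable, searchable data products a little more concrete, here is a minimal sketch of a catalog that domains publish into and stakeholders search. Everything in it is an assumption for illustration: the `DataProduct` fields, the `Catalog` class, and the example product names are hypothetical, and a real data mesh would typically rely on a dedicated catalog tool rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str          # business area that owns the product
    owner: str           # accountable SME group (hypothetical field)
    description: str
    tags: frozenset = frozenset()


@dataclass
class Catalog:
    """In-memory stand-in for an organization-wide data product catalog."""
    products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        # Names are unique across the mesh so consumers can reference them.
        if product.name in self.products:
            raise ValueError(f"{product.name!r} is already published")
        self.products[product.name] = product

    def search(self, term: str) -> list:
        # Case-insensitive match on name, description, or tags.
        term = term.lower()
        return sorted(
            p.name
            for p in self.products.values()
            if term in p.name.lower()
            or term in p.description.lower()
            or any(term in t.lower() for t in p.tags)
        )


catalog = Catalog()
catalog.publish(DataProduct("orders_daily", "sales", "sales-smes",
                            "Daily order totals by region",
                            frozenset({"revenue"})))
catalog.publish(DataProduct("churn_scores", "customer", "cx-smes",
                            "Monthly churn probability per customer"))

print(catalog.search("revenue"))
```

The point of the sketch is only that each product carries its domain and owner with it, so a stakeholder who finds a product also finds who is accountable for it.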
In this Twitter thread by Modern Data Stack, you can explore a concise yet informative overview of the concept. For a deeper dive into data mesh, check out What Is a Data Mesh? — And Why You Might Consider Building One and Data Mesh vs. Data Fabric: Which One Is Right for You?.
Why Is Data Mesh Important?
Before we dive into the specific benefits of data mesh, let’s establish why data mesh is important. Data mesh offers a more collaborative, organization-focused approach to data, which promotes data usage throughout the organization.
Since experts vet each piece of a data product’s puzzle, stakeholders can feel more confident using data products for decision-making.
And because each data product in a data mesh is made widely available, end users can combine the products they’re currently using with other products to get a more holistic picture of how their business unit is running.
This paradigm for domain and pipeline ownership and stewardship allows for:
- Greater velocity: Building new data products becomes a matter of connecting existing data product building blocks. With this architecture, data engineers can deliver on promises to stakeholders sooner, shrinking or even clearing their backlogs.
- Domain use and reuse: When data consumers and customers are involved in developing data products, they are more likely to use those products. And because they can see what other data products are available, they can build more sophisticated new data products that move the business forward.
- Improved efficiency: Designing high-quality, effective data products no longer depends on the one or two hyper-specialized people who knew your old data architecture inside and out. That means less time in meetings, fewer bottlenecks, and more time doing useful work.
Data meshes offer a remedy for the limitations commonly associated with monolithic data architectures. But depending on the specifics of each organization, data mesh concepts can become important for different reasons. We spoke with Simon Smith, CDO at NewsCorp, to understand the practical impact a data mesh can have at a company.
6 Key Benefits of Data Mesh
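We’ve touched on the implications of a data mesh on an organizational level, but now let’s get down to the technical details. The six concrete benefits of data mesh are scalability, faster time to value, improved data quality, low-complexity change management, security, and approachable data governance.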
Scalability
Data mesh promotes scalability by breaking down data into smaller domains that can be standardized and reused across the organization. Creating independent data products allows data engineers to easily connect these building blocks to build more complex solutions. This makes scaling data products faster and more manageable than it is with a traditional monolithic architecture.
According to AWS, taking a data mesh approach has empowered data product owners at leading companies like JPMorgan Chase to make better management decisions regarding their data, promote its use, and visualize data consumption across the enterprise.
Faster Time to Value
By fostering a self-serve data infrastructure, data teams can quickly access the data they need without lengthy approval processes. This allows them to build and manage their data pipelines independently. As a result, data mesh can help organizations deliver new data products and services more quickly and efficiently, leading to faster insights and decision-making. This is a crucial competitive advantage in today’s rapidly changing business landscape.
Intuit’s Chief Architect, Tristan Baker, points out that their data mesh “empowers data workers to design, develop, fully describe, and actively support their own data-driven systems.” With this agency and transparency, other Intuit stakeholders can discover, understand, trust, and consume data themselves, creating a flywheel of new data products.
Improved Data Quality
Data quality is one of the biggest strengths of a data mesh. Because a group of experts defines the data in each domain and works closely with data engineering to maintain it, data quality is far less likely to deteriorate through neglect over time. Data mesh flips the traditional script: stakeholders closest to each domain are accountable for data stewardship, which includes data quality monitoring and management.
Zalando, a large European fashion retailer, even uses an “opt-in” methodology to reinforce the responsibility of creating quality domains — stakeholders are forced to decide whether to store a dataset in a central archive for anyone else’s use and continue supporting it over time.
Low-Complexity Change Management
Domains serve as the foundation for data products, which makes changes much easier to manage because each domain represents a standalone data model. If an SME determines that a domain needs a new column, data engineers work to add it and, in general, do not need to consider the impact on other domains. The change gets propagated to the corresponding data products (and to any “mega” products built from multiple data products based on that domain).
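The column example above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular platform implements propagation: the `Domain` and `DataProduct` classes, the `add_column` method, and the sample data are all hypothetical, and the product is modeled as a view that is recomputed from its source domain on every read.

```python
class Domain:
    """A standalone data model owned by one business area."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = list(columns)
        self.rows = []

    def add_column(self, column, default=None):
        # Additive change: existing rows get a default value,
        # and no other domain is touched.
        if column in self.columns:
            raise ValueError(f"{column!r} already exists in {self.name}")
        self.columns.append(column)
        for row in self.rows:
            row[column] = default


class DataProduct:
    """A view computed from its source domain on every read."""

    def __init__(self, name, source, exclude=()):
        self.name = name
        self.source = source
        self.exclude = set(exclude)

    def columns(self):
        # Derived on read, so schema changes in the source flow through.
        return [c for c in self.source.columns if c not in self.exclude]

    def read(self):
        cols = self.columns()
        return [{c: row[c] for c in cols} for row in self.source.rows]


sales = Domain("sales", ["order_id", "amount"])
sales.rows.append({"order_id": 1, "amount": 40.0})
support = Domain("support", ["ticket_id"])
report = DataProduct("sales_report", sales)

# The sales SME asks for a new column; only the sales domain changes.
sales.add_column("currency", default="USD")

print(report.read())
print(support.columns)
```

Because the product derives its columns from the domain at read time, the new column shows up in `sales_report` without any edit to the product itself, while the `support` domain is unaffected.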
Security
Decentralization may seem like a hindrance to security, but it’s actually a feature. Each domain owner can apply security rules as needed to secure their domain. Confirming the security of each domain preserves the security of every data product as well. Spending quality time testing and formalizing domains upfront makes it easier to maintain security all the way down to the end user.
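One way to picture domain-level security carrying through to data products is the sketch below. It assumes a simple role-based model in which a product inherits the rules of every domain it is built from. The `DomainPolicy` and `DataProduct` classes, the `can_read` function, and the role names are hypothetical; real deployments would normally express this in their platform’s own access control system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainPolicy:
    domain: str
    allowed_roles: frozenset   # roles the domain owner has approved


@dataclass(frozen=True)
class DataProduct:
    name: str
    source_domains: tuple      # names of the domains the product is built from


def can_read(role, product, policies):
    """A role may read a product only if every source domain allows it.

    Domains with no registered policy are denied (fail closed).
    """
    return all(
        d in policies and role in policies[d].allowed_roles
        for d in product.source_domains
    )


policies = {
    "sales": DomainPolicy("sales", frozenset({"analyst", "finance"})),
    "hr": DomainPolicy("hr", frozenset({"hr_partner"})),
}
revenue = DataProduct("revenue_dashboard", ("sales",))
headcount_cost = DataProduct("headcount_cost", ("sales", "hr"))

print(can_read("analyst", revenue, policies))
print(can_read("analyst", headcount_cost, policies))
```

In this model the product itself holds no security rules. An analyst can read the sales-only product but not the one that also draws on the HR domain, because the HR owner has not approved that role.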
Approachable Data Governance
With a data mesh, each domain manages its own data products and change process. Overall, data governance is simplified because overlapping concerns are reduced or eliminated altogether. SMEs define data keys, validity, and format and are responsible for working with data engineers to make updates as the business scales. If end users can trust the arbiters of each domain, they can trust the resulting data products. Corporate governance focuses on change notification instead of detailed change management.
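As a rough illustration of SMEs defining data keys, validity, and format, the sketch below encodes those three things as a small contract that a batch of records can be checked against. The `FieldRule` and `DomainContract` classes, the regular expressions, and the customer fields are assumptions made up for this example, not part of any specific governance tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    pattern: str            # format, as a regular expression
    required: bool = True   # validity: must the field be present?


@dataclass(frozen=True)
class DomainContract:
    domain: str
    key: str                # field that uniquely identifies a record
    fields: dict            # field name -> FieldRule

    def violations(self, records):
        """Return readable problems; an empty list means the batch conforms."""
        problems, seen = [], set()
        for i, rec in enumerate(records):
            for name, rule in self.fields.items():
                value = rec.get(name)
                if value is None:
                    if rule.required:
                        problems.append(f"row {i}: missing {name}")
                elif not re.fullmatch(rule.pattern, str(value)):
                    problems.append(f"row {i}: bad format for {name}")
            key = rec.get(self.key)
            if key is not None:
                if key in seen:
                    problems.append(f"row {i}: duplicate key {key}")
                seen.add(key)
        return problems


contract = DomainContract(
    domain="customer",
    key="customer_id",
    fields={
        "customer_id": FieldRule(r"C\d{4}"),
        "email": FieldRule(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
        "phone": FieldRule(r"\+?\d{7,15}", required=False),
    },
)
batch = [
    {"customer_id": "C0001", "email": "a@example.com"},
    {"customer_id": "C0001", "email": "not-an-email"},
    {"customer_id": "C0002"},
]

for problem in contract.violations(batch):
    print(problem)
```

Keeping the contract as plain data means a change to it is easy to announce to downstream consumers, which fits the change notification role described above.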
The team at Starship, a company that produces autonomous delivery robots, maintains their data mesh governance “via a culture of ownership, discussion, and feedback within the team.” Limiting governance to a smaller group of people who are accountable for their own domains and care deeply about the quality of those domains makes it easier to make decisions faster.
Data Mesh Is the Future
Data mesh unwinds the intricate, delicate, complicated web of monolithic architectures to get the right data to the right people at the right time.
Stakeholders are included in the process, injecting trust, accountability, and accuracy into data engineering workflows. Widely available data products enable enterprises to operate with higher velocity and efficiency. Finally, and perhaps most importantly, data mesh implementation helps ensure that the value of data is captured and used to drive the business forward.
Additional Reading and Resources