Data Mesh: Moving from Concept to Reality


Watch this insightful webinar featuring Simon Smith, CDO at News Corp., as he navigates the practical implementation of data mesh.


Tom Weeks

tom@ascend.io

This is the transcript of a webinar on how a global media company is evolving its data strategy toward a working data mesh. The webinar, held in March 2022, was a conversation between Simon Smith, Chief Data Officer at News Corp., and Tom Weeks, COO at Ascend. Simon's unique perspective in a data-intensive business at global scale includes a realistic approach for how to develop a data mesh in the middle of constant change, how to shape a pragmatic roadmap that constantly delivers value, and how to thrive as a result.

Tom

So without further ado, joining us today is Simon Smith, Chief Data Officer at News Corp. Simon's illustrious career covers leading companies such as Accenture, Deloitte, and Google, and for the last 11 years he has held various data leadership roles at News Corp. He is two years into implementing a major data transformation across the company. And he's here to share his experience. Thank you so much, Simon. And just to kick things off, why don't you give folks a little bit of background on yourself?

Simon

Thank you, Tom! I've been at News Corp. for coming on 11 years now as Chief Data Officer. At college, I studied physics, and I was always into the numbers and the maths and the models. And then like many people in the late 90s, I got an IT job, but always missed the maths. Then 10 to 12 years ago, data science and big data came along, combining the tech and the engineering with the numbers. I love the confluence of these disciplines, the experimentation with data, the testing and learning, the storytelling. Now it's so much more than just the tech.

Tom

Let's kick this off with some context of the News Corp. business structure in which you operate.

Simon

Well, News Corp. is a large global media and information services company. We have operations all over the globe, particularly in the US, the UK where I'm based, and Australia. Our company is essentially nine business units, which are really just largely independent companies. We have some of the most recognizable media brands in the world, including Dow Jones, the publisher of The Wall Street Journal, and B2B products, most notably Factiva, which is the largest corpus of licensed news information anywhere in the world. Our businesses in the UK include the Times of London and The Sun.

We also have various radio stations and other assets, like the book publisher HarperCollins, and realtor.com in the US. In Australia, we operate a wide variety of news publications, TV, and other assets. So it's really a lot of different parts of the business. We always talk about News Corp. being a company that is truly greater than the sum of its parts. That's even more true about data, and it's my job as CDO to tell that story through the data.

For the most part, the business units are self-sufficient from a technology, infrastructure operations, and even data perspective. So, globally speaking, we operate multi-cloud architectures in the extreme, with some people on AWS, some on GCP, and some on Azure. Probably one flavor of almost everything you can imagine! In essence, my job is to connect the dots in the center of that massively diversified ecosystem of organizations. It's really very aligned with what data mesh is all about. I'm not a mesh expert, but it's really well aligned with some of the principles of how to build out a picture of the center from these autonomous units at the edge.

We want to tell the story at the center, and grow our benchmarking across the group, to find what's working and what's not working. In the early days, there was a lot of focus on growing our subscription businesses through digital engagement, and by advertising, which is also very digital. That's been a big focus for a number of years. Bringing all of those numbers together and telling that story became more and more important. Initially, that was done ad hoc, reporting from the edge into the center, with shared spreadsheets and such. But as the ecosystem became more complex, we needed to slice and dice by different dimensions, which becomes difficult to scale with manual processes like that.

Tom

This is one of the aspects that has always fascinated me about your specific situation, in that you already have capabilities out at the edge. In effect, you can think about them as domains in the mesh terminology, or even mega domains, because these encapsulated business units have their own domains within them. None of these were plugged into any common anything, so you as a central domain had access to nothing except the spreadsheets. I can see how that was really at the root of this initiative.

Diagram to represent data mesh organizational architecture simplified

Simon

This comes back to storytelling. The first thing we had to do was find a way to automate that distribution of information that was previously manual. We started with the collection of a set of standard metrics, standard KPIs, at the center. This sounds easy because we're not talking about petabytes of data, there's no big data problem there. But the scale and complexity of this task grew very quickly. For example, each of the nine business units has a different system of record, or at least they've configured their system of record differently.

Then you add on the line of business as another dimension, and the kind of things you want to tell stories about as another dimension. You quickly end up with tens and tens of thousands of individual pieces of information coming out of systems that you don't control.

Starting with digital analytics, everyone has a different analytics tool, or at least they've configured it differently. Then there are customer systems, billing systems, content management systems, ad servers, and other ad systems lying around the ad business, not to mention back office functions, finance, HR, etc. These systems change, and since we don't own or control these systems, we don't really get notified. So the number of interfaces and the sophistication of the business rules that we needed to create to manage all of this complexity at the center quickly becomes overwhelming.

On top of that, everything just breaks at different times, for different reasons. People underestimate the complexity of that task because it's not a classic big data problem. We spent years trying to solve this problem, and it led us to invest heavily in orchestration tools to manage all of those different interfaces. You just can't do it manually, or by writing heavy procedural code. So automation has been a big part of making that possible.
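To make the interface problem concrete, here is a minimal sketch (an illustration, not News Corp.'s or Ascend's implementation) of normalizing the same KPI from business units that configure their systems differently. The unit names, field names, and the `KpiRecord` shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class KpiRecord:
    """One standard KPI observation, as the center might store it."""
    business_unit: str
    metric: str
    period: str
    value: float


# Hypothetical raw exports: each unit names and scales the same metric differently.
def from_unit_a(row: dict) -> KpiRecord:
    return KpiRecord("unit_a", "digital_subscribers", row["month"], float(row["subs_total"]))


def from_unit_b(row: dict) -> KpiRecord:
    # Assumed for illustration: unit B reports in thousands under a different date key.
    return KpiRecord("unit_b", "digital_subscribers", row["period"], float(row["subscribers_k"]) * 1000)


ADAPTERS: Dict[str, Callable[[dict], KpiRecord]] = {
    "unit_a": from_unit_a,
    "unit_b": from_unit_b,
}


def normalize(unit: str, rows: List[dict]) -> List[KpiRecord]:
    """Apply the unit's adapter and fail loudly if an upstream field changed."""
    adapter = ADAPTERS[unit]
    records = []
    for row in rows:
        try:
            records.append(adapter(row))
        except (KeyError, ValueError) as exc:
            raise ValueError(f"{unit}: upstream format changed or bad value in {row!r}") from exc
    return records
```

Failing loudly on a missing field is one way to cope with upstream systems that change without notice: the break surfaces at the interface rather than as a wrong number in a report.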

Of course, it doesn't stop with KPIs. That was Phase One, just to equip us to be able to do basic trend analysis and tell the story on demand, rather than doing it ad hoc. Unlocking deeper analysis, using AI or machine learning, or creating any sort of data products on top of more fine-grained data, requires much more granular access. Take, for example, clickstream browsing behavior in the digital context, and ad impressions and clicks in an ads context.

So in Phase Two you take the things you've learned in Phase One about thousands of interfaces, and apply that same orchestration complexity to huge volumes of data. Then in Phase Three, you add business rules to handle data that you have no control over. It becomes very inefficient, but not disruptive because a lot of required plumbing can sit at the center. After that, you can look at greenfield opportunities involving collecting data once and sharing it back out to others. Those are harder to achieve organizationally, but simpler to achieve technically.

Tom

That's a great strategy for how to execute a program, in this case, data mesh implementation, with some relatively quick successes, and get traction. You are also describing data products in the context of a maturity model for data mesh, not just a current state and an end state. Your stages are practical, achievable, and can show everybody progress along the way. You've defined some clear boundaries between them and what they mean, and what those interfaces and APIs really need to look like.

Simon

It's easy for engineers to get attracted to Phase Two and Phase Three big data stages. So many times in my career I've started big data projects that have taken up time, money, resource, and cognitive bandwidth, and drawn away from the essence of a Chief Data Officer's job, which is to tell the story and to guide the business. As simple as Phase One KPIs sound, quite often they can have a big impact on the organization.

Just being able to measure simple trends across simple dimensions in a reliable, repeatable, consistent way, is incredibly powerful. Organizationally speaking, being able to deliver those wins early gets some runs on the board and builds confidence with your stakeholders, which then unlocks bigger investments down the line. A lot of the nuance and the judgment is about knowing when to get the heavy guns out and when to actually just do something that works.

Tom

Practical. Awesome. So you described what the sides of the box look like at News Corp, your approach, your maturity model of how you wanted to move along, and a data product plan to achieve your objectives. So 18 to 24 months ago, where did you first run across data mesh, or mesh-like concepts? And how did you leverage it in your storytelling?

Simon

When we first started hearing about some of these architectures, data mesh and data fabric, we were thinking about how to move beyond KPI reporting and start onboarding some of these larger datasets. They are distributed globally and on different cloud platforms, so we didn't have the luxury of just moving everything into one central location. Some people say you don't have to move any data, but actually you do, and we'll come to that in a minute.

As often happens with these new technology trends, there are a lot of really nice marchitectures out there that tell a really compelling story. And vendors will tell you how they can teleport your data across time and space to turn distributed data into connected data, just like that, without moving any bytes anywhere. When the rubber hits the road it's not really like that.

I happened to read an article that left me with a light bulb moment, where I understood why some of these principles are really important. With all of these new technologies, it often comes down to principles more than anything else. Three principles struck me, and we now reflect them in our architecture. One is interoperability. The second one is flexibility. And the third one is control. I'll get into what I mean by control because that's a little bit of a strange word, and sometimes perceived as a bad thing.

By interoperability I mean the true ability to interface to any data system in a federated architecture. I've received lots of pitches, particularly from database vendors who are jumping on the mesh bandwagon, to sell me on architectures that do distributed querying across physically separated datasets, without ever having to move any of the data to the center. But the quid pro quo is that every database has to be the same flavor. To orchestrate massively complex queries at the edge across nine different databases requires the same database to be installed in those nine locations.

For a lot of organizations, that's not a realistic proposition. Not only do we have a hybrid cloud strategy, but our teams have also invested in their own architecture and have done their own migrations. You can't turn that on its head and say, for example, you can't use BigQuery anymore, you have to use Snowflake, or you can't use Redshift anymore, you have to use BigQuery. That just won't fly in a complex organization. So interoperability has to mean true interoperability, which is "talk to anything anywhere."

News Corp.'s three principles of data mesh

Regarding flexibility, this leads us to a single architecture, or a single platform, that is capable of operating in different modes. Thinking about Phase One, Two, and Three, you can't have one stack to do KPIs, and a separate stack to do big data. That would mean two worlds that evolve separately, two organizational teams that are skilled in different technologies, and two different operational fabrics, like two different schedulers, etc.

What is needed is an architecture that's flexible, that allows for both in the same box. It needs to be able to mix and match because in some cases those worlds come together. For example, you need to be able to build KPIs from big data. In a merged API and KPI world, you still get weird spreadsheet-type data, while also doing some data gymnastics with pandas. You run big data on Spark or BigQuery, but you need to be able to orchestrate everything under one umbrella.

And you need the flexibility to pull information to the center, or push information to the edges. So what's needed is an architecture that has a sufficiently broad range of utilities that enable any kind of computational model. That's hard because it means that the framework has to be pretty light, but it has to give you the guardrails to keep the consistency.

Then there is control, a broad and maybe slightly ambiguous term. This means the ability to see all data movement and all data access that's happening across the enterprise, on a single pane of glass. It spans all three phases, with the ability to see what we're getting from each place, and what we're pushing back.

Control also means tracking who is accessing what data, and what data is being used for which downstream data products. This is not just operationally important; from a hygiene perspective, we live in an increasingly regulated world, particularly as a large consumer-facing organization. It's not tenable these days for there to be backdoors or exceptions. Everything must be tracked, and access control management must be spot on.
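As an illustration of the kind of record-keeping this implies, here is a minimal in-memory sketch that tracks who accessed which dataset and which downstream data products each dataset feeds. The `AuditLog` class and its method names are hypothetical and not taken from any tool mentioned in the webinar.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class AuditLog:
    """In-memory record of data access and of dataset-to-product lineage."""

    def __init__(self) -> None:
        self.accesses: List[Tuple[str, str]] = []  # (user, dataset) pairs
        self.lineage: DefaultDict[str, Set[str]] = defaultdict(set)  # dataset -> products

    def record_access(self, user: str, dataset: str) -> None:
        self.accesses.append((user, dataset))

    def record_lineage(self, dataset: str, product: str) -> None:
        self.lineage[dataset].add(product)

    def who_accessed(self, dataset: str) -> Set[str]:
        """Every user that has touched the dataset."""
        return {user for user, ds in self.accesses if ds == dataset}

    def products_using(self, dataset: str) -> Set[str]:
        """Every downstream data product built on the dataset."""
        return set(self.lineage.get(dataset, set()))
```

A real deployment would persist these events and populate them from the orchestrator rather than by hand; the point of the sketch is only that both questions are answered from one place.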

Coming back to mesh theory, I would sacrifice a bit of efficiency and incur some data duplication if I can guarantee these principles of access are respected. Databases that provide row-level and column-level ACLs will police these things for you, but with distributed access settings for data pipelines and for databases, it can quickly become unwieldy to track and maintain.

For example, how do you test whether access controls are working? A tester may be able to log in, but they can't audit what someone else can see, because they have different permissions. These sorts of things become very difficult to test. That keeps me awake at night. I would rather err on the side of having separate copies of data that can be locked down for individual people. I would lock down all the ultra-sensitive data, so no one has access to it period.

I prefer providing physically separate sets of redacted or summarized data, where access becomes easy to test. A lot of architectures reflect good principles, but require flexing for what actually works in practice. I'm not a mesh expert, and a lot of it is very theoretical with a lot of activity in the vendor space around it. But those three principles have really worked for me.
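The preference for physically separate, redacted copies can be sketched as follows. This is a minimal illustration under assumed column names (`email`, `name`), not a description of News Corp.'s setup; it shows why such a copy is easy to test: the check is simply that no sensitive column exists in it.

```python
from typing import Dict, Iterable, List, Set

SENSITIVE_FIELDS: Set[str] = {"email", "name"}  # hypothetical column names


def redacted_copy(
    rows: Iterable[Dict[str, object]],
    sensitive: Set[str] = SENSITIVE_FIELDS,
) -> List[Dict[str, object]]:
    """Build a separate copy of the rows with sensitive columns removed."""
    return [{k: v for k, v in row.items() if k not in sensitive} for row in rows]


def leaked_fields(
    rows: Iterable[Dict[str, object]],
    sensitive: Set[str] = SENSITIVE_FIELDS,
) -> Set[str]:
    """Return any sensitive columns present in a dataset; an empty set means clean."""
    found: Set[str] = set()
    for row in rows:
        found |= sensitive.intersection(row.keys())
    return found
```

Because the redacted copy contains no sensitive columns for anyone, the test does not depend on whose permissions the tester happens to have.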

Tom

Well, you had some initial stimulus from popular topics in recent years, which now align quite well with data mesh. Interoperability and flexibility are two of the core characteristics that, in formal data mesh lingo, drive the utility plane, the product experience plane, and the mesh experience plane. Control as you're defining it really sits squarely in the mesh experience plane, with discoverability and access. One important nuance you describe is about visibility into the actual movement of data, to actually see what is happening to datasets along the way as data moves through the mesh.

Once you developed your vision for how to drive change, how did you sell this? Aren't big ambitious projects becoming less and less attractive? How did you approach this?

Simon

There is not much challenge in selling the concept of creating a more rounded data asset at the center. Nobody in News Corp. will argue that data is not something you should invest in, and no one will argue that a global view of data is not important. The nuance was trying to land on an architecture and approach that was going to be deliverable.

As we were going through some of the options, we had conversations with organizations on whether to standardize globally on a particular sort of database technology, or a particular sort of cloud technology, and build out a truly single global platform. It's actually quite attractive, and in another organization, it could be the most efficient way of doing it. But in a significantly complex organization where people have different investment horizons and different investment roadmaps, just moving everyone to one system is incredibly difficult, if not impossible.

So mesh came along at the right time and provided some conceptual guardrails, and changed our perspective. We realized we did not have a data storage problem, we had an orchestration problem. If you approach this from a data storage lens, it all becomes about fighting the inherent entropy of data in our organization. But if you put on the orchestration lens, you can acknowledge the entropy of data, understand what it is, and think about how to accommodate it.

That's what got us looking at orchestration tools, which is what led us to Ascend. Finding a tool that gave us that ability to plug into anything, and deal with almost any kind of business logic, came along at exactly the right time. At that point, it was a very easy sell.

Tom

So at this point, you weren't really thinking about a mesh framework, like a utility plane, data product experience plane, mesh experience plane, etc. The way that you were thinking about it applied to any type of data automation technology out there.

Let's take a moment to map the state of News Corp. to the three mesh planes. In the utility plane you would have a little bit of everything, multiple clouds, multiple warehouses, multiple databases, each of the domains making their own decisions, creating different combinations of technologies. They can carry on, there is no heavy lift to move somebody from one type of infrastructure to another. That part of your situation wasn't going to change.

Meanwhile, the data product experience plane holds your elevated conversation about orchestration. Something like Ascend creates a single experience, while keeping full domain optionality for underlying systems in the utility plane. It leverages data automation, you can see the artifacts, you can inspect them and observe them running, you have a catalog, and you can see what data is moved. Regardless of whether data is stored in or moving through BQ, Snowflake, or a Spark platform, the teams get everything you were looking for in terms of orchestration.

In the mesh experience plane, you can address some further ideas on what you want to do there. You have the option to layer on broader, more sophisticated capabilities.

So at this point, you have the strategy, you have a technology platform that you think is going to help you do this while enabling each of the domains to keep their optionality. So where do you go from there? How did you get some quick wins, and show progress?

Simon

It takes judgment to understand how and where to scale. I started with three of us working in that KPI Phase One. Now we have many more people on the team. The smart technology choices that we made just allowed us to start connecting to and onboarding data, including external datasets, and developing business logic right away. We went straight to delivering business value, and didn't have to spend a lot of time developing frameworks or complex orchestration, or operational boilerplate. Our focus was laser-sharp. We were in a position to quickly assemble a standard set of operational KPIs across the entire business reliably every day, every week, every month, and it just works.

Because of some of the tech choices we made, this wasn't an intensely technical undertaking, and we fostered a really nice hybrid team. The most important nuance was actually having people who understood the data. We over-indexed on analysts right from the beginning, not on people who write Python or any other code, but on people who understand what to measure. Take referrals from Google News: they know exactly how to get that number, and have the judgment to say which numbers look weird and which look right. If you don't have that, you're just writing a lot of code with no checkpoint to understand whether the numbers make any business sense.

Once we operationalized the KPI metric phase, we added some data engineering, looking for opportunities to go further down our maturity roadmap. When you start pulling in more granular data, things become complex in a large organization, because different operating units are at different levels of sophistication. Some people have been investing in big data capabilities for a long time, so they're more ready, and we can move forward with them. We consider whether we really want to spend the time to pull in granular data if we don't have confidence in its accuracy, which depends on the upstream organization. We look across the organization to find which areas are ready to go, and which are blocked.

This can be because the upstream capability is not mature enough yet, or in a lot of cases, organizationally our interface with that team is not as mature and as sophisticated as it needs to be. To be able to accomplish goals together, we need tight relationships across nine different business units on different continents around the world. Particularly in Phase Three, where you get into high-complexity domains, having a reasonable understanding of how you can make progress with whom is more important than anything else.

The next smart thing you can do is look for greenfield opportunities. In our organization, there are a number of domains that haven't had a lot of traction in building out data assets. When you draw a map with them, you end up finding gaps across the board. That is actually good because it allows us to move to Phase Three more quickly. Instead of having those teams build similar capabilities separately, we can have a conversation with them about what is good for the business, build it once in the center, and push it out. It takes solid understanding of the complexity of the organization to choose who is ready to push now, and where you need to accomplish other things before you can raise the sophistication level.

News Corp.'s matrix for evolution of data management.

Tom

Mesh is understandably silent on the downsides of pushing things out to your domains. There are often haves and have-nots in every organization, various levels of commitment, skills, and funding. Central leaders need adaptive strategies for how to elevate the entire company. If the focus is to push things out to the edge as fast as possible, often the rich get richer, and the poor get poorer. Instead, in your approach the smaller, underserved, less capable groups can create value in a relatively short period of time, which raises the capabilities of the entire organization more quickly.

Diagram to represent data mesh organizational architecture with domain importance.

So if one of your CDO peers called you up tomorrow and said they are undertaking a major initiative similar to yours, what advice would you give them?

Simon

There are four things to get started.

One, show real progress quickly, even if that means just basic analytics. More powerful than anything else is getting real results.

Two, be clear on how more sophistication requires more investment, and how that unlocks new use cases. For example, a lot of business execs really want a dashboard, but don’t understand why it takes a multimillion-dollar big data program to build it. Arguably, it often doesn’t, but as a Chief Data Officer, you know it’s really important to build out a capability, not just a quick hack. That requires making a clear case. Often the use case is advanced analytics or machine learning, but they all need to tell that story. I’ve seen numerous big data projects for a dashboard that could have been done five times quicker and five times cheaper. The case has to tie sophistication to business value.

Three, be pragmatic from an architectural perspective. Mesh can be quite prescriptive when you read it, but ultimately, this stuff has to work. Like we said earlier, maybe you do end up with some data duplication, because it’s the right way of managing access and security. That’s fine as long as you have a rational basis for those decisions.

Four, in a complex organization, don’t treat everything as a storage-first program. Understand quickly whether or not it is a storage problem, and if it’s not, focus on orchestration and automation. That’s how you extend control across the estate, across clouds, and across different systems at scale.

Data mesh implementation recommendations.

Tom

Awesome. So what would you say was your biggest challenge going from Phase Zero to Phase One, the KPI metrics collection?

Simon

It really comes down to understanding the upstream data systems. I have one particularly good data analyst on my team who just knows how this stuff works. When you start showing numbers, globally, you’d better make sure they’re right. Because if they’re not, you get questions back, you get noise, but more than anything, it starts to undermine people’s confidence in what you’re doing. In our case, if we’re going to be publishing numbers about the health of our constituent businesses, it’s got to be right.

You need people on the team who know what correct looks like. That might not be the Spark engineer; instead, it might be the guy working with Adobe Analytics. If you don’t realize that, you’re in danger of never nailing it. But as soon as the first one is correct, you can hit a stride of publishing correct data, every day, every month. As confidence grows, more people get more interested in wanting to see the data. More people ask to get access, and then it goes from there.

Tom

How invested are you in assorted data observability technologies and approaches?

Simon

It’s one of our biggest focuses for this year. As CDO you’re setting yourself up as a high-visibility nexus for information in and about the company. Yet you’re dependent on data from systems that you don’t control. So it’s vitally important that you know when stuff goes wrong before your customers find out that stuff went wrong. For me, that is the essence of observability. For example, we get a lot of raw telemetry from Ascend itself as our orchestrator, for which we need intelligent alerting. We usually know that something went wrong, but sometimes the message doesn’t quite get to the right person in a timely manner.

Tom

The data is all there, but additional instrumentation could help identify what is important, and initiate the process to do something about it in a timely manner.

Simon

This is hard. Alerting is built in, and we monitor stats and send alerts, but unless those alerts are intelligently written, the team can become blind to them. When there are many unimportant messages a day, you start to ignore them. The sophistication we are adding incrementally is alerting that notifies us when stuff is really wrong and requires attention, rather than just sending rote messages.
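One simple way to reduce alert fatigue of this kind is to forward only errors and to suppress repeats of the same message within a quiet window. The sketch below is an assumption-laden illustration: the `Event` shape, the severity labels, and the one-hour default are hypothetical and do not reflect Ascend's telemetry format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    pipeline: str
    severity: str     # assumed labels: "info", "warning", or "error"
    message: str
    timestamp: float  # seconds since some reference point


def actionable_alerts(events: List[Event], quiet_seconds: float = 3600.0) -> List[Event]:
    """Keep only errors, dropping repeats of the same pipeline and message
    that arrive within quiet_seconds of the last alert sent for them."""
    last_sent: Dict[Tuple[str, str], float] = {}
    alerts: List[Event] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.severity != "error":
            continue  # rote informational messages never page anyone
        key = (event.pipeline, event.message)
        previous = last_sent.get(key)
        if previous is not None and event.timestamp - previous < quiet_seconds:
            continue  # same failure already reported recently
        last_sent[key] = event.timestamp
        alerts.append(event)
    return alerts
```

Routing each remaining alert to the right person is a separate step that this sketch does not attempt.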

To wrap this up with some context, at News Corp., each online business unit has fully staffed tech and data teams. Business units have from a dozen to hundreds of people working in data, so globally we have a community of many hundreds of practitioners in data.

My team of 20 or so at the center consolidates data across the organization, a mix of engineers, analysts, data science, data product, and data governance people. In the context of the size of News Corp., this is a very small team for a big effort, which relies on smart technology choices. It also relies on a strategy that enables us to unlock business value without having hundreds of people from the start. That said, we are growing, increasing sophistication, and unlocking more investment that will allow us to do more for the business. It’s a step function, not a big bang.
