Ep 24 - 2023 Data Engineering Trends

Join Sean and Paul as they unpack the trends from Ascend’s annual DataAware Pulse Survey. Learn why executives and individual contributors disagree on strategy so often… and why many in the data team want to drive automation but struggle to achieve it. All that and a full recap of the 2023 Big Data London event in this episode!

Transcript

Paul Lacey: All right everybody, welcome back to the program. This is The DataAware podcast by Ascend, the podcast about all things related to data engineering, including modern trends and everything that you need to know to stay up to date with this rapidly changing landscape. I’m Paul Lacey, your host, and I’m joined by Sean Knapp, the founder and CEO of Ascend. Sean, welcome.

Sean Knapp: Hey, everybody.

Paul Lacey: All right, Sean, we’ve got a lot of stuff to cover today, and I know these episodes have been creeping up in length. I really do try to keep them concise because I know folks are typically trying to slot this into a commute or something else they’re doing. Let’s face it, we don’t commute anymore, so maybe it’s something else, like running on the treadmill for exactly 30 minutes. So let’s just jump into it, shall we? We have two topics for today, Sean. The Ascend team just came back from a really great event over in Europe called Big Data London, the 2023 edition of the event. It was our first time there as an exhibiting company, seeing what’s going on over there. And so I’d love to get some hot takes, Sean, on what you thought of the show and what stood out to you.

Sean Knapp: Yeah, absolutely. I think there were a few things that really stood out. First, I thought it was going to be big, but I didn’t expect it to be as big as it was. As we found out, they even had to open up another expo hall area; the demand clearly outpaced even their own expectations, which was amazing. And I think that signals very strongly how fast the European market is also developing, maturing, and growing. So that was one that I thought was really fantastic.

The second big takeaway for me was that there is tremendous interest in this next wave of data engineering platforms that sit on top of the modern data cloud platforms. And, perhaps even more pronounced than at other conferences we’ve attended, I saw a huge push and huge interest around consolidation, because I talked to a number of different companies at the conference who were trying to make sense of it all.

And when you walk around the conference floor itself, you just see so many technologies, so many consultancies, and so many different vendors. And the challenge, I think, for a lot of folks in that market, very similar to what we see in our own market here domestically, is "How do I make sense of all of this?" In many ways, they’re thinking, "I’m just trying to go build some data pipelines. I just signed a really exciting commit, I’m moving forward with Snowflake or Databricks or BigQuery, and I’m just figuring out how to help my data engineering team be highly productive and build a bunch of these new data products we want to go build." And I think it’s the recoil we’re now seeing from the sheer proliferation and sprawl of technologies. That’s the second thing that really stood out to me: just how pronounced that was at this conference. What about you?

Paul Lacey: Right. Yeah, I totally agree. It felt almost like a Las Vegas-style show for data, especially for data vendors and data companies, which is impressive to see. I know they’ve been building up towards this over many years. I believe the conference was first founded in 2016, so they’re coming up on close to 10 years running; it’s been seven or eight years now. Just walking out on the trade show floor… I think they opened up the upper level this time, so you could actually walk up and look down on the trade show floor with a bird’s-eye view, which was really cool. And it’s a super unique venue, as well: very open and airy and industrial. It was an overwhelming sea of color and flashing lights, with all kinds of stuff happening and different companies saying slightly different, but also slightly similar, things about what you can do with your data.

So I could definitely see… in many of the conversations that I had, quite a few people were just overwhelmed, with this deer-in-the-headlights look of, "How are you different from anybody else here? Because it seems like there are so many of you, and you’re all here trying to do similar things." So there’s definitely a hunger and a desire for people to rationalize what’s out there, to understand the unique secret sauce of what are now very mature technologies for dealing with data in the cloud, and to see how things can integrate together better.

Sean Knapp: As I think about it a little bit more, a couple of things really stood out to me. I’ve been around the space long enough that I remember watching a lot of the vendor pitches from a decade ago: just dump everything into your data lake and we’ll solve it all from there, and magic and rainbows and all sorts of other awesome stuff will happen afterwards.

Paul Lacey: Schema on read, right?

Sean Knapp: Exactly. And so if you’d asked me three years ago, I would have told you that my biggest concern and consternation with the industry as a whole was that every vendor had the same message. It didn’t matter what part of the stack you were in, whether you were solving cataloging, observability, orchestration, or transformation. Regardless of their part of the stack, nobody said what they did. They all just said, "We’re in data," and they told you the value proposition of mastering your data. And so every vendor looked the same regardless of whether or not they were even competitors. In fact, you would oftentimes find companies who were actually partners with each other but had conflicting messaging because they were saying the same thing. So the space was very confusing a few years ago. I think we’ve matured past that now. And so walking around the floor, I was really struck by two things.

First was, "Hey, this is fantastic. We’re now seeing vendors get more specific around their value proposition. Where do they fit in? What do they specifically do?" That’s great, because I think it’s the first step in helping a lot of people on this journey, especially early in their journey, understand, "Well, where does your platform, your technology, fit in, and how does that help me create value and put points up on the board?" So that was the first thing that really stood out to me: I think we’re finally getting more specific about what everybody does. The second thing I noticed, which I don’t think we do well yet and really hope we will as an industry soon, was that most vendors and technology providers I saw don’t actually show their product. Most of them were still putting higher-level messaging on their screens and demo stations.

And that’s interesting, because I think that messaging is really valuable. We certainly do it too, where we talk about the latest ESG report showing you can build pipelines seven times more efficiently and faster, et cetera, et cetera, which are all great. But as I walked around, part of what I still find concerning about the space is that not many vendors are willing to put their money where their mouth is and actually show the product itself: what its users are actually going to get, see, and experience. And that, I think, is the next step towards truly creating value and specificity around, "This is where we fit into your ecosystem, and where your team, or you as a developer, can use us." I think the more our industry gets into the details of where its products fit, the better off it is for every consumer of those technologies.

Paul Lacey: And with that, too, you’ve got to have a sense of confidence and trust in the users that they’re going to understand what they’re looking at. On the other side of that, as a marketer, I’m definitely guilty of putting forward positioning points as opposed to just opening up the product. But you’re totally spot on: many of the folks we had conversations with, as soon as they saw the UI, said, "Oh wow, that’s super clean. I like that. Let’s get into it a little bit more." And they started digging around in the screens and asking questions.

So a lot of people… we have collectively reached a maturity in the market where people get what these things are and what they do, and they understand that, "Hey, a data pipeline doesn’t look like much on the surface, but under the covers there’s a tremendous amount of innovation and work going on behind what’s shown in the UI or in the code being displayed." And they’re making that connection more and more. So it’s shifting from, "Hey, what am I looking at? It just looks like a blank screen, or a couple of boxes on a screen," to people going, "Oh yeah, I actually really enjoy seeing these interfaces and knowing what the user experience is like."

Sean Knapp: Yeah, exactly. And I think finding the balance between those two is this perfect combination where we really do want people to be able to first anchor on, “Hey, where do you fit? What is the value you’re going to provide to me and my organization?” And then the, “Now show me how you’re going to actually do that,” I think that combination to me is the appropriate level of transparency and capabilities that we can actually really provide to our ecosystem.

Paul Lacey: Totally. Yeah, and I love that coming from a CEO leading the charge here, too. That’s very powerful. So we can expect to see a lot more of that, at least from Ascend in the near future. You heard it here first.

Sean Knapp: Exactly.

Paul Lacey: Yeah. I had a couple of other takeaways from the show, too, Sean. The first was having a lot of really great conversations with people whose titles didn’t necessarily match the roles they were doing. Let me clarify that a little bit. We call this the podcast about data engineering. In markets like here in the US, we have huge teams working on data projects, and so we typically have very well-defined specialty roles around that.

What I found in our conversations over there was that this exists for sure, but there are also a lot of people whose title is just something related to BI or analytics or data in general, and whose job is just to get shit done, pardon my French. Literally, their role is: it doesn’t matter what it takes, the executives need a dashboard that shows blah, blah, blah. And so they’re the ones getting in the weeds figuring out, "Okay, great, where do I ingest this data from, and how do I ingest it? How do I transform and process this data? How do I build the dashboard on top of everything? And then how do I deliver that, and how do I do it all by tomorrow afternoon?" So their titles might not be data engineer, but they’re doing data engineering work, which I found quite interesting. There are a lot more people who are very full stack, very hands-on with everything related to working with data.

Sean Knapp: Yeah, I think that’s a great observation, and one I would echo. I had a number of really impressive conversations with folks where we’d often first ask, "Oh, are you a data engineer? Are you building data pipelines?" The answer to "data engineer" would oftentimes be, "No." You would get an answer to the effect of, "Oh, I’m an analytics engineer," which I think is even more common here domestically than what we found in London, or "I’m a BI engineer." And then when we asked, "Well, are you building data pipelines? What technologies do you work with?" all of a sudden you’d hear, "Oh…" and find they’re working with all of these absolutely amazing technologies and actually building pipelines. Really, it’s this polymathic data expert who is constructing the data pipelines and creating the semantic layers on top, all the way up to the BI reports or, at times, even ML models.

And I think that’s a really interesting trend, because over the course of time we’ve started to hit some specialization: you’ll have data platform engineers, data ingest engineers, data pipeline engineers, analytics engineers, and we see that specialization happening in many ways. One of my beliefs is that this is a byproduct of the tools. People often shape their organizations around tools as opposed to the value chain of data. And I think we’re actually going to see those lines blur over time. I hope we do, because I think that gives individuals the ability to affect outcomes and create more value across broader parts of the data lifecycle, and allows people to actually go from source to sink, from raw data to refined data product. If we, as an industry, can empower people to do that, that’s a pretty exciting outcome for me. That’s what made me really excited when I started to see some folks doing that at Big Data London: I think there’s potential for even more of that in the road ahead.

Paul Lacey: And it’s also just a more rewarding experience for the individual builders. It’s so much more fun to be able to start something, take it through to its logical conclusion, and then step back, look at what you did, and take pride in accomplishing something like that, versus just putting the same cog in the same spot in every single machine that comes down the assembly line and never really understanding how it fits into the broader context.

Sean Knapp: Totally. This way, you get to actually maintain context, which I think makes you more effective as a data engineer. We’re kind of getting into organizational theory now, but for those listening (Paul’s heard my take on this, for both data engineering and software engineering): as things mature, industries have a natural tendency to push these engineering fields into a factory-worker, assembly-line style of development, where the context gets removed and the scope of responsibilities gets narrower and narrower. As a result, this natural pull we tend to see in software and data is pretty disempowering in many ways.

And I’m a really big believer that, hopefully, the evolution of industries empowers people with greater capabilities that let them expand their context and the breadth with which they can affect outcomes. That’s one of the things I really hope to see in the data ecosystem: with all these amazing new capabilities that we as an industry, and especially Ascend as a platform provider, can create, do we enable people to go broader and impact data further along the lifecycle than they ordinarily would have been able to? I think that’s a really exciting future state.

Paul Lacey: Totally. And there’s a convergence of a couple of different things there that we can pick up on when we get into some of the data later, for sure. Before we close out this topic, Sean, the last thing that really jumped out at me about the show is something where, if you think about it for maybe 30 seconds, you’d probably say, "Oh yeah, that makes total sense." But it’s one of those things we take for granted over here that you’re constantly reminded of: in Europe, there are a lot more countries in the same amount of space as here in the US. And so data sovereignty requirements, data security requirements, and things like that are so much trickier to navigate over there.

Here, we don’t care if our data is being processed on a server in Virginia while we’re sitting in California or anywhere else in the country. And so SaaS tends to be the delivery method du jour. But over in Europe, it’s much more localized and specialized, and folks really have a hunger for things that are as simple and intuitive to use as many of the SaaS platforms we take for granted here in North America, but that need to be deployed in very different ways than simply, "I just log in, access a server, put my credentials in, and I can go…"

I mean, there are some things people can do. At Ascend, we announced last week that we launched a version of our cloud service in mainland Europe, so those in Germany, and in regions where processing data in Germany is compliant, can access it. But even then, people over there are still very interested in the Kubernetes containerized deployments that we can do in private VPC environments, to guarantee that their data is never processed, persisted, or touched in the wrong place. And so it reminded me once again of how much we, as data engineers primarily based here in North America, take for granted the ability to just log into a SaaS product and start using it.

Sean Knapp: Yeah, absolutely. I think expanding our cloud into Europe was a really big move for us, and one we’re really excited about for that very reason. For a long time, we’ve been able to run our fully dedicated model for larger customers inside Europe. But with the larger push we’re seeing among companies based in Europe to adopt cloud offerings, there’s a strong desire to use our cloud offering rather than a full-scale, dedicated enterprise deployment. And I think this was a pretty critical step in making sure we can support more of the growing and middle part of the market in Europe.

Paul Lacey: Definitely. Yeah. Well, that was great, Sean. Thank you for unpacking that with me; it was a great show. Ascend is looking forward to participating in next year’s show as well, and to watching as we grow. And I think the major conclusion is that, aside from some core differences, everybody has data pipeline needs, and they need to move faster with those pipelines, make sure they’re more automated, all that kind of stuff. That’s a very universal thing we see regardless of geographic region, company, or industry. It all blurs together, which takes us to the other topic I wanted to unpack with you, Sean. Ascend runs an annual survey called the DataAware Pulse Survey. And if you’re detecting a theme in the DataAware branding, you’re not wrong.

We spend a lot of time here at Ascend building products that are DataAware, building teams that are DataAware… we can really expand on this topic as far as we need to. What we’re trying to do with this survey is take the pulse of what’s happening out there, specifically on data engineering teams. So we look at several different people inside those teams. We obviously poll data engineers directly, and we poll data analysts and enterprise architects as part of this survey. We try to field it as widely as possible, both in terms of the company sizes we get responses from and the levels of people in those organizations that we’re polling.

So we go all the way from individual contributors up to VPs and execs who are in charge of managing data engineers and analytics teams, and we pull these insights out to share with the market: where are we, and how much progress have we made year-on-year? How much progress are we making as an industry towards some of our objectives? So, first and foremost, Sean, what are your thoughts on why we do this survey? And then we can get into some of the results of what we found this year.

Sean Knapp: Yeah, I think the key driver behind this for Ascend was not only to educate ourselves, but also to help educate the broader market on what we’re truly seeing in the data engineering landscape. The reason I think this is so important is that even the notion of what data engineering is gets very blurry, just as we talked about before. There are many personas doing actual data engineering work, even if their titles aren’t data engineer. So many people are building data pipelines, and so many people are trying to automate those pipelines. And I think part of the challenge is that oftentimes, if you’re early on in this journey or even midway through, there’s not a lot of great education about what people are really doing and what challenges they’re facing.

It’s like the social media effect: really, everybody’s struggling with their data engineering team’s capacity and how fast they can move, and everybody wants to build more. But maybe you don’t read about that on Facebook, or in the LinkedIn posts or Medium articles. What you see there, and what it starts to feel like, is that everybody’s doing awesome, they have it all dialed in, and they’re just killing it with all these amazing new technologies.

And so you see all these people writing about stuff that’s supposed to be awesome, but in reality, what we’re actually seeing is, "Look, everybody’s just grinding their way through the shit of it, and they’re all just trying to make it work." They’re exploring a bunch of new technologies, they’re building some pipelines, they’re being saddled with a ton of maintenance and support burden, and everybody’s struggling with it. I actually think that’s helpful to know. It’s really helpful for the industry to realize you’re probably not as far behind as you think you are, and the end state is probably not as awesome as you believe it is based on the Medium posts. The reality is that we’ve just got work to do, we’ve got to get that work done as an industry, and everybody else is probably a lot closer to you on that journey than you think.

And it was really helpful to start getting that input signal and to share the insights with the industry. We obviously run this as a blind survey through an independent third party, so that it isn’t biased by Ascend customers, who have already drunk the Kool-Aid around automation and are already reaping a lot of its benefits. That wouldn’t be fair. So we do it completely blind: the people filling out the survey don’t even know that we’re the ones asking the questions. And I think that’s really helpful, so that we get a completely neutral analysis of where the industry and the market are.

Paul Lacey: We’ll have to change the subhead of the report to "the anti-Medium collective view of the data engineering world." I like that. It’s so true. On social media and elsewhere, we’re trained to sell the sizzle, so we typically post about the extraordinary things being done, or that we have ideas on how to do, and we don’t really talk much about the mundane drudgery of the day-to-day, but that’s just as helpful. Looking at some of the results, we saw that data engineering teams are at capacity and continuing to keep on keeping on: 95% of respondents this year said they are at or above their work capacity, for the second year running. It’s the identical number this year as last year. So it’s not like, "Hey, they always say they’re over." It’s like, "No, last year was 95%. This year it’s 95%." That’s a lot of folks, Sean, who are at the red line, right?

Sean Knapp: Yeah, exactly. And at some point, if we get enough time and interest, we should go survey other teams, too. Maybe everybody just thinks they’re at the red line all the time, but I don’t believe that’s the case. I think most data teams are stuck in the early days of maturing their own technologies, platforms, and investments, while working against a backdrop of tremendous pressure from the business to keep producing and creating more. And I think there are two opposing forces on folks right now. One of them is actually helpful: we’re seeing more investment in automation and technologies, and the technology landscape is maturing and improving. So in theory, that should be helping teams. That should be a great tailwind behind these teams and should actually be bringing that number down.

And we did see a couple of small drops in the first two years, but I think the headwinds are pretty substantial right now. The first headwind is demand for more than ever before. Now, I think that demand has dropped off a bit in the last year, mostly because organizations have shifted their focus to, "Hey, maybe we shouldn’t create so many data products, and maybe we should really optimize the ones we have, at least for a little while, and just stop burning as much money as we were early on." But the second major headwind for teams is headcount: if it isn’t being reduced, its growth is being tamped down. So that classic approach of, "I’m going to get more capacity on my team not by optimizing what we do, but just by hiring more people," has been taken away. So I think that’s the second headwind offsetting the tailwind of better technologies emerging in the space that help teams be more productive.

Paul Lacey: Makes a ton of sense. One of the things we survey on in a couple of different ways is how teams are doing with some of the things that should relieve these stresses. Implementation of automation is one of those, and we test it in a variety of ways. It was encouraging to see a more than 100% increase in the number of people who said they are now using automation, which is great. That’s a victory. To be specific, I think it was actually a 110% increase in people who said, compared with 12 months prior, "We’re now using automation technologies."

But there are still quite a few people who say they want to use automation technologies in the near future, and there’s actually an uptick there too: a 20% increase over last year in people who say they’re very likely to use automation technologies in the near future, which is great. So people are on the cusp; they’re basically getting ready to take the plunge. But it must be harder than people think, because quite a few people said last year that they were going to do it within 12 months, and 12 months later, they hadn’t. So what do you think is at play there, Sean? What are some of the gotchas these people might be running into?

Sean Knapp: I’ve got to be honest, the whole analysis around the use of automation technologies is probably the part of the report I find most fascinating, because… keep me honest on this, I’m going off memory from last year’s, but if I recall correctly, something like three and a half percent of people said they already had automation technologies in place, and something like 87-ish percent said they planned on implementing them in the next 12 months. And this was a year ago. The reason that stood out to me so much then was, "Oh my gosh." One, all the technologies people were already using to schedule and run pipelines clearly don’t count as automation, otherwise this number would be higher. And two, rarely in any industry do you see something that such a small percentage of people have and such a massive percentage of people want.

There’s a huge gap, with tons of plans to invest. So that’s what really stood out to me last year. Now, why I think it’s so interesting this year is that the number has climbed, which is great. Eight and a half percent of people say they now have automation technologies in place. Not uniform or ubiquitous adoption, but something in place, which is amazing. But that’s an increase of 5 percentage points in absolute terms compared to last year, which is a long way from the 87-ish percent who said they wanted it. So the question is, "Well, what happened across the industry?" Obviously, I think everybody was feeling far more optimistic a year ago about what they could get to than what they’ve actually achieved, given the constrained capacity and everything else we already talked about. And second, I think most of the technology in the ecosystem just still isn’t that mature.

People want automation, and they want the magic of automation, which is amazingly powerful. But what people are also finding is that most technologies trying to help automate are still fairly primitive in nature. They’re more about scheduling and orchestration than true automation. And I think that’s what the industry is honestly going to have to figure out over the next 12 to 24 months: what is true automation, and how much benefit can you get from it? Because I think this is going to be very similar to how everybody talks about AI. Every company is an AI company, just like every company five years ago was a blockchain company. I think we’re going to get into a realm where every company becomes an automation company, and folks are going to have to figure out, "Well, what kind of automation? How is it really going to drive benefit for me?" That’s where we’ll see where the rubber truly meets the road.

Paul Lacey: We’ll have to expand this section in next year’s report, then, and definitely test a bit more on some of these definitions. It’s always hard with some of these survey questions. As you mentioned, the surveys are blind by design, so people are answering according to the words they see on the page and what those words mean to them. But especially as a career product marketer, I always want to dig in and understand, "What do you mean by those words? You say you want to implement automation. What does automation really mean to you? And what do those technologies look like?" So yeah, it’s a fair call-out. And the other thing, Sean, which we talk about on the show, is that it’s not the easiest thing in the world to automate your data pipelines.

It takes a rather sophisticated knowledge of your metadata, the ability to segment your datasets into some sort of partitioning scheme, and a controller that is hyper-aware of every piece of data in your environment: where it stands relative to the pipelines that have run in the past or need to be rerun in the future, and what changes come into those pipelines. So while a lot of people have the desire to automate, the technologies that allow you to do things like that (aside from Ascend, which has been innovating in this area for quite some time) are nascent or non-existent in a lot of the platforms people currently use. So they may have every desire to do this, but their tooling just can’t keep up, right?
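To make the difference between scheduling and data-aware automation a little more concrete, here is a minimal Python sketch of the kind of controller Paul describes. It is a hypothetical illustration, not Ascend’s implementation: it assumes each dataset is split into partitions whose input data can be content-hashed, and that the pipeline code carries a version string. The names `fingerprint`, `DataAwareController`, `plan`, and `record` are invented for this example. The controller combines each partition’s data hash with the code version into a signature and reruns only the partitions whose signature changed since the last successful run, whereas a plain scheduler would rerun everything on a timer.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def fingerprint(*parts: str) -> str:
    """Return a stable SHA-256 hex digest over the given string parts."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator, so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


@dataclass
class DataAwareController:
    """Hypothetical controller: decides which partitions a pipeline step must (re)run."""

    # partition key -> signature recorded when that partition was last processed
    last_run: dict[str, str] = field(default_factory=dict)

    def plan(self, partitions: dict[str, str], code_version: str) -> list[str]:
        """Return keys whose input data or pipeline code changed since the last run.

        `partitions` maps each partition key to a content hash of its input data.
        """
        stale = []
        for key, data_hash in sorted(partitions.items()):
            signature = fingerprint(data_hash, code_version)
            if self.last_run.get(key) != signature:
                stale.append(key)
        return stale

    def record(self, partitions: dict[str, str], code_version: str, done: list[str]) -> None:
        """Remember the signatures of partitions that were successfully processed."""
        for key in done:
            self.last_run[key] = fingerprint(partitions[key], code_version)


# Usage: two daily partitions, then late-arriving data, then a code change.
controller = DataAwareController()
inputs = {"2023-09-20": fingerprint("rows-a"), "2023-09-21": fingerprint("rows-b")}

first = controller.plan(inputs, code_version="v1")  # nothing processed yet: both partitions run
controller.record(inputs, "v1", first)

inputs["2023-09-21"] = fingerprint("rows-b", "late-rows")  # late data lands in one partition
second = controller.plan(inputs, "v1")  # only the changed partition is stale
controller.record(inputs, "v1", second)

third = controller.plan(inputs, "v2")  # new pipeline code invalidates every partition
```

One design choice worth noting is that code changes are folded into the same signature as data changes, so a logic change and a late-arriving batch are handled by one mechanism. A production system would also need to persist `last_run` durably and track dependencies between downstream steps, which this sketch omits.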

Sean Knapp: Yeah, I totally agree. And I think that’s where the… it’s unclear if we’re going to see incremental improvement in this. I think that the industry would naturally try to incrementally add greater and greater automation. But I think over the course of time, folks are going to find probably similar to what we saw in software engineering around Kubernetes and container orchestration, at some point, folks are probably going to find that the architectures are on a more brittle foundation that would benefit greatly from a more material upgrade of the underlying foundation itself.

Paul Lacey: I think you’re talking about what people who are students of innovation would call a discontinuous innovation, right, Sean? Where you have to actually break something and do something completely different in order to get to that next level of performance.

Sean Knapp: Yep.

Paul Lacey: You can’t just keep optimizing on the path that you’re on.

Sean Knapp: Yep, yep. Exactly.

Paul Lacey: Totally makes sense. And I guess with that, that leads us to the last of the two things that we can pull out here, which is that there were two camps, two very specific groups of people that we saw clustering in the results. Individual contributors and team leads were kind of feeling one thing when it comes to, how do we get from where we are today to where we need to be with our performance and things like that with our data pipelines and data programs as a whole? And then the executives had different perspectives on a couple of different things, ranging from how they’re going to achieve cost savings in their current infrastructure to the applications of generative AI. And as we’ve talked about on this show before, executives tend to think, “Oh yeah, throw ChatGPT at everything. It seems great. I had it write my daughter’s homework this morning and she got an A, so why can’t it run our data pipelines already?”

And the individuals who are a lot closer to the technology are saying, “I don’t know if it’s as easy as it looks from that seat.” We did see a lot of that. We saw some dissonance. And so maybe we can unpack that a little bit, Sean. I know that there was some dissonance where individual contributors and team leads are really favoring consolidation and getting rid of tools from the stack, but we actually saw the opposite preference from executives, who are saying, “Oh, we actually think we need to add more tools to the stack.” What do you think is the dynamic that’s driving that dissonance?

Sean Knapp: I think we found, quite consistently across the space, a really strong dissonance as you work your way up the reporting structure. And just so folks are aware of how we grouped things: individual contributors, so people who are hands-on doing the work; team leads, managers, and directors; and then executives, VPs and above. And there was a pretty consistent pattern across those groups, with the ICs really still in the trenches dealing with a ton of chaos, and far more pragmatic-minded about what are the things we’ve got to get done today.

And then the executives, especially in this era, are being bombarded with a whole bunch of new… let’s call it stimuli around what the next era and the next wave is supposed to look like. And I think that disconnect highlights a tremendous challenge, an opportunity as well, but a tremendous challenge for those in the middle at the manager and director level to actually start connecting the dots for folks. I don’t think we’ve had this level of analysis on it before, but this is a very material disconnect inside of most organizations that I think will create challenges for those organizations in the near future, if it isn’t already.

Paul Lacey: And we saw it across the board, we saw data executives are more likely to think that basic tasks working with data are taking too long than their teams, which as you mentioned, are much closer to the work. And they’re like, “No, no, it takes this long because it’s hard,” and there’s no shortcuts here. At least there isn’t yet. So we saw a number of those things come to bear as well as just the focus on the roles are different.

And to be honest, some of this report makes me want to go find a data engineer and just give them a hug because they were more likely to indicate that their number one KPI for success was number of errors fixed in a given time period. And imagine being in a role where all you’re doing is playing whack-a-mole with stuff that’s breaking and you’re just trying to keep… you’re like Scotty in the engine room of Star Trek, of the Enterprise trying to say, “I’ve given her all she’s got, captain. I’m trying to keep her together.” And then you’re off-screen and that’s your one cameo for the show, and then that’s your role.

Sean Knapp: Well, I think it is really interesting because I do agree. I think a lot of data practitioners, data engineering teams deserve a hug right now because we’re just going through this stage where most teams have just started to explore and experiment with the modern data stack, or maybe are still even just hearing about it and haven’t even yet started with it, but are already getting overwhelmed by the sheer number of tools and the number of things they have to deal with. And in light of that, are just trying to figure out, “How do we stabilize our world?” And from an executive level, we do hear, “We want to consolidate because we’re tired of paying so many vendors to do all of these things,” but at the same time, executives are saying, “Well, we need to be pressing on gen AI. We need to create more data products and services.”

And so there’s still the same pressure while there’s a strong need to stabilize. And so I think that’s where we’re seeing a lot of these data engineering teams haven’t even got their feet underneath them yet and feel that they’re in a stable spot so that they can start to make these next big investments. So being asked to change the tires of the race car while it’s still flying around the racetrack. And one of the ones that really popped out to me too, when we think about the number of tools, a couple of the starkest contrasts I saw were the pressures between the hands-on developers and the team leads. And that discrepancy was interesting because oftentimes, that’s the connective tissue between the team and the management structure, which was what we saw was most individual contributors, hands-on developers, were far more likely to say, “Hey, my number of tools needs to go down and functionality is going to reduce, and I’m okay with that.”

Whereas when we looked at team leads, they were far more likely to say, “The number of tools is going to go down, but the functionality can’t be reduced.” And so I think part of that demonstrates this disconnect too of, “Hey, we need to simplify our world and we need to actually stabilize, and we’re okay if we lose some functionality,” versus the pressure of, “I still have to deliver for the business at the same time.” And I think that’s where we’re seeing some of this rubber meet the road. Not to continue the analogy, but I think that’s where folks are in a bit of a hard spot.

Paul Lacey: Yeah, definitely. And then there’s the other side of that coin too, which is, you mentioned the cost reduction part of this too, and obviously consolidating onto platforms saves cost and kicking tools out of the tool stack obviously saves cost. There was a bit of a disconnect between executives and individual contributors and basically the rest of the team with regards to how that can be accomplished. And the rest of the team was much more likely to say, “We’re going to optimize our data pipelines so they consume less and we’re going to keep everything running as it is.”

And the executives were far more likely to favor saying, “Let’s just cut out the number of data sources that we’re working with and that should save us some money, right?” And it’s very much like they’re probably looking at the top line of just, “Gosh, how much are we paying for these ingestion tools that are charging us by row or by volume of data that we’re bringing in? We got to just get the volume of data down somehow. How can we do that?” Versus the team that’s closer to the engine that’s saying, “Actually, the engine can be running 30%, 40%, 50% more efficient so that we don’t actually need to change the type of fuel that we’re putting into the engine. We just need to change the way that the engine is burning through that fuel and then we can save that money in other places and stuff like that.” And that’s a big thing for people to get aligned on, right?

Sean Knapp: Yeah. Oftentimes, the engineers, the mechanics are like, “Hey, we actually just haven’t had an oil change in 30,000 miles, and maybe we should.” And I think the challenges for a lot of these teams… if you’re on a platform that’s not giving you the visibility, you’re not given the right observability data around the consumption and where the hotspots are in your data pipelines. And moreover, if you’re not actually leaning on… and this is I think the chicken and egg, if you’re not leaning enough on automation, so an automated system can help apply those optimizations for you, the fundamental challenge you have then is where do you start?

I think we see this across the industry. A lot of teams are saying, “Hey, I can optimize and tune this. We don’t have to make those business trade-offs.” But where do you start? So many big data pipelines are just these black boxes that you throw a bunch of stuff into and get something out of. You get some nice observability details from your data cloud, like Snowflake or Databricks, with a bunch of input on, “Hey, here’s how much work the query or the job took.” But data pipelines themselves are these collections of hundreds of different queries or little mini jobs. And so finding the hotspots in there requires a deeper level of detail that many people just don’t have access to.

Paul Lacey: True, very true. And certainly not from the C-suite or the executive chairs of the folks that are trying to govern these teams.

Sean Knapp: Yeah.

Paul Lacey: So yeah, that’s a great shout-out for yet another category of tool, Sean. So yeah, let’s go buy another tool, right? Or just buy a platform that has that baked in like Ascend does with our data pipeline automation, the full suite of capabilities to do all that and more. So yeah, leave it to the marketer to make the shameless plug at the end of the program.

Sean Knapp: Of course.

Paul Lacey: Had to do it. I had to do it. It was just hanging right there. But yeah, no, so thanks for that, Sean. That’s great analysis and unpacking. I’m sure there’s a lot more that’s going to jump out at us from this report. We can bring some of this data into some of our future conversations around some of the trends that we’re seeing as well, which is great. Some more to come. But the DataAware Pulse Survey from 2023 is available today. So go to www.ascend.io and click on the Pulse Survey tile that you’ll see on our homepage. Go ahead and download it and give it a whirl. Let us know if you have any questions, and we look forward to talking about this on future shows. So thanks everybody for joining and most of all, thank you, Sean, for your great insights.

Sean Knapp: Absolutely. Another great conversation. Hopefully, everybody’s enjoying it.

Paul Lacey: I’m sure they are. We’ll see you next time, folks. Take care.
