Ep 20 – Saving Money By Driving Efficiency In Data Engineering

About this Episode

Sean and Paul unpack the latest research on the benefits of automation and why platform consolidation happens so often during market downturns. They explore how data engineers can improve their productivity with data by 700% and save at least $156k on their data stack costs at the same time. Tune in for more!

Transcript

Paul Lacey: All right. Welcome back to the DataAware Podcast, where we talk about all things related to data engineering and also data analytics, so how can you become more data aware in your organization? I’m joined by Sean Knapp, the CEO of Ascend, again. Sean, welcome on the program.

Sean Knapp: Glad to be back again. This is getting to be a fun recurring theme for us.

Paul Lacey: Yeah, exactly. And just for those of you that are tuning in maybe for the first time, I should introduce myself as well, I’m remembering. So my name is Paul Lacey. I lead product marketing here at Ascend, and this is the program where we talk about things related to data engineering and in particular, data pipeline automation. So how can we leverage some of these new technologies that are coming out to try to help make the job of data engineering easier to some extent so you can go work on things up in the stack, things that are more cutting edge? And Sean and I were just talking about there’s lots of cutting edge things that are out there for people to go work on, and so what we want to do at Ascend and what we want to do with this podcast in general is just bring some of those ideas and concepts to bear so that folks can learn from it, potentially do it themselves, whether that’s with using tools that Ascend provides, whether that’s with some other things that they do.

Our ultimate goal is just to help everybody understand what’s possible out there. And speaking of understanding what’s possible, Sean, there’s a great report that came out this week about Ascend in general, but we can extrapolate from this report some general trends and things that everybody can think about when it comes to trying to automate some of their roles. The analyst agency known as ESG or Enterprise Strategy Group under the TechTarget umbrella has done what they call an economic validation report on Ascend’s data pipeline automation platform. And there’s a lot of really interesting findings here in this report that we’re going to unpack here today and talk about what do they mean, how did some of these things get achieved, and that sort of thing. But yeah, it’s super exciting to see this land and some of the great insights that came out from some interviews of our customers and general research on the market and things like that as well. So Sean, have you gotten a chance to dip into this report at all? Any high level thoughts?

Sean Knapp: Yeah, a bunch. And I think I’d say the first highest level thought is it’s surprising and not surprising all at the same time. And I say this even from the Ascend viewpoint. Clearly, we’re confident in the sheer power of automation and the impact that we in automation can have on the broader ecosystem. I’d say the unsurprising part is actually the impact. We get to experience this every single day, and so for us, it’s not surprising. The reason why I think it’s so surprising is really tied to that timing and I think the ESG folks did amazing research on this and really thankful that they took the time to go really deep into the details and do a lot of the analysis here because I think the part for me that’s surprising is that this is surprising for so many people. I don’t think it was surprising for the ESG folks as well as their industry experts when they get to see this pattern emerge over and over and over again in different spaces.

But we talk about it, and I won’t do any spoiler alerts for our upcoming survey results, but historically, I know it’s a podcast, nobody else can see me air quoting the historical part, but historically, we have always seen a couple of these major trends. Data teams are at or over capacity, they don’t have enough time to focus on the things that really propel them forward. The demand for data pipelines is rising faster than their actual team capacity itself. And so we see a lot of these and we’ve even started to ask questions. Do you currently invest in automation? Do you have plans to invest in automation? And so there’s this huge pent-up demand for more automation, and I think the reason all of this is unsurprising and surprising at the same time is, well, of course automation gives us these amazing benefits. How could it not?

We see this everywhere else across our landscape outside of data, so of course we get these amazing benefits with automation, yet, I think in all humility, we’re still in such early stages of development in the data engineering, analytics engineering, whatever you want to call it, ETL or ELT or pipeline world, where we’re moving around data and we’re transforming data and we’re creating new models and semantic layers, et cetera. So many different words for really the same thing, but we’re still so early in the maturation cycle that there’s still this strong need for automation, and that’s why we see such huge benefit being gained from it.

Paul Lacey: Absolutely. Also, it’s one of those things that you always know is happening, but when you see it quantified, it stops you in your tracks a little bit. One of the findings from the report was that the Ascend platform, but really data pipeline automation in general, improves data engineering productivity by 700%. That’s a 7X on your productivity if you’re a data engineer and you put automation in place and then you can go work on other things. You can go work on the cutting edge machine learning models or MLOps, pipelines, things like that. So the kind of stuff that isn’t mundane so you can up level yourself and spend 80% less time building data pipelines as well. That’s just incredible when you actually see it laid out like that with hard figures.

Sean Knapp: Yeah. And that’s why I described it as surprising and not surprising all at the same time is I sit back now. Literally, I’ve been preparing for my 20-year college reunion. I’ve been getting all these emails. Like, oh wow, so I’ve actually been doing this for about 20 years and let me think about some of these patterns we’ve seen. And it follows these same patterns. You think back to, well, yeah, what was it like before we had high level automation for container orchestration, pre-Kubernetes era, or what was it like before we had proper DevOps practices and CICD tools and so on? The not surprising part is the benefit.

I’d say the surprising part is when we talk about numbers like this, as you said, a 700% increase in engineering efficiency, the surprising part to me is that we as an industry have just largely accepted that that rampant lack of efficiency was good enough, and it’s amazing what teams and individuals, all of us, can get used to. Oftentimes in those early stages, you and I have talked about the innovation cycle across spaces, and as you go from the mass innovation to fragmentation in a space, you start to hit these high points of inefficiencies until you get to more standardization. And I think we’re now swinging back into this maturation phase of, okay, we all believe there’s so much value and potential here. How do we efficiently and properly with leverage wield these amazing new capabilities?

Paul Lacey: Yeah. One of the things that comes to mind for me is it’s the whole AI, machine learning paradigm of explore versus exploit. So every market and every phase has an explore phase where you’re basically trying to figure out with Bayesian techniques and whatnot, am I at the global maximum or is it a local maximum of the function? And then at some point, you’ve got to just consolidate. You got to say, okay, I’m just going to exploit for a while. I’m going to consolidate down. I’m just going to decide this is the maximum I can find right now, so let’s just hammer on that function for as long as we possibly can. And then that’s where the automation comes in, right? It’s like, let’s just go do that thing at scale.

Sean Knapp: I love that. Exploit sounds pretty, I’m thinking harvest. Explore and exploit, it’s catchy. But yeah, you’ve done all this amazing work. You’ve put in the hard stuff to figure the things out. Go get some value out of that for a while. I agree. I think that matters a ton. I think we’re entering into this era. Now, this quote, I was chatting with some investors last week and they had this really, it probably was just a passing comment for them, would be my guess, but it’s really stuck with me since last week, which was many CIOs today are now worried that they’ve over-enabled, overpowered their organizations as we’ve had this mass explosion of tools, mass fragmentation. So we have all this potential, but the problem is without proper productization, without proper structure, without proper discipline, all of that actually equates to cost. And so we’re seeing the pendulum start to swing back around, hey, how do we dial in costs?

How do we get better efficiency? We’ve been exploring forever. Just for a quarter or two, can we extract some value out of all of this work and investment we’ve been putting in? And so I think that it’s been this interesting shift back in the market just towards, okay, this is great. We all believe that data’s so important and we all believe that we can build these amazing data products and that this new wave of innovation on data platforms is really going to help take us there. Let’s actually make that a reality now. And I think that’s where we’re starting to see this, what I think is amazing laser focus on outcome and productivity versus the earlier stages, which are far more explorative, as you described.

Paul Lacey: Absolutely. Yeah. You see this cycle anywhere you want to look. You can see it in the maturation of an organization, for example. I’ll give an example from Ascend. We have I think two or maybe even three project management tools that we’re using in various parts of the organization, and we just recently talked about that at our offsite. In marketing, we have our favorite tool. I won’t name names, but we have our favorite tool and the rest of the organization is standardizing on another one, and we finally just held ourselves to account to say, should we really be paying for two tools?

Our favorite actually has a little bit of marketing specific functionality, but the rest of the organization’s one is not half bad, so why don’t we just move over there and just let’s consolidate, right? There’s just efficiencies that come from having everybody to some extent on the same place and the same platform where we can coordinate, where we can tie into what’s coming down the roadmap from an engineering perspective and we can say, oh yeah, so when this feature launches, then we should actually kick off a workflow that says, hey, let’s write a blog about it. Something like that.

Sean Knapp: Well, and I think you highlight two costs, right? One is the actual vendor cost, which especially for some of the marketing tools, it’s less material than what we see a lot of people spending on data infrastructure, but it’s still a material cost, and you hate paying fees to duplicative products. And I think the second piece that you highlight that is so important is this whole productivity. And we see this at Ascend a ton, too, not just in our own internal operations, but actually in just the product itself. Once you have all of your data and all your teams working off that same unnamed super awesome project management tool, you can do all these cooler things because all of the metadata is in the same place and you can automate so many more things and your team’s efficiency and productivity goes up because they’re familiar more so with that SQL tool.

And so there’s this residual benefit that you get. I think in our over-tooled ecosystem, where we get such hyper-fragmentation, we’re oftentimes in pursuit of that “I want the best tool for this little thing and this little thing and this little thing,” but we forget that that best-tool-for-each-niche approach oftentimes comes at a cost: the integration tax and the loss of macro maturity, because we’re hyper-optimizing for small local functions versus globally optimizing.

Paul Lacey: Yeah, and there’s a lot of that in this report as well, which is why I think it’s so fascinating how you see this pattern over and over and over again across any industry that you want to look at. But in the data integration world, I think one of the call-outs was that without a common platform like Ascend that basically unifies the data pipeline operations from end to end, engineers that were surveyed for this report spent 25% of their time on whatever you want to do. I think it was an annualized basis, but you do weekly, monthly, whatever. 25% of their time was spent integrating distinct point solutions, just trying to get them to work together to build a data pipeline function, which is just incredible. Imagine having a quarter of your time back every week to work on other things.

It’s mind-blowing and we certainly see that in some of the other data as well that’s in this report. They had actually looked at the hours required to achieve certain objectives and they found them falling by pretty much half for data engineers, I think a little bit more than half for data analysts because they’re just able to do so much more. When you can put automation at their fingertips, they’re just so much more productive. Analytics engineers I think was the biggest call-out because that dropped by almost 300, 400% in terms of the amount of effort. You’ve got someone who understands the analytics, the engineering, and then you put automation at their fingertips and they’re like, oh man, I know what to do with this, and they start cranking with what they’re able to achieve.

Sean Knapp: Yeah, it’s so impressive, too. And the thing that I think is really quite valuable is when we think about how much more people can achieve in these compressed amount of time, that also highlights how much more is actually just truly achievable because it’s oftentimes not even the inverse. People generally spend a quarter or half their time in meetings and so on, and so if you can actually give somebody even 25% more of their week back, that may very well be doubling of the output for that person. Just think of what they can actually accomplish when you give them legitimate hours back in the day where it’s just completely free time. From a finance perspective, it’s just pure margin. From an engineering perspective, it’s just pure opportunity to go build cool, new, awesome stuff.

Paul Lacey: Mm-hmm. And it’s also, if you think about your typical ebb and flow of a week, too, not all hours are created equal. So there’s times when you are at your peak focus, your brain’s cranking at 150%, you’re just really in your flow space, and then if you’re interrupted because some integration broke and your dashboard is offline as a result, or your thing that’s going to run tonight is not going to run, and then somebody’s going to wake up tomorrow morning and be upset because they’re seeing bogus data or something’s not there, that just breaks your entire flow and you lose a lot more in productivity than just the hours spent by saying, okay, let me get pulled into this thing for a little bit, let’s go over there and fix that and then go back. It’s an interrupt.

Sean Knapp: Yeah, exactly. Exactly. Yeah, that’s why I’m so excited. The sheer leverage folks get, I think it continues to pay these amazing dividends. And one of the other things, too, that I think really is powerful and starts to surface out of this report, too, is the ability for teams to be hyper-productive, but even with diverse skillsets. Oftentimes you find, and this is something that’s at a slightly more subtle level in the report is oftentimes you find a product is built for a data engineer or a product is built for an analytics engineer.

To have a product and a platform that can benefit both data engineers and analytics engineers, why I think that’s so powerful is now you get these two amazing sets of skills that really can collaborate together, and in many ways it’s because they’re focused on data as the interface and data as the product as opposed to the language or the code or the tools. That’s one of the things that actually gets me so excited, too, when I think about what does the evolving face of data engineering teams look like, I think it starts to be these really polymathic, incredibly diverse skillset folks who can contribute at multiple parts of that data lifecycle, which is pretty darn cool.

Paul Lacey: It is, and I think we see that in other industries as well. I hate to keep bringing this back from data engineering, but it’s so interesting when you compare it with some of the things that are going on with GenAI and other things like that that basically allow someone who might not be the most polished writer to then go into ChatGPT and say, hey, could you write me a blog article about blah, blah, blah, and then go back and forth a few times and then get a reasonably good first approach of an article that just needs a little bit of editing and then they’re off to the races.

So it basically is enabling people to move up stack a little bit and be less of, hey, let’s bang on a keyboard for a few hours and lots of red lines and changes and stuff before we get something that’s even halfway decent to like, hey, let’s start with something that’s halfway decent and then let’s do the thing that only we can do, which is really optimize and polish this and get this up to the standards that we’re looking for. You see that across a number of different industries.

Sean Knapp: Totally.

Paul Lacey: The other thing you mentioned earlier about the cost savings of consolidating into a single platform, taking all these different point solutions that have their various niches, but maybe not their ability to drive efficiency. And we did see that come out a lot in the report as well. I think that the report did a model of a medium-sized organization and found that compared with the tools that are on the market today and price points of those tools versus a unifying platform approach in a single vendor like Ascend, someone can save about 156,000 a year in just the tool costs, not to mention the opportunity costs and all the other productivity costs and things like that that stack up. And that’s pretty impressive as well to think about. That’s hard money back in your budget every year that you can spend on other things.

Sean Knapp: Yeah, I think so. I think this is one of the byproducts we’ve seen of the hyper-fragmentation in the space, which again, is a byproduct of it still being pretty new industry is there’s all of these technologies and all these vendors that are out there, but you’re paying taxes to so many different entities. And what’s interesting is to me, it feels like back 10 years ago when we were all so excited about unbundling of your cable subscription, and I was like, ah, that’s fantastic. I’m cord cutting. I’m going through my own personal digital transformation as a media consumer and I’m cord cutting and I’m just going to get the individual-specific packages. And now I turn on the TV at home and I have my little Google Home TV Chromecast, and we have an Apple subscription, a Hulu subscription, we’re still paying for Comcast internet, and I have eight other subscriptions of HBO and Showtime and all the other things.

And you start to add it up and you realize, oh, this might actually be costing me more than my original. Now, I like all the access, all the options and so on. It’s harder to actually get some things done at times, but I think we’re in this stage as well in the data space. Man, we have all these tools. Just think, as a human trying to get your job done on a daily basis, how many of those different tools do you have to go log into? How many of the different interfaces do you have to understand? How much integration do you have to do? Apparently 25% of your day-to-day just to get all these systems integrated so they can share the metadata so you can do the things. Just seems like so much stuff. And we were talking before the recording, I was out in Orlando speaking at a conference at Disney’s Data & Analytics Conference yesterday, and it was one of the things that I was highlighting there, which was, man, we see so many people coming in, and you asked that basic question, so what are you doing with data?

And boom, the first answer you get is architecture diagram of tools. And to me, that always strikes this certain point, which is ah, that’s the things you’re using to do things with data, but that’s not actually what you’re doing with data. The right answer or the appropriate answer should be here’s what we’re achieving with data. We have a highly targeted or a highly detailed customer data platform where we’re doing highly targeted email campaigns for customer engagement or something, something, something. How you’re doing it is a different question, which is also a perfectly viable question, but when you ask somebody what you’re doing with data in this current hyper-fragmented landscape where you’re spending all this time with all these tools and paying all these vendors, it’s amazing how that completely shifts your focus to doing things with data means working with tools versus what is the thing you want to achieve?

For me, going back to my media example, I just want to turn on the TV and watch a movie and that’s the outcome I want to achieve. And I think for most data engineers and analytics engineers, the outcome you really want to achieve is you want to go create a data product. The more we can actually get them focused on that and not be paying the tax to all these different vendors and do all these different integrations, the more we actually free them to deliver that outcome.

Paul Lacey: It’s a good activities versus outcomes reframing, refocusing, right? I think sometimes it can feel like you’re so close to the machine and you’re so intimately involved with the design of that machine, the operation of that machine. It’s like a mechanic and a car. The mechanic understands so many intimate details of how the car is running, why the car is doing what it’s doing, all this kind of stuff, but the average driver just wants to get from point A to point B. They don’t care. They want to push a button, car turns on, put their foot into the gas pedal, it moves, and then they get to where they’re going, they lock the doors, and they walk away. And sometimes we can be guilty of being the mechanics of the data stack when really sometimes it’s time to just let the autopilot take over and let’s go drive some places, right?

Sean Knapp: Yeah, totally. I’m with you.

Paul Lacey: Yeah. I guess so as we wind down here, Sean, any high level takeaways? Did anything strike you as like, oh my God, of course this happened because blah, right? Like because we see teams are doing X, Y, and Z, and because we know that automation is really impactful in this particular area, or is it just like as we were saying, it’s across the board? I’m curious if there’s anything that really stood out.

Sean Knapp: I think at this juncture, I haven’t watched this space and this industry evolve for so long. For me, I think we’re hitting this point in time where the pain is becoming particularly acute and it ordinarily wouldn’t feel quite as pronounced of a pain around cost productivity, efficiency, except for when we look back across the last few years in the economic background, it’s amplified. It’s actually set us up for this interesting point in time in the data industry in particular, which was we had decade of boom time, but a couple of years of incredible spend and growth across the space. And we’ve seen that, and it’s certainly in the public company revenue growth and so on that’s already out there. But what happens is with this incredible push, you tend to get less efficient execution, and that’s a very natural thing.

And then you put that against the backdrop of so many more tools coming out, which then tends to create a more complex space. And what happens then with a really strong economic pullback, which is just a complete hard whipsaw, it’s like we’ve been accelerated as an industry, accelerate, accelerate, accelerate, and then all of a sudden the market just cranks on the brakes really hard. And that doesn’t necessarily mean that everybody who’s investing in data should stop investing in data, but we’re resuming back to, I think more appropriate levels and trends. It feels like this really hard correction. And what happens, ignoring the economic pieces of this, what happens from a, hey, we’re just a data team trying to build some data pipelines is all of a sudden the focus is like, okay, okay, it was go and just everybody pedal to the metal.

We’re going as fast as possible because we just got to go figure this stuff out because everybody else is racing to figure it out. And now it’s just whipsawed back into the, all right, we got to course correct a little bit. We got to stop lighting money on fire. We got to dial in our operations. And so you’re seeing all that flow through companies right now, big and small, and people are naturally turning to what they have for the last decade or two when they look to tune and optimize, which is automation. And it’s served incredibly well. Everything from, we’ve seen RPA do amazingly well in other domains to infrastructure automation do amazingly well around containers and low level infrastructure. And so it’s not surprising at all that this is the data industry’s time to embrace better automation.

Paul Lacey: Yeah. I couldn’t agree more. When you start to think about how can you do more with less, right? You have to think differently, and I love those kinds of problems. Actually, it’s one of my mantras is you have to get to a point where you think different and you think different about what needs to be accomplished. And sometimes it means being overwhelmed by the challenge in front of you, but then that makes you rise to the occasion and consider some things that are outside the box. Love it. And at some point on this show, Sean, we definitely should get into a little bit more about how you can automate, what are the components of a stack that achieves high levels of automation? Because I know for a lot of people, that might even seem like a buzzword, right? It’s like, oh yeah, it’s a no-brainer. Of course you should automate. I automate making my coffee every morning with my Nespresso.

What does it actually mean for a data engineering team to automate their data pipeline? So I’ll just throw that teaser out there for folks. We’ll definitely dip into that in one of the future episodes, so keep an eye on this channel for that. But yeah, Sean, I think we’re at time. So thanks a lot for your insights, and it’s been really fun unpacking this report with you. For those that are listening, this report is available on our website at www.ascend.io, so go ahead and cruise on over there and download it and read it. And if you have any questions, we’d be more than happy to help you unpack this and figure out how it can impact you and your team. But for now, let’s go ahead and sign off and say thanks everybody for listening, and thank you, Sean, for joining.

Sean Knapp: Awesome. Thanks for having me, and thanks for doing this again, Paul.

Paul Lacey: See you next time.

Sean Knapp: See you next time.