Ep 2 – Unifying Spark Ecosystems with Flex Code Data Connectors

About this Episode

In this episode, we sat down to chat about Ascend.io’s new Flex Code data connectors and how they will help organizations speed—and simplify—data ingest for Spark-based data systems.

Transcript

Leslie Denson: I think most folks can agree that rapid low-code or really flex code data ingest is still on the wish list of most data engineers, especially ones working with the Spark ecosystem. Today, Sean Knapp and I will chat through Ascend’s new flex code data connectors, how they can help take the pain out of data ingest, and just why the flex code piece of it is so important, in this special episode of DataAware.

LD: Hey everybody, and welcome to our special episode of the DataAware podcast. I am here with, once again, our founder and CEO Sean Knapp. Hello, Sean.

Sean Knapp: Hey, Leslie. 

LD: So this time it is a very special episode because we… Well, while most of our episodes will just be very focused on the data engineering industry and what’s happening, we want this one to focus a little bit on this exciting announcement that we have coming out, so we’re gonna be a little bit more us focused today. Don’t expect this every time, but we have got some really cool stuff coming down the pipe, so why not get Sean and I together to talk about it. Just makes sense. So thanks for joining me today, Sean.

SK: Absolutely, I’m excited to start talking about it.

LD: I know, it’s always fun when… You can send out an email, you can put out a press release, you can put up a blog post, but it’s fun to talk about ’cause there’s just more excitement, it helps people understand a little bit better why some of the stuff that we’re doing is as cool as it is, so let’s get into it. So what we are announcing are new flex code data connectors for data ingest, which helps solve what I know has been a big problem in the Spark ecosystem. So talk to me a little bit about it, talk to me about what the issue has been and why we came to the point where this was something we wanted to create and release to the wild.

SK: Yeah, it’s something that we encountered a ton across our customer base, this notion of, we want the best of all worlds. What everybody would love to have, 95% of the time, is like a super simple, easily configured connection, whether it’s through a no-code interface or through an SDK call, that’s just like, here’s some data, go find it, ingest it and go do all your magic, and 95% of the time, you can do that. And then when we kept watching everybody, it was like this challenge of, “Hey, so about that other 5% of the time where my data is kind of special or I want to do something slightly different as I’m ingesting it, what do I do then?” And we’ve solved that problem historically by saying, “Hey, we actually give you a framework for just writing your own custom Python, it’s more Lambda function style, we’ll orchestrate it for you, we’ll scale it for you, we’ll do all the other stuff.” But what we noticed with that was there was still a lot missing between those two. We had started to get what we thought was 70-80% of the way there towards what really was a connector framework notion.

SK: What does a framework do when it’s connected to data sets? It’s looking for partitions of data or connecting to streams, it’s profiling, and it’s reading records. And so what we’ve really done is we said, well, let’s go formalize that, but let’s actually put it at multiple layers where you can write a bunch of code if you want, but once you write that code, you can also then say, “Hey, here’s actually the configuration of my connection type, so I want Ascend to even dynamically generate this UI on top. I may have solved this problem once, but I have 30 people inside my company that are going to want to go connect to things like this. So let me go publish my connection type for everybody else to go use.” And so the reason that we thought this was so cool was it gets you the best of all worlds, where all the standard connection types that you wanna go use, whether it’s to a lake, a warehouse, an API, we’re rolling out a whole new set of those for people, but then when you get into the new, unique aspects of connections that you really want to go create, you can flex your coding. Ha ha, see what I did there?

LD: I do.

SK: You can go flex your coding skills and go deeper and lower level, while still packaging that up and productizing it for the rest of your teams.
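To make the "flex code" layer Sean describes a bit more concrete, here is a rough sketch of what a custom connector could look like. Everything below is a hypothetical illustration, not Ascend's actual SDK: the point is simply that the connector exposes partitions and records, plus a declarative configuration block that a platform could turn into a generated UI for no-code users.

```python
# Hypothetical sketch of a custom "flex code" connector. Class, method, and
# config names are invented for illustration; they are not Ascend's real API.
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Partition:
    """One independently ingestible slice of the source (e.g. a file, page, or date range)."""
    key: str


class InternalApiConnector:
    """Reads from a fictional internal API, one page of records per partition."""

    # Declarative config a platform could render as a form and publish for reuse.
    config_spec = {
        "base_url": {"type": "string", "required": True},
        "api_token": {"type": "secret", "required": True},
        "page_size": {"type": "integer", "default": 1000},
    }

    def __init__(self, config: dict):
        self.config = config

    def list_partitions(self) -> Iterator[Partition]:
        # A real connector would enumerate pages, files, or key ranges here.
        for page in range(3):
            yield Partition(key=f"page-{page}")

    def read_records(self, partition: Partition) -> Iterator[dict]:
        # A real connector would call the API for the given partition here.
        for i in range(self.config.get("page_size", 1000)):
            yield {"partition": partition.key, "id": i}


if __name__ == "__main__":
    connector = InternalApiConnector(
        {"base_url": "https://example.internal", "api_token": "...", "page_size": 3}
    )
    for part in connector.list_partitions():
        for record in connector.read_records(part):
            print(record)
```

The framework, rather than the connector author, would then handle orchestration, scaling, and generating the configuration UI from something like the `config_spec` block above.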

LD: Which goes really nicely with what we’ve talked about a lot internally, and what’s just talked about a lot in general. I think I had said this morning to somebody, it’s the holy grail of data at this point, it goes a long way to helping democratize it and just making it [chuckle] easier for everybody. Maybe you have a data engineer who works on it the first time, but then you don’t need that the second, third, fourth, 50th time that somebody is trying to access a particular stream or resource or whatever it might be.

SK: Yeah, and it’s funny too, ’cause when we look across different worlds, in the data warehouse world we see a lot of people have done a lot more work around connectors than what we’ve seen in the data lake and data engineering world. It’s really hard to go and say, “I want to connect my Spark to Salesforce or a Facebook API,” or so on. You can definitely shoehorn that stuff in, but nobody’s just said, “Hey, here are actually really easy, ready-to-use connection frameworks for this,” whereas in the warehouse world we have those, and so we wanna be able to bring the best of all worlds to the data engineering ecosystem.

LD: Yeah, which makes sense. Nobody wants something to break in the middle of the night and then take them 75 hours to get fixed when you could have something like this. Then to your point, it’s something they’ve had in the data warehouse space, but we just hadn’t gotten there in the lake space until now. So, talk to me a little bit about how this fits in with the Ascend product in and of itself, so what can our customers or people who are coming onto the platform expect?

SK: Yeah, what they’ll start to see almost immediately is a larger number of out-of-the-box connection types. So they’ll see a lot more connection types to those APIs, warehouses, databases, and the like. The second thing they’ll actually see is a lot more configurability inside of those connection types, through this no-code interface, so they get a lot more options. Now that we have this really great new foundation, it makes it so much easier to add additional levels of configuration that still conform to the Ascend paradigm of declarative data pipelines.

LD: Right.

SK: So they’ll get to see a ton of that happen. The other thing they’ll start to see, even after this, is the ability to then create their own new custom connection types and use those for their entire organizations. What we tend to find is the bigger the companies, the more they have a central team who is trying to solve that one problem for everybody, right? And in many ways productizing patterns, right? And so the Ascend platform can productize a ton of the patterns we see across customers, but internal teams oftentimes have patterns that are unique to their business, and they’re really jumping on and tackling the productization of those patterns specific to their business. For example, maybe you have a very custom model of how you structure your custom fields for Google Analytics tags or Omniture tags, or how you store data inside of Salesforce. It won’t necessarily conform to the global spec, if you will, but your company has a standard pattern, and so you as that data engineer can create this proper connector for the entirety of your company to go use.
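As a hedged illustration of that kind of company-specific pattern, the sketch below shows one invented convention (a custom-field prefix) being normalized in a single place. Published as a shared connection type, the rest of the organization would inherit this behavior without re-implementing it. The prefix, field names, and function names are all hypothetical.

```python
# Hypothetical illustration of "productizing a pattern": a team-specific rule
# for unpacking custom fields, baked into a shared connection type so nobody
# has to re-implement it. The naming convention here is invented.
from typing import Iterator

COMPANY_CUSTOM_FIELD_PREFIX = "acme__"  # assumed internal naming convention


def normalize_record(raw: dict) -> dict:
    """Apply the company-standard treatment of custom fields to one raw record."""
    normalized: dict = {}
    for key, value in raw.items():
        if key.startswith(COMPANY_CUSTOM_FIELD_PREFIX):
            # Company convention: strip the prefix and nest the field under "custom".
            stripped = key[len(COMPANY_CUSTOM_FIELD_PREFIX):]
            normalized.setdefault("custom", {})[stripped] = value
        else:
            normalized[key] = value
    return normalized


def read_normalized(records: Iterator[dict]) -> Iterator[dict]:
    """What the published connection type would run on every ingested record."""
    for raw in records:
        yield normalize_record(raw)


if __name__ == "__main__":
    sample = [{"Id": "001", "acme__region": "EMEA", "acme__tier": "gold"}]
    print(list(read_normalized(iter(sample))))
```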

LD: Yeah, I can imagine that that’s going to be a massive… I think they all kind of roll in together, but a massive time and resource saver for teams that are already incredibly busy. It’s no surprise to anybody listening to this that data engineering teams, and data teams across the board, have got a lot going on these days, so anything that can save a little bit of time and a little bit of pain is probably fairly helpful.

SK: Totally agree. I mean, that reminds me of the survey we did a few months back, right? It was something like 97% of data teams, across data engineering, data science and data architecture, that were at or above capacity. Only 3% had time to essentially invest in making their lives easier. And so the more that we can find these global patterns that we can offload and productize for people, I think the better off the entire industry is.

LD: Again, it just helps across the board. If you can offload some of that from your data engineering team, that means the data science team has what they need faster, the analyst team can get access, and the business can react faster. I mean, it is kind of amazing, and it’s not only true of data teams, it’s not only with data, but you can see it a lot more with data: if you accelerate or streamline something on the back end of it, the ripple effect that has moving forward on the entire business. And also, to be fair, it just makes life more fun for folks. We were talking to somebody the other day who said, “I’m just looking for a way to make it easier to do the things that are not fun, so my team can actually get to the things that are fun.”

SK: That is a great way of putting it.

LD: Yeah, [chuckle] it was actually just very blunt and out there: some of the stuff is just not fun, we don’t wanna do it, we wanna get to the stuff that actually is fun, so let’s figure out how we automate the not-fun stuff and get to the fun stuff.

SK: So you’re saying data ingest is not fun, ’cause I’m kind of with you on that one.

LD: Somebody probably out there finds it to be the most fascinating thing, and I am all for that person if they…

SK: God bless them. We’re hiring, by the way.

LD: Right. If you do find that fun, we are hiring, reach out to us. To that point, anything that can get people to the actual fun part of their job faster, and, as you said, productize the parts that just aren’t fun, or the parts that can be productized, makes the whole process easier and less of a burden. Let’s get it to where it’s not just 3% of data engineering teams that have time to make their lives easier; let’s make it 30% and grow that even more. Let’s do it.

SK: Yep.

LD: What else about this? I mean, we talked about it a little bit internally, again, it’s always fun to listen to people talk about it because of how excited they get about different things. Is there anything else that the listeners should know about this particular release?

SK: Yeah, I think the other thing that’s super relevant to this release is that you still get this tremendous benefit of the framework itself, obviously, which makes it easier to both use more connectors and get more features out of those connectors. Honestly, that’s the part we’re most excited about, ’cause it makes it easier for us to write more features for folks. But on top of that, there are two other really big benefits, I would say. One is the scalability. We’ve actually written this as a framework and foundation that runs on Spark. Oftentimes you’ll find a lot of these frameworks run in adjacent infrastructure, and in fact, our previous connectors did just that. By running these on what are highly reusable Spark clusters, we get that same efficiency from a latency perspective, but massive scalability, because we can now process huge amounts of data much faster too.

SK: So we get the best of both worlds: handling large numbers of very, very small pieces of data, but also smaller numbers, or even bigger numbers, of very, very large chunks of data. So that’s one: all of that scalability and the same orchestration and parallelization that you get with all of your other connectors inside of Ascend. The second piece is tied to all of the advanced profiling, reformatting, and persistence of data that we do, and do incredibly well, inside of Ascend, just like with all of our other connection types. We actually are connecting to these data sets, and as we ingest data, we’re automatically analyzing all of it, we’re profiling all of that data, we’re converting it to Parquet files for you and storing it inside of the local object store. All of this goes back to productizing patterns; all of these things that nobody should ever have to worry about anymore, you just get for free. And it doesn’t really matter if it’s 10 small pieces of data coming out of an API or literally millions and millions of files coming out of an object store. You get all of that exact same benefit as a user of the platform.
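For readers who want a feel for the pattern Sean describes, here is a minimal plain-PySpark sketch of the same three steps: read raw data, profile it, and persist it as Parquet. This is an illustration of the general pattern, not Ascend's internals; the paths and column handling are invented, and Ascend's point is that its connectors do this automatically so you never write this yourself.

```python
# Plain PySpark sketch of ingest -> profile -> persist-as-Parquet.
# Illustrative only: paths are hypothetical and this is not Ascend's code.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest-profile-parquet").getOrCreate()

# Ingest: many small JSON files or a handful of huge ones; Spark parallelizes either way.
raw = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical path

# Profile: basic per-column statistics of the kind a connector framework might capture.
profile = raw.select(
    F.count(F.lit(1)).alias("row_count"),
    *[F.countDistinct(c).alias(f"{c}_distinct") for c in raw.columns],
)
profile.show(truncate=False)

# Persist: write the ingested data as Parquet into the lake / object store.
raw.write.mode("overwrite").parquet("s3://example-bucket/ingested/events/")
```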

LD: Yeah.

SK: You just write code and you just work with data, and that’s the cool part.

LD: I think that’s what most… And again, we were having a conversation with somebody this morning where they were like, “Data engineers just really need to love data, and they just wanna be able to work with data.” It’s a little different from a software engineer sometimes, in the fact that they need to just love data. So yeah, to your point, helping them get to the point where they can just work with that faster and worry less about the back end of it is always going to be super useful and super helpful.

SK: Yeah. And pairing that with your quote about making the not-fun stuff easier so they can get to more of the fun stuff faster, it reminds me: twice this week alone, I’ve seen this in two of our formal trials of the product, these formal three-to-four-week POCs we do with customers. Twice this week I was on calls where we’re going through and saying, “Hey, this use case, this use case.” They’re trying to prove out different use cases, and you get to some of the stuff that has been causing tremendous pain for the customer with their previous architecture, and what we’re finding is, as they get really comfortable, week two, week three of these trials, all of a sudden they’re like, “Yeah, we’ve already seen how Ascend automates all this other stuff, we’re just going to assume that you do this.” Which we do, which is great. “We’re just gonna assume you do this ’cause actually what we really want to do is go build more of these pipelines.” And usually we’re like, “That’s not part of the POC, we love that you wanna do that and we’re supportive of it, but don’t you wanna validate this thing over here?” They’re like, “Nope, that’s the not-fun stuff, we assume you already do it, we just wanna go build more pipelines and work with data.”

LD: That’s awesome.

SK: Which is cool.

LD: That’s really cool.

SK: And I think we see this pattern a lot, and especially as more people use our product too, we earn a lot of that trust.

LD: Yeah.

SK: And can get them out of that muck, right, out of the not fun part, and then to just go build more stuff with data.

LD: That’s really cool, and people talk a lot about these things, but I’ve never heard it put in quite that way, so that’s really fun.

SK: Yeah.

LD: And that’s what you want, there’s nothing better than something that makes your job more fun. Something that makes your job not necessarily easier, although we do that too, but more fun, so you’re excited to get in and do the things, to be like, “Yes, I’m gonna go in and build data flows today. Woot. Got that. Check that off the list.”

SK: Yeah.

LD: That’s awesome.

LD: Well Sean, thank you. I appreciate you taking some time to chat about this with me. And if anybody is interested in learning more, or maybe getting started, you can certainly do that at Ascend.io. We can either set you up to talk with one of our data engineers, who can walk you through it so you can see it, or you can get started with the free trial. Appreciate it.

LD: Hopefully that gave you an informed intro into the new Ascend flex code data connectors and how they may be able to help you and your team. But as always, if you’d like to hear more, just visit us at Ascend.io. Welcome to a new era of data engineering.