In the latest episode of the DataAware podcast, Sean and I chatted with James Campbell, CTO of Superconductive (the team behind Great Expectations), about some of the intricacies of data validation, as well as what it’s like to build an engaged community in the data realm.
Learn more about how to overcome these challenges and achieve data alignment in this episode of DataAware, a podcast about all things data engineering.
Leslie Denson: People expect great things from their data, and few people know that better than James Campbell, the CTO of Superconductive, the team behind Great Expectations. Sean and I had the chance to sit down with James to talk about folks’ expectations of data, the amazing communities that rally behind technologies, and more during this episode of DataAware, a podcast about all things data engineering.
LD: Hey, everybody, and welcome back to another episode of the DataAware podcast. I know we took a little bit of a break while things were crazy busy here at Ascend, and they were fantastic, but we have a really, really awesome episode to get us launched back into this new “season” of the podcast. I’m super excited for this one today. I am joined, once again, by Sean Knapp. Hey, Sean Knapp. How’s it going?
Sean Knapp: It’s going great, Leslie.
LD: Good, good, good, good. I hope you’re as excited as I am.
SK: I am. I’ve been itching to get going on this again.
LD: Awesome. Same, same, same. So we are joined today by James Campbell, who is CTO of Superconductive, the team behind Great Expectations, which is a really fun and awesome open source tool that we care a lot about and think is really fantastic over here at Ascend, so we’re just totally stoked to get to talk to you, James. So welcome to the podcast.
James Campbell: Thank you so much. I’m really excited to be here.
LD: Awesome, awesome. Well, why don’t you… I did a very quick intro, so why don’t you give us a little bit more background, who you are, how you got started with Superconductive and then therefore, Great Expectations and we’ll roll from there.
JC: Yeah, absolutely. So for me, I think there’s been a really fun, consistent trend of what’s been interesting, which goes back to my academic background. I studied math and philosophy in college and I’ve always been really interested in the question of confidence and how we convey to other people what we know and how confident they can be in the judgments that we have. I spent the first part of my career in the Intelligence Community in the United States. I did a variety of different things. I started out working on cybersecurity issues and then moved into more political modeling. And during that time, I got to see and work with, and eventually, manage teams that were working on these really, really important high profile issues, often with extreme uncertainty, incredible resources available to us but we were really interested not only in conveying judgments but in making it easier for people to convey models to help other people not just see what they saw today but also how they saw the world and what they thought would be the dynamics of a system, as well as how it looked right then.
JC: So for me, that interest and long-term time getting to work on those issues also meant a lot of movement back and forth between qualitative and quantitative analytic approaches to understanding the world. I think, like I mentioned, that goes right back to having studied math and philosophy. And for my time in the Intelligence Community, that meant that I got to move between time in data science elements, as well as time in more traditional analytic units. And when I got started on Great Expectations… I think we’ll maybe talk about the details of that later but I realized that this was a great way to seal together, weave together, the analytic approaches that I had gotten to use in qualitative and quantitative groups and bring together very human-centric insights and understanding about the world with insights that we get directly from data sets.
LD: It’s interesting. One of the things that we’ve talked about internally a little bit since I came onboard at Ascend is that one of my first jobs on the marketing side of things was actually for a company that had an analytics solution for government. We did a lot with intelligence agencies. And even before that, I had worked with a company that was doing cybersecurity for an intelligence agency, and a lot of the same problems that we saw then, six or seven years ago, we’re still seeing today, and people are still trying to solve them today. So it’s always really interesting, especially with you coming in through that background from a much different perspective, seeing that play out as well, because it is a cycle that government is still trying to fix, and industry is now trying to fix it too, because they’re starting to see the problems that government, with its huge data sets, was seeing years ago.
JC: Absolutely. I think one of the things that makes cybersecurity really fun is that it lives right at the intersection of policy and people, and technical systems. There’s no technical solution to cyber… or at least no single technical solution to the most interesting class of cybersecurity problems. And it’s been neat because, like you said, there are certainly persistent problems, but also, the landscape is dramatically different than it was when I started doing this a long time ago. There have been tremendous advances in both the ability that defenders have to understand what’s going on and, of course, the sophistication and breadth of activity of attackers.
LD: So with that, let’s talk a little bit about how Great Expectations itself came to be. I have heard little bits and pieces of this in the industry but I feel like there’s a super interesting story there that maybe not everybody knows about.
JC: Yeah, I sometimes say that it came together on a single phone call. Abe Gong, who is the current CEO at Superconductive and original co-author of Great Expectations, and I… Well, if you wanna go way back, the funny story is our families knew each other when both of our parents were respectively in grad school. And so, we had been acquainted over time, but didn’t know each other particularly well. And then early in our careers, we, together with a few other people, started what we called a “discussion group”. Every week we would all jump on a call and just talk about what we were seeing in our careers. And we were in pretty different places. Abe studied political science at Michigan for his PhD, a very quantitative program, but then became very interested in healthcare data and spent his time working in healthcare data start-ups. And of course I was in the intelligence community. We would just chat about everything from models that we found interesting to hiring and team building. On these calls, we would sort of pitch each other on something we were going to present. On one of these calls, Abe and I both said, “You know what, I wanna talk about something that I’m working on.” Neither of us called it by that name, but we realized partway through the call that we were literally pitching each other on the same thing. And it was Great Expectations. So we said, “You know what, let’s do this, let’s build this.”
JC: And at the time, I was actually at a government-sponsored research lab where we had a lot of latitude to work with the broader community, with sparking innovation as our mission. So I wanted to work on that. Abe had just started Superconductive at the time, Superconductive Health, where they were focusing on healthcare data and cleaning. So for him it was a side project; he focused together with me on building this open source library. So it was sort of a part-time endeavor for both of us, a little bit of a labor of love. And what we observed is that the community is so powerful and magical when you engage, and that people really caught on. And I think just like the two of us pitched each other, there are probably lots and lots of people out there who had very similar ideas, and were able to take advantage of what we did in execution, and now contribute to building and making an overall much better community and product.
SK: That’s super interesting, especially how we see so many problems today get solved: somebody really feeling that pain directly and trying to solve it from their own experience, maybe for themselves and a couple of folks, and it growing so rapidly. And the community aspect is a really interesting one too. Tell us, what was that tipping point? Or that sort of aha moment where the two of you, and I assume the broader team, were like, “Oh my gosh, there is something here… This is actually a thing. We should just quit everything else we’re doing now and do this.” What was that experience that you saw or heard from a user of the platform, or something, that was that tipping point for you?
JC: Yeah, that’s a great question. I think I need to almost point at more than one thing at the same time. Firstly… I’ll awkwardly give him credit since he’s not here: Abe is really good at having that kind of vision of what a product can become, which is what makes it so fun to work with him. And we’re also really, really lucky that our team is just a great group of people. Kyle on our team is our growth lead, and he was going out and really engaging with people and bringing back incredible feedback about what people were doing, and then really started investing in building out a Slack community. At the time when we started, I was working in an environment in the IC where frankly I just didn’t have access to email very often. So I just wanted a way to conveniently go downstairs to a different computer where I could touch the Internet and get on and chat with someone. So Slack was convenient ’cause it was an asynchronous chat format where Abe and I could send some messages back and forth.
JC: And now there are thousands of members in the community, largely spurred on by the work of Kyle, so I think what I would point to there is just the adoption. It’s not so much a single thing as the fact that people clearly saw value and kept going back to the well, going back to see what they could build with Great Expectations. If I had to point at a moment where I really felt like I saw so much of the power of open source, it would be the point where a user contributed an adapter for one of the other big cloud providers. We built the connector for storing Great Expectations artifacts in S3 right away, because we were mostly on AWS at the time. Somebody wrote in and said, “Well, here, let me just contribute the GCP connector.” And the notion that, sure, somebody’s just gonna pitch in and make the whole product better for everyone, is really, again, what’s been super fun for me to see.
LD: So I had a conversation with Charity Majors, who is the founder of Honeycomb, in a past life. And one of the things she said to me, which gets at a bit of what we’re talking about with you, is basically, she was like, “I could not sleep, I couldn’t do anything until I got this out in the world.” And obviously, Honeycomb’s a little bit different. I even see that here at Ascend with the team, Sean, but across the team, where everybody’s like, “I so firmly believe in what we’re doing, we have to put this out into the world.” And that’s always fun with an open source project. And I think this is also where the Slack community comes in, and, to your point, the broader community, where it’s like you guys really wanted to see this out in the world, but a lot of other people looked at it and went, “Heck yeah, I need to see this out there too, so let me see what I can do to help with that.” And that is really amazing to see, and it does make you feel good, and then being able to continue to grow all of a sudden is phenomenal.
JC: It’s interesting that you say that because, you’re right, that resonates with me very much. At the same time, I have to admit that I still see, when I look… Even now, when I look at Great Expectations, I still see so many things that we haven’t yet gotten out. There are so many capabilities and features that I want to build and have, and use myself, and I often… We used to joke about this in research a lot, and this is a pretty common element in a lot of larger research organizations, that you wanna cycle through and not have people live in research forever because you bring in fresh ideas when you’ve recently experienced problems. And so I still view myself as building for my past self and maybe future self again and my past self is not yet satisfied [chuckle] so we’ll get there but there’s a lot… I have a very needy past self, I think.
LD: But I think that’s great. And Sean can jump in on this as well. I’m going to share a little bit of the secret behind Ascend, and I don’t think it’s that crazy. Sean’s like, “Oh goodness, what are you about to say?” We have, and we’re actually coming up on it, which is why I’m thinking of it, for exactly that reason, quarterly hackathons, where our team can go, “I want to see this in the wild. I want to see this as a part of the Ascend platform. This is something that I would use and this is something that our customers would get great value from,” and so we dedicate the time every quarter to do that. And even down to me in marketing, I have a laundry list of things that I would like to get done because I think it’s gonna be helpful for people moving forward. But I think having that mindset of, “It’s never fully done. There’s always something that we can do,” is so important, because you do always wanna keep innovating, and there is always a new way that people can use the product or a specific feature, or whatever it might be.
SK: I feel like this is one of those ones where we are all so fortunate to get to build products that connect with us, products we have used something similar to or would want to use in the future, and that’s the great part: feeling that sense of relevance, as humans, of working on things that you’re so passionate about. As Leslie mentioned, we are very big on hackathons. We actually use these as the primary input to our quarterly product strategy, which is why we always have the hackathon a week or two before we plan for the next quarter, and we do a big one. It’s actually a 48-hour-plus hackathon. We’ll start tomorrow, Wednesday, at 11:00 AM, and the entirety of the rest of the week is a hackathon, and I think that’s…
JC: Yeah, that’s awesome.
SK: We do this because it’s a chance to exercise tremendous creative freedom and really think outside of the box and assure ourselves that if and when we find these really incredible things that our former selves deeply want, that they will actually become part of the formal execution strategy going forward.
JC: Yeah, I love thinking about it in terms of almost like tinkering and less because of some sense that the way is completely unknown and more because, in a very complex system, you can’t understand all of the things that will change on the basis of some change that you make, and so it’s useful to actually experiment and move quickly, and have that very entrepreneurial approach to building.
SK: Yeah. So let’s focus back a little bit. One thing I wanna make sure of, particularly for our listeners, with all sorts of really incredible things that we wanna go do, is that we collectively… We’re all part of the same team now. I just volunteered myself onto team Great Expectations.
JC: Yay! [chuckle]
SK: We all want to just be part of Great Expectations. Talk to us more about this. How are people using GE today? I know there’s this whole glorious future vision for the product, but let’s be honest, Great Expectations is also pretty darn awesome today, which is why you guys are doing such incredible work and why you’re getting so much traction. So how are people using y’all today? And we’ll go a little bit from there.
JC: I love that question. Actually, just yesterday, we had an intern give a presentation about some clustering models that he had built on kinds of Expectation suites. One of the neat things about it is that it’s very clear there are different kinds of high-level user profiles of how people use GE, which is not super surprising, but let me talk about a few different ways that I think people use GE. One of them is the almost-cleaning model, the just-getting-to-grips-with-your-data use, and I shouldn’t say “just,” because it’s not a small thing. It’s about getting to grips with your data and making sure that you have an effective contract with other teams, and a lot of the time, people will use Great Expectations, in that sense, for things like nullity checking and checking that values belong to a specific set. And a lot of that, again, is about making sure that as an organization, as a bigger group of people, you can detect changes and diagnose issues more quickly, and get insights about what you have available to you more quickly. So I see a lot of people use that for what I would call a row-wise sort of use case. We also see people who are using Great Expectations in a more distributional sense, looking at batches of data and validating machine learning model inputs and outputs.
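The row-wise checks James describes here (nullity and set membership) can be sketched as small hand-rolled functions. This is only an illustration of the idea, not the Great Expectations API itself, and the sample records and column names are hypothetical; the real library exposes these checks as expectations such as `expect_column_values_to_not_be_null` and `expect_column_values_to_be_in_set`.

```python
# Hand-rolled sketch of two row-wise checks; hypothetical records and columns.

records = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "pending"},
    {"order_id": 3, "status": None},
]

def expect_not_null(rows, column):
    """Every row must have a non-null value in `column`."""
    bad = [r for r in rows if r.get(column) is None]
    return {"success": not bad, "unexpected_count": len(bad)}

def expect_in_set(rows, column, allowed):
    """Every non-null value in `column` must belong to `allowed`."""
    bad = [r for r in rows
           if r.get(column) is not None and r[column] not in allowed]
    return {"success": not bad, "unexpected_count": len(bad)}

null_check = expect_not_null(records, "status")  # fails: one null status
set_check = expect_in_set(records, "status", {"shipped", "pending", "cancelled"})
```

The payoff of the "contract with other teams" framing is that the result objects, not just pass/fail booleans, can be shared and logged, so downstream consumers see exactly how many rows violated the contract.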
SK: This would be things like I expect that no more than X% of these values should be null or no more than Y% above yesterday or something like that?
JC: Yeah, and even sometimes… Diving into a richer example, it might be something like, “I expect that only 5% of my customers spend $100 a month on dog food.” And I pick that example because we had a great session with tails.com, which is a… They sell exotic dog food, and they gave a talk recently about one of the ways they use Great Expectations, which included validating models of annual spend and flagging the issue that if you build your model right before a holiday, you would substantially skew the expected purchases for pets. And so there’s that ability to encode the knowledge that seems so obvious to a person the second you say it, and yet as data professionals, we’ve probably all been in the place where you’re down in the weeds looking at sort of the spreadsheet equivalent, and may not realize those kinds of underlying trends or seasonality or those important ways to tease apart the data. Yeah, so that’s kind of a second area. A third area, just kinda to throw it out quickly, that I think a lot of people use Great Expectations to handle is what I would call schema-level validation, and it’s amazing how important that is. I think in a really robust deployment of Great Expectations, you’ll almost certainly have a mix of all of those kinds of expectations in there, but there are really good starting points along the way for each of those kinds of approaches.
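The distributional and schema-level checks mentioned above can be sketched in the same hand-rolled spirit. The data, thresholds, and function names below are hypothetical illustrations rather than the library's API (Great Expectations expresses tolerance-based checks like the first via the `mostly` argument on expectations, and column checks via table-level schema expectations).

```python
# Hypothetical monthly-spend values for a batch; one entry is null.
monthly_spend = [42.0, 55.5, None, 61.0, 48.0, 52.5, 47.0, 50.0, 53.0, 49.0]

def expect_null_fraction_at_most(values, max_fraction):
    """Distributional check: at most `max_fraction` of values may be null."""
    frac = sum(v is None for v in values) / len(values)
    return {"success": frac <= max_fraction, "observed_fraction": frac}

def expect_columns(rows, expected_columns):
    """Schema-level check: every row has exactly the expected columns."""
    ok = all(set(r) == set(expected_columns) for r in rows)
    return {"success": ok}

dist_check = expect_null_fraction_at_most(monthly_spend, 0.25)  # 10% null -> ok
schema_check = expect_columns(
    [{"customer": "a", "spend": 42.0}, {"customer": "b", "spend": 55.5}],
    ["customer", "spend"],
)
```

The distributional check returns the observed fraction alongside success, which matters for the seasonality point above: a check that only says "failed" hides the trend, while the observed value lets you see a drift building up batch over batch.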
SK: Gosh, this is something you said that I think is so powerful that I wanna double and triple and quadruple down on it, because I wanna make sure the listeners really process this. To paraphrase what you’re saying: Great Expectations helps us to validate the assumptions that we put into our data and into our code base. And I think the reason that is so incredibly important is that we oftentimes don’t realize how many assumptions we truly have baked into our systems and into our code that are not explicit. I just happened to assume that this model was correct because I built it right before Christmas, or I just happened to assume that this was the nature of this field inside my data set because I looked at a small slice of it from last week.
SK: Right? And we see the same thing on the Ascend side, which is, people just assume that when we’re orchestrating data pipelines, this data will always come in at that time, or after this other piece of data, or it will never have trailing arrivals of data from two days late. And it’s these kinds of assumptions that can oftentimes end up creating tremendous chaos downstream. And it hits on what you touched on really early on: it’s those assumptions, when not validated or automated, that erode our confidence…
SK: In data.
JC: Absolutely. And to your point about “when not validated,” the thing that makes this important is that it’s not like an assumption can just be validated once and then you know it’s true and you move on; in general, the assumption needs to be re-validated [chuckle]. Again, to the conversation about dynamic systems, and especially complex dynamic systems: the interaction effects and the way that non-linearities accumulate mean that you need to continuously verify that what you have outlined as the requirements for your model, or whatever it is, are still true. We’re getting maybe a bit abstract, but I think when people encounter Great Expectations, there are often a couple of reactions. One is, “Oh, I’ve built this before,” which I love, because it’s true. Yes, you probably have built this before.
JC: Now, you probably didn’t build it as completely and handle all the edge cases and so forth, which is why people love it, ’cause they come in and they’re like, “Oh yeah, yeah, I already knew that I needed this.” But then I think the other thing that people get to sort of next is realizing, actually, there’s things that I used to do outside of the realm of automation that I can bring inside the realm of automation with a tool like Great Expectations. And so, if you say, “I’m gonna run a very simple model of arrival times for data,” to use your example, and maybe 99.9% of my data arrives within a minute of the time stamp, and then there’s just this tiny tail, some of which drags out for a couple of days.
JC: Making a model of that is not hard, right? No data professional is gonna think, “Oh, well, that’s gonna break me. I’ve never figured out how to do that.” On the other hand, continuously running it, checking it every time, building the plot and just putting your eyeballs on it every now and then, that’s hard, right? [chuckle] It’s hard because of the mental energy of doing it, and so taking that away, lets you focus on what the data means or the insights that you’re trying to draw, which I think people find… I certainly find really appealing and is really a valuable way to build processes.
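The arrival-time model James describes, the kind of check that is easy to build but tedious to keep eyeballing, can be sketched as a batch-level expectation that runs automatically. The events, lag threshold, and tolerance below are hypothetical and scaled down for a tiny sample; a production check on the "99.9% within a minute" pattern he mentions would use something like `mostly=0.999`.

```python
# Hypothetical event batch; timestamps are epoch seconds. One straggler
# arrives well outside the one-minute window, as in the long-tail example.
events = [
    {"event_ts": 1000.0, "arrived_ts": 1000.5},
    {"event_ts": 1010.0, "arrived_ts": 1012.0},
    {"event_ts": 1020.0, "arrived_ts": 1020.1},
    {"event_ts": 1030.0, "arrived_ts": 1031.0},
    {"event_ts": 1040.0, "arrived_ts": 1250.0},  # straggler, minutes late
]

def expect_arrival_lag_within(rows, max_lag_seconds, mostly):
    """At least a `mostly` fraction of rows must arrive within the lag window."""
    on_time = sum(r["arrived_ts"] - r["event_ts"] <= max_lag_seconds
                  for r in rows)
    frac = on_time / len(rows)
    return {"success": frac >= mostly, "observed_fraction": frac}

# Runs on every batch, instead of someone rebuilding the plot and
# eyeballing it every now and then.
result = expect_arrival_lag_within(events, max_lag_seconds=60.0, mostly=0.8)
```

This is exactly the "take the mental energy away" move: the model itself is trivial, and the value comes from wiring it into the pipeline so it fires on every batch.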
SK: Yeah, I totally agree. It’s funny, oftentimes in our world we think about this as an automation system; we think about autonomous data pipelines. It has to constantly be validating and verifying dependencies and constantly working; it’s not something that you just schedule. It’s very similar to a self-driving car: you don’t want it to assume that the lane lines just happen to be a certain way because it saw them that way a month ago. You still want your autonomous car to continue to look for lane lines and make sure it’s staying inside of them, regardless of [chuckle] where they may be now.
JC: Yep. One of the most long-standing jokes, or kinds of ribbing, that Abe and I do back and forth with each other is that he likes to, in the very zeitgeist of contemporary data engineering, say, “Everything is a DAG.” And I always respond, “No, if it doesn’t have feedback, it doesn’t count.” And of course, both of these are true at the same time. But the point, I think, is that what we’re doing with Great Expectations is introducing the possibility of a feedback loop that makes the system actually able to be dynamic and survive and adapt and, to your point, become autonomous, eventually. And if you don’t have any feedback, you’re the old kind of robot, the if-else robot, and that doesn’t survive in the real world.
SK: Highly conditional, we’ve heard it. Yes. So one of the things this gets super relevant to is this notion of DataOps, which we see a lot of folks starting to talk about in different circles. A lot of people have a lot of different definitions for it. I know Gartner now has it on the hype cycle somewhere, and I think we’re probably climbing pretty high up that first slope, but it’s an area that people are investing a lot in, right? And there are a lot of similarities we see between the data ecosystem and DataOps on one hand and the DevOps ecosystem on the other: what we do for software and what we do for data. How do you all see those? What’s similar, what’s different? How do you see yourself playing in this DataOps ecosystem?
JC: Yeah. I have to admit, I really like the term. And to be fair, most of my early career was before DevOps was a word, so I’m not 100% sure that I can claim to even understand that, and probably I’m by default ruled out from knowing what DataOps properly can mean, and I’m sure there are a lot of definitions. But I’ll tell you why I like it. One is, to me, there’s a huge amount of power in doing, and bringing the operational aspects of any process into play is, I think, approximately a proxy for the feedback process that we were talking about earlier, because it’s about making sure that a system remains functioning and relevant, even as it’s part of something else that is changing. For Great Expectations, I see us, like Ascend, squarely in the middle of that space. And I think that reflects a number of different things. One is just the importance of being able to have insight into the process and to be able to see what it is that you’re operating on and changing.
JC: And then also because I think part of the idea of DataOps is to professionalize what I would maybe call something like Agile change management around data systems, and having a tool like Great Expectations, and others that are similar, that can provide insight both into the status of a running system and into what might happen if you were to change data or a pipeline, is really, really important. So specifically, I think one of the use cases that I find really fascinating for Great Expectations is where you use the tool to identify, in your real data, candidate new test data that you could use to evaluate potential changes to a system.
SK: So this is a super interesting one that I’d love to unpack further, ’cause one of the things you said, I think, really hits home here. When we think about the classic DevOps role, oftentimes I kinda paraphrase it as… Well, despite all of the other fancy definitions, DevOps is around how do we enable more people to write more software faster and safely, right? How do we scale this thing and go faster? And a big part of that is iteration, right? How do we break away from this massive waterfall style of building and into this more agile, iterative model? And I think we see this notion in the DataOps world today, which is: how do we enable more people to build more things faster and safely with data?
SK: And to do that, you need the ecosystem around it to have confidence that your data is actually safe and valid and working, and that your systems are safe and valid and working. And so, very philosophically, and I’m going to use these words very carefully ’cause I know I’m totally outgunned when we get into the philosophy domain, but philosophically, we are trying to do the same thing, just in data. And we think of the tremendous impact DevOps has had in the software domain. I think the one really interesting question I wanna dive in with you on is: what is the benefit, what is the value? ’Cause DevOps has driven tremendous value for teams, for organizations, and for companies. What’s the value here, or, to put it another way, what is the cost of a lack of confidence, of a lack of ability to actually iterate, of being trapped in this waterfall world of data? What does that cost? And what is the benefit that you all see with your users and customers?
JC: Well, firstly: wow. I love the way you characterized that process. I think that is spot on, and the notion that what the system is about is making it easier for people to contribute safely is awesome. And I think that suggests metrics around things like… I remember we used to sometimes joke about the amount of time that it takes to spin up a server, and there was a time when that was actually measured in years, because you had to get acquisition and wait in line for space in the server room. And now, sometimes I forget that I left one up and it’s like, “Oh yeah, we should spin that back down,” [laughter] it’s not even an issue.
JC: That said, to your question of what is the cost of not doing this, I think this goes really well with the motivating story for me in Great Expectations. If I had to pin it down to one very specific case… I won’t dive into any of the specific details, but basically, we had a model that used a particular type of data, and this kind of data was widely available, so we’ll pretend for a second it’s weather data. And so we were looking at this model, and maybe it did something like tell you what the temperature was likely to be at some point in the future. Well, the thing was, the model was built around data that was, let’s say, hourly, but the structure of data that’s reported, say, every minute is fundamentally the same. So if you have a user who doesn’t know exactly what… And obviously, this wasn’t exactly the case, maybe that one’s gonna seem too easy, but when you have a user who doesn’t know the details of what the data is and what it can mean and what it does mean, you really risk giving what I call nonsensical answers. They look right, they make sense, it says 78 degrees.
JC: Well, that’s totally plausible, and it might be just completely wrong, right? Because, to your point earlier, you didn’t check the assumptions of the system; one of the assumptions was the periodicity of the input data, or the quality of the input data, or… And this actually happened to me not that long ago: pounds versus kilograms on a scale, and somehow in my head I magically decided, “Oh well, you just subtract 100.” And for the one data point where I had first looked, it made sense, but no, of course, it’s off by a couple… So I think that’s the real risk. I do wanna flip it back for just a second and ask, what’s the opportunity? Because the thing that I think is really amazing about, to pick a particular tool, spreadsheets is just how well they work. They last well ’cause they work well. They empower domain experts to get insights really, really, really quickly, and that’s amazing, and we’re not gonna beat that for just getting started or just building some new quick model, or, in my case… Basically, it’s a calculator.
JC: The thing that we can unlock, I think, is allowing more and more people to get raw data, or data from some source, into a format that they’re comfortable and familiar with working in. And so part of that, of course, is the pipeline building, but then a part of that is also the, “Does this approximately make sense? Did I do this right?” And the ability to bring in what I’ll call out-of-band information, the human gut sense. The classic example from early AI is, “Is this bigger than a breadbox?” People are really good at knowing whether stuff is bigger than a breadbox; for a machine, it’s a surprisingly hard question. But if you can encode more and more of that information into your process, then you can say, “Hey, this thing that you said should be about as big as a breadbox is approximately the size of a house. You might not be looking at what you think you’re looking at.”
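The “bigger than a breadbox” gut sense can be encoded as a simple plausibility-range check, so that a pounds-versus-kilograms mix-up like the one above gets flagged instead of silently passing. The values and bounds below are hypothetical, and this is a hand-rolled sketch of the idea rather than the library’s API.

```python
def expect_within_plausible_range(values, low, high):
    """Flag any value outside the plausible [low, high] band."""
    flagged = [v for v in values if not (low <= v <= high)]
    return {"success": not flagged, "flagged": flagged}

# Adult body weight in kilograms; 30-250 kg is a loose plausibility band.
# 350 looks a lot like a pounds reading that slipped into a kilograms column.
weights_kg = [72.0, 64.5, 350.0]
result = expect_within_plausible_range(weights_kg, 30.0, 250.0)
```

The band is deliberately loose: the goal is not to catch subtle errors but to encode the out-of-band knowledge that a "house-sized breadbox" value cannot be right, whatever the data type says.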
LD: So it’s interesting, and let’s twist on that just a little bit, to kind of another area. There are all sorts of ways people are going to use Great Expectations, and I know with open source it’s kind of difficult, so you don’t always know how people are using it. But what are some of the really fun use cases that you’ve seen? Or where have you seen somebody who had gone completely off the rails and is now back on track with what they need to do? What are some of the really awesome ways that you have seen the community come in and use Great Expectations?
JC: Yeah, I love that question, and I will say that I do wish that we had more insight into all the ways that people use Great Expectations. I have a blast hearing about them sometimes, where I’ll find out such-and-such company is using it, and I had no idea. So we didn’t talk about this, but we’re planning to build a SaaS product around Great Expectations that makes it much easier to use and to collaborate with other people who are using Great Expectations, to have conversations and insights into all the metrics that are generated. So we’ve been having conversations with companies that are using the current open-source version to understand what their deployment patterns look like and where we can help make that easier, and one of the things that I just find amazing is the breadth of kinds of companies.
JC: So actually one of the very, very first deployments… And of course, it was almost a different product in some ways back then, but it was Calm, which is a meditation app, which if you haven’t used it, it’s awesome. We use their sleep stories in my house and my daughter uses it every night. Well, they were some of the earliest Great Expectations users, and it was really interesting to me that the way they got started on populating Expectations was this kind of iterative back-and-forth exchange with internal analysts on their team. And what I think that has led us to realize is that in many ways, setting up Expectations is exactly like building a question-and-answer system; it’s a dialogue with data. So who asks questions of the data? Absolutely everybody. One of the other talks recently given in our community forum was from Maersk, the giant shipping container company, and again, I don’t wanna go into marketing mode or anything, but the breadth of applications is just phenomenal, and I think more than any single one, that’s what I would say: just how universally people see the problem and the value in addressing it.
LD: That makes so much sense. Well, to wrap it up, I always like asking this kind of final question, ’cause I think, to your point, the breadth of use cases across the board is fantastic and I always enjoy hearing that. But what are you most looking forward to, whether it is within what you guys are doing with Great Expectations or whether it is just in the data industry overall, over the next… And I hesitate to say 2-5 years, ’cause I feel like five years in the data industry is just like… You have no idea what’s gonna happen. But what are you most looking forward to in the future with data, whether it’s a use case or a particular technology or whatever that might be?
JC: Yeah. Well, first, I’m gonna go ahead and go out on a limb, and say, five years from now, we will still be using spreadsheets.
LD: I would agree with you on that.
SK: I think that’s a safe bet.
LD: I think I would agree with that. Sean would probably agree with that, everybody loves a good spreadsheet.
JC: I think, for me, what I’m most excited about is, consistently, the way that we close the loop between machine-understandable and human-understandable systems and statements, and I think Great Expectations is a lot about bridging that gap. What I think we’re gonna see, and where I’m really excited, is when we have better tools for immediately transforming our discussions into insights. The phrase we use a lot on the Great Expectations team is, “We’ll leave less on the cutting room floor.” So right now, I think one of the big insights that leads people to engage with Great Expectations is that they already know a lot of the things that they’re putting in Expectations; that information just gets lost when they build their pipeline, because they did exploratory analysis, they went through and they tested assumptions.
JC: And they built the system that they built because of what they learned, but then that information goes away, and then the world evolves and then you end up having a breakage and you need to go back and revalidate it. Whereas, if you instrument it along the way, you get a benefit. Now imagine that you can actually have structured conversations about the observations that you have from your data set, about times when something breaks or changes in an unexpected way, and those conversations can immediately get folded into updates to your existing expectations, new expectations. So I think it’s really about making the amount of friction required to move between all the different places that we work, less, and really harnessing the human understanding, the deeper semantic understanding of data, that I think we’re gonna see just a tremendous explosion of potential.
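One way to picture what James describes — assumptions learned during exploratory analysis being kept with the pipeline instead of lost — is a small declarative suite of checks that runs on every batch. This is a minimal plain-Python sketch of the idea, not the Great Expectations API; the column names and bounds are hypothetical:

```python
# Assumptions discovered during exploratory analysis, written down
# declaratively so they travel with the pipeline instead of being
# left "on the cutting room floor".
EXPECTATIONS = [
    {"column": "weight_kg", "min": 0, "max": 300},   # kilograms, not pounds
    {"column": "heart_rate", "min": 30, "max": 220},
]

def validate(rows, expectations=EXPECTATIONS):
    """Return (column, value) pairs that violate the recorded assumptions."""
    failures = []
    for exp in expectations:
        for row in rows:
            value = row.get(exp["column"])
            if value is None or not (exp["min"] <= value <= exp["max"]):
                failures.append((exp["column"], value))
    return failures

# When the world evolves and an out-of-range value appears, the old
# assumption fires immediately instead of silently going stale:
bad = validate([{"weight_kg": 350, "heart_rate": 72}])
```

When something breaks or changes in an unexpected way, the conversation about it can be folded straight back into this list — updating an existing entry or adding a new one — which is the low-friction loop James is pointing at.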
LD: Awesome. That is not something we’ve heard before, even as we’ve dug deeper and deeper, and it’s fantastic. It’s a great way of thinking about it. James, thank you so much. This has been a ton of fun, super interesting, and I think it’s a really great episode coming back from our hiatus. It’s been fantastic.
JC: Awesome. Thank you. This was a blast. This is fun stuff.
LD: Don’t tell us that too much or else we’ll have you back. You’ll just be like…
LD: We’ll keep having you.
SK: I feel like we have so many other things to go pull threads on. I love this notion of having a dialogue with your data. We just started to unpack these parts of validating and automating assumptions to build confidence, which is really good. I think it’s gonna be the backbone of this DataOps movement and era. So much more to unpack, so…
JC: Very much agreed.
SK: Thank you James.
JC: Yeah, my pleasure.
LD: Well, I couldn’t have asked for a more fun first episode back. So thanks to James for joining us today to chat Great Expectations. You can find out more about that at greatexpectations.io. And as always, we wanna hear from you, reach out on Twitter @ascend_io or on LinkedIn with guests and topic ideas. We always love hearing what you guys wanna hear about next. Welcome to a new era of data engineering.