Podcast: The Evolution of Data Teams with special guest Jesse Anderson

by | Jan 6, 2021 | DataAware Podcast

In the latest episode of the DataAware podcast, Michael Leppitsch and I had the opportunity to sit down with Jesse Anderson, managing director of the Big Data Institute and author of the new book Data Teams, and talk about the inner workings of data engineering, analyst, and data science teams.

Learn more about best practices for high functioning data orgs, pitfalls to avoid, and how these teams are continuing to evolve in the latest episode of DataAware.

 

Episode 02 Transcript: The Evolution of Data Teams with special guest Jesse Anderson

Leslie Denson: I think we all can agree that data teams are growing more and more crucial to business success every day, so much so that our next guest has written the book on it. Today we’re chatting with Jesse Anderson, Managing Director of the Big Data Institute and author of the new book, Data Teams, about what makes for a successful and maybe a not so successful data team in this episode of DataAware.

LD: Hey everybody, welcome to another episode of the DataAware podcast. We had a great response to the first episode that has gone out, so we’re really excited to start bringing you guests, and we have a really great guest today that I’ll get into, and more topics and conversation around the data engineering space. So thanks for joining us for our second episode. Today, I’m joined by Michael Leppitsch, who works with me here at Ascend on our partner side, and he and I are gonna be chatting with somebody that I always enjoy talking to, who has some really great insight in the space, Jesse Anderson, who is the Managing Director at the Big Data Institute, and also the author of the newish book at this point, it’s been out for a little bit, but new book called Data Teams. Welcome to you both. Hey, Jesse, how’s it going?

Jesse Anderson: It’s going good. Thank you for having me.

Michael Leppitsch: Hey, Jesse, good to see you again. Looking forward to the talk.

JA: Thank you.

LD: Jesse, why don’t you tell us a little bit for the listeners who may not know exactly who you are, why don’t you give us a little bit maybe more background than I gave you, and kind of talk about the book that you have out? And then from there, that’ll set the stage nicely for rolling into some of the conversation.

JA: I was an early employee at Cloudera and I started to see a pattern, and that pattern was technology wasn’t the only issue that teams were experiencing, and people would say “Hey, this Hadoop isn’t working for me,” but when I’d look at the team, that would be, “Well, you actually have the wrong people and you set the people up for failure,” there’s nothing… There were problems with Hadoop, definitely. But you weren’t hitting those problems yet, and so I started to piece together and started to create that body of work, so I’ve been on this mission to create this body of work probably for the past eight years now of saying, “Hey, technology isn’t the only issue.” And the Data Teams book that you just mentioned, you can go to datateams.io to learn more about it, but Data Teams is the latest culmination of that body of work, of me really crystallizing, trying to say, “Hey, hey everybody, it’s not just technology, you need to actually make organizational changes. And here it is.”

LD: One of the things that you talk about in the book, which I know there’ve been a couple of… To your point, it’s the culmination of a lot of the work and the research that you’ve done and the teams that you’ve worked with, is, you kind of bucket them into engineering… Data engineering, data science, and then operations, so just from a super high level, let’s get everybody on the same playing field as we… ‘Cause the conversation will sort of roll around those. How are you bucketing folks out? It’s probably fairly obvious for many people, but for some it’s not, so what are your thoughts on how you’re grouping people into the different portions of the team?

JA: So I’ll start with data engineering, which is most often missing, and a data engineer is a software engineer who has specialized their skills in big data, and this is important, this isn’t the DBA, this is an actual software engineer, this is really important for teams. And so that data engineering team is mostly made up of data engineers and may be cross-functional, but is mostly made up of data engineers. Then you have operations, this is another one that you’ll see that’s missing, my definition is an operations person who has specialized in learning these big data frameworks and knows how to operate them, knows what data issues are, and that operations team is mostly made up of those operations engineers, however you wanna call them, and then we have our data science team.

JA: And I just define a data scientist as a person with a mathematical background who’s learned just enough coding to get by, who applies that math to data and creates models. And that data science team is generally mostly made up of data scientists, though there may be some more cross functionality to it. One thing I always try to point out is, okay, which one’s the most important? Usually management thinks that “I just need that data science,” and really the focus of the book is saying, “Hey, you need all three. If you’re going to be successful, you need all three.” And to that end, I published some research on that, of people actually telling me how successful they were, and then correlating that with how many teams they have.

LD: That makes sense. And Michael, you probably have a little bit more experience with this, at least at Ascend than I do, and I’ve seen it in past lives as well, where they may try and even mash all of those teams into one. So having a data engineer sit on the data science team and have it all be sort of under one roof, it works for some companies, works for others, I don’t know, Michael, how are you seeing that as you’re chatting with folks too?

ML: Yeah, it’s interesting, I did some research this summer and that actually shows that a lot of people, especially the data scientists are being asked to multi-task, they’re being drawn into tasks to fill in these gaps that Jesse’s talking about, whether it’s operational tasks or data engineering tasks. And we see that all the time. Jesse, is that something that bears out in your research as well?

JA: It does, and I’ll say it since you may not be able to say it. With all due respect to data scientists, they’re terrible at doing data engineering. And on the high-end, they’re okay, so you have an okay to terrible. So these data scientists without data engineers are creating mountains of technical debt and it’s a bad scene, so you gotta be careful.

LD: It’s also tough, because with either the lack of data engineering teams or the lack of people on the data engineering teams, they’re so overtasked, they’re so overburdened. That’s probably… And it’s no surprise, this happened throughout the history of data, depending on what you’re talking about, whether it was a DBA or moving down into data engineering or whatever it might be, if you’re talking about a developer team for whatever, it always kind of happens, that the backend team just gets super overtasked, and the data science or the analysts are going, “I need this access and I need it now.”

LD: So I feel like it’s been an interesting ride recently for the data engineering teams because they’ve become more and more critically important to the company, or people are realizing how important they are. What have you been seeing around that in the last couple of years, as to, are the teams growing, are they shrinking? What’s happening on the engineering side of it?

JA: It’s growing, people are finally getting their heads out of, “I just need a data scientist,” and they’re finally understanding, “Oh, I actually needed this other team.” So the advice I’d give to individual contributors who are listening to this, hey, if this data thing sounds interesting, if this big data thing sounds interesting, go for it, you can make more money, there is a high demand and low supply. So if you’re an individual on that side, you wanna be on that low supply because you’re gonna get paid more, and sorry managers, as you go to hire these people, they’re going to be in short supply. I work with my clients, I’m trying to find and hire these people, they are in short supply, and we’re not just talking about in the US, I work with companies around the world on this, it’s short supply around the world. So if you wanna be on that and you have those skills, by all means, do it. You’ll be so much better off.

ML: So on that note, then, Jesse, if these enterprises are starting to tease out data engineering as its own skillset and its own set of practices to enrich their data science and their decision science teams, what are some of the things that these managers should be looking for, is there kind of a short list of criteria or qualifications that you’d look for?

JA: So there’s a couple of ways a manager can do it, and I’ve done this first hand with clients. What we do is we initially look internally. If the company is big enough, usually there’s people internally that will either raise their hand or have an interest in this already… If you give them that chance. And so one client we did that, they had a decent size team, we took the people who raised their hand, who volunteered, and we talked, and I talked to them and made sure, hey, they were the right sort of person.

JA: So the things you wanna look for… And these are things I talk about in the book as well. A love of data, an actual interest in data, this is probably the biggest difference between a software engineer… Even a backend software engineer and a data engineer is, we’re not just creating backend code, we’re actually creating data products, and that shift in mindset is actually pretty important. A data engineer creates data products, they do not release code, they do not stand up infrastructure, and so those data engineers that realize that, and actually think “Hey, that’d be interesting.” Those are the people you want on the team.

LD: So what are some of the biggest missteps that you’re seeing, whether it be companies make or even down to the data engineering teams hindering any kind of success? Again, I’ve seen it in past lives as well, where you may have a data engineering team and they may have X number of resources and things may be going great, but it just continues to fall down and things continue to not be successful for any number of reasons, but what are some of the… I hate to call them worst practices, but we’ll call them missteps or pitfalls that you just kinda always have to shake your head when you see come through?

JA: There’s quite a few, and in fact, one of the parts of the book, the Data Teams book that I’m most proud of is there’s this chapter, that’s basically about troubleshooting and debugging your teams. “Here’s a problem, here’s what the possible solutions are, or possible sources of the problem are.” So in a very basic sense, the management team set the data teams up for failure in some fashion, and that could be they just hired data scientists and now there’s no data engineering or operations. Or you could have an issue where you hired the wrong people, you have data engineers, but they’re in name only as it were. So that you took a software engineer and you said, “I dub you, sir, data engineer,” or they took their DBAs and said, “I dub you a data engineer” instead of understanding, “Hey, without being able to program, without that software engineering background, these DBAs, these data warehouse engineers are not the right people.”

JA: So at its very core, at the very start, it is management who initially sets teams up for success or failure. I have entire talks about this and I show charts about this. It is not Spark didn’t work, it is not Hadoop didn’t work, it is you set the team… You as the management team, set the team up for failure by doing X, Y, Z. Other things that I’ve seen them try to do is to try to pay undermarket where there’s somebody in HR that looks up the pay bands for data engineers, and there’s two types of data engineers, there’s a SQL-focused data engineer, and that is a generally accepted title and so they’ll see that one, they will say, “Oh, this person only makes… ” Let’s say 50,000, 60,000 US. When the data engineer, the data engineer I’m talking about is making significantly more, they’re making 100,000 plus at a minimum.

JA: So what will often happen is the HR person will come back and say, “We can only pay this person 50,000,” when this is a sub-sub-sub-specialty of software engineering, and they just say, “No, I’m not going to do that, I’m not.” And what will happen, especially at the enterprise companies, is that they will continually have to do one-offs, they’ll continually have to do that one-off to get a one-off HR signing off and saying, “Yes, we’ll pay that person.” And what happens there is that they will consistently lose out on good people because they’re getting offers from other places, getting good offers, and they’re looking at your offer saying, “This does not make sense, you don’t understand me, I don’t even wanna work with you,” and that becomes a really big issue.

JA: So I have an entire section in there just for the HR folks saying, “Hey, guess what, here’s what you should know.” And the reason I wrote that section is because I saw that so often, and I wanted it in a book, so somebody could say, “Here, HR person, it’s actually in a book. You believe books, right? There you go.”

LD: So I also think about the difference between somebody who’s maybe really good at standing up something around a Hadoop cluster or a Spark cluster versus the tool set that an engineer or data engineer needs in order to do something like stand up Flink jobs, because it is… While it can be the same skillset, there are nuances towards when you start moving into the streaming realm, and even nuances when you’re in maybe Spark that you just… If you haven’t been doing it, you just don’t know, that make it a little bit more difficult where it’s finding the right… To your point, the right engineer with the right skillset, do they know Java, what are they using, and figuring out how they best fit in?

JA: Yeah, it’s interesting that you point that out. I gave a talk specifically about how does data engineering… How do data teams start doing real-time? I wrote that just as a talk because I kept on seeing teams who are, let’s say, doing batch and getting into real-time or going straight from small data into real-time big data, Flink and such. And they weren’t being successful and it was because they weren’t really internalizing that shift, they weren’t internalizing just how difficult it is to get from that point A to point B, and along the way, it takes actually some honesty from the manager… This is what I often see is the manager may not be taking that super honest look at the team and saying… Calling a spade a spade as it were.

JA: Hey, if your team was barely squeaking by with small data, guess what? They’re not gonna be able to do big data and they’re definitely not going to be able to do that real-time. Or there’s other manifestations of this just operationally. Let’s say they’re operationally… There’s no operational excellence. Well, as you go from batch to real-time, hey, you can get by with sloppy operations, it’s not ideal, but you can have a day of downtime, you can have two days of downtime. If you have two days of downtime with a real-time system… Oh my goodness, what… What are you doing? You have absolutely lost out, even having an hour of downtime.

JA: So if you don’t have that operational excellence, that SLA in place, it manifests all over the world, all over the place. Continuing that on with data science, if your data scientists, their model code is so poor, so not production-worthy, and you go from batch to real-time… And batch, you could say, okay, if the data scientist model breaks, we can just restart it as long as we’re ready. And now we get to real-time, and if it’s consistently breaking, hey… We can’t even use that.

JA: So what I often do when I talk to teams, I say, I’m not gonna talk to the technical team first, I’m actually gonna talk to the business first, and the reason I talk to the business side first is they’re gonna tell me, “Hey, I’d love to use the data team’s model, but it’s so crappy. It never works, we can never rely on it, we can never rely on the data coming out of it, we can never rely on the reports.” And I look for those questions, I look for the responses from the business, because I know they’re going to tell me what’s going on better than the data teams.

JA: And so once I figured that out, then I start looking at what was behind the scenes that caused that problem. So those of you who are listening, I’d encourage you to do the same. I’d encourage you to talk to your business, although we focused on the three teams here, I would encourage you to really get close to your business user and really talk to them because they’re going to tell you these sorts of things, you just have to ask that question.

ML: I was looking at those three dimensions, these three facets you talk about, engineering, science and operations, and mapping that back to some of the experiences we’re having with our customers. Across the board, it seems like every customer comes at the problem with one of those three biases, like an engineering biased organization will just think everything’s solvable with code, just we’ll run code to do that, and then data science team feels like, “We’ll just use a platform and do some notebooks and we can do everything with notebooks or whatever.” Whatever their bias is. And just recently, an operations team or bias team said, “We’ll just choose the right platform, we’ll pop up another database and we’ll just run these 10 different kinds of databases to do 10 different things, and that’ll solve the problem.” How do you guide people past those biases to open their eyes to the other facets and how critical they are when they’re coming at the problem with this pretty powerful bias?

JA: So what I generally do is I get them to understand the problem that they’re experiencing and why they’re experiencing it. So if you talk to a group of data scientists who don’t have data engineering, you will say, “Hey, what is happening with your models?” And they’ll say, “Oh, the model isn’t doing this and it isn’t doing that, we can’t get the data fast enough.” They don’t say, “Hey, we’ve created a mountain of technical debt,” they just say, “There’s problems with the model, there’s problems with the data.” So what I try to do is I try to help them understand, well, the problem with your model is actually not with the model, it’s because you’re missing that team.

JA: Or for data engineering, it’s… My experience is that usually the data engineers and the data scientists aren’t seeing eye to eye. Where the data engineers see the data scientist as playing fast and loose with every single engineering concept that we should be doing. Like “Hey, these notebooks are not the way we should be productionizing code.” And then on the other side, you have the data scientists saying, “These data engineers, they want everything to be… Every t crossed and every i dotted, and they want this and they want that.” So there’s usually this mismatch, each side thinking that the other side is doing something wrong or being overly cautious or what have you. So when I mentor a team, I’m always focusing them on, “You are creating a symbiotic relationship, you are not having an adversarial relationship.”

JA: So I’m always trying to get them to understand, “Data engineers are the creators of the data products and the primary consumer, your primary customer of those data products are the data scientists, so if you create something that they can’t use, then you are not creating that value that you should be.” And then likewise, the data scientists are creating kind of that cherry on top there, they’re creating those derivative data products.

JA: And then operations. Usually you get one of two things, a complete lack of realization of the importance of operations, where Operations is somebody else’s job, but when they get that call at 2AM, then they realize, “I guess it was my job.” And so there’s this lack of thinking long-term about “Just how should we deal with these problems? What should we do?” So it’s really trying to help people empathize with each other of, “Yes, we’re all trying to create this, but we’re a triangle, we’re not this line or we’re not this adversarial.”

LD: Feels like a lot of teams are just trying to keep up, and it’s a more of a “ready, fire, aim” scenario than actually… Nobody wants to take the time to actually truly sit down and plan, and nor do I advocate for lengthy planning cycles for anything, but I feel like especially when it comes to your data and how you’re using it, because this is such a “duh” statement, but data is so critical to everything that you’re doing within the business, “ready, fire, aim” is not the correct order in most cases, so…

JA: It definitely is. And usually those sorts of scenarios came from a hair on fire, that’s kind of what I kind of term it as. It’s a company that’s always running to that next thing, running to that next silver bullet. And yeah, those sorts of companies get very little, if any, value out of what they’ve done. And usually what they’ve done is they’ve short cut everything. And it’s not just the shortcuts on writing that code, they took a shortcut on their architecture, but usually more crucially they took a shortcut on the people side.

JA: They took their software engineers and didn’t train them on these new things, or they took a bunch of data warehouse people and didn’t even give them the resources. And so now you have people who are not remotely set up for success trying to do this, and they’re on this death march, quite frankly. And I see it quite often. So in those situations, it’s completely unfair of management to do this. There’s a post that will be out by the time we… By the time this podcast airs, and it’s some of my brand new way of visualizing, just how difficult that it is, or what are the skills needed for data engineers? And what I did is… You’ve probably played Civilization before, you have?

LD: Mm-hmm.

JA: Okay, so you know about a tech tree. And what I did is I kind of visualized these skills out as a tech tree, and I said “Here, if you wanna be a data engineer, it’s going to be at the very end of that tree, becomes a data engineer, but there’s all these other skills that come into it.” And so that in and of itself is interesting. But what I did is I visualized out, “Okay, here’s what a data engineer looks like, here’s what a DBA looks like, here’s what a software engineer looks like”, and you can actually see the reds and greens in the diagram that shows, “Oh, this DBA is missing a ton of things.” And it’s not just one or two, it’s also a ton of things that are complicated, and that software engineer likewise, they have that software engineering background but they’re missing all of this big data, all of this distributed systems part. And that’s a pretty substantial part.

JA: So I’m really trying to help people understand, “If you just pop a person into this role or into this, you’re setting them up for failure.” And unfortunately I see it too often, and it’s simply because management is thinking, “Hey, this is a lateral move”, rather than actually this is an up-and-to-the-right move, in terms of complexity, in terms of learning. And we need to set these people up for success.

ML: It’s interesting, your book is a bit of a survey of all the patterns and techniques and skill sets that are needed for all these roles. Is there a dimension to this where technology actually is constantly re-addressing each of those areas because the evolution of technology is actually increasing as well? So, sort of foundational patterns might be the same but it may no longer be about writing the stuff yourself, it may just be about recognizing the pattern tool kits and making sure that you have the skills to put the right tool kits together rather than writing everything from scratch. You see where I’m going with that?

JA: Yeah, yeah. Technology has evolved and improved, and one of the things I talk about in the book is, I guess I’ve come to call it Jesse’s law, and that is a general purpose system cannot be automated or no code, you can get… You can carve out either a specific use case or a specific vertical, and those can be no coded, but your general purpose thing that does everything that… Hey, you’re going to have to code that… So when I work with a team, what we really try to focus on is, I don’t want you to write code that you don’t need to. If there is a way for you to move data from point A to point B without writing code, by all means, we should not be writing that code, or if SQL can get you by in a case… Let’s say you’re doing a real-time join, SQL is so much better for writing that sort of thing, don’t write that code.

JA: However, if SQL is your only tool in your tool box, that’s a whole problem unto itself, and this is really where it’s important, where I think teams that either think, “I can do everything with no code or I can do everything with SQL,” they get themselves into this problem where they now painted themselves into a corner where the complexity has exceeded their skills and they can’t get any further, and that’s a whole other problem of teams where they’ve stagnated. But yes, whenever I work with a team, we’re always thinking about how do we leverage a tool so we don’t write code. One thing, if you’re, especially to management, listening, I would say, here’s another rule that I use when I work with a team, and that is, is the problem so difficult that there is either an open source project or an entire company around that?

JA: And the reason that you think about it in those terms is the problem that you’re trying to solve is actually bigger than you think it is. Because if there’s an entire company or project that has sprung up around it, somebody has seen this pattern over and over, has decided that this is difficult enough to where I can support a company and that other people are thinking I will give them money to not have to deal with this, you are actually really shooting yourself in the foot by trying to code this up yourself, the American saying for this is, “Penny-wise, pound foolish.” Well, in big data, it’s not pound, it’s a ton. Or if you’re going to be… I’ll put it in metric for our European friends or non-US friends. Basically the entire world, let’s put it in metric, we got a milligram of… You’re gonna save a milligram of money, but you’re going to give yourself many kilograms, thousands of kilograms of problems. It’s really important for people to understand this.

LD: And I think the point you bring up about everything can’t to some degree be no code at all, which is something I think we’re seeing with flex code as kind of the path that we’re seeing a lot of folks take, which is to your point, if SQL can solve it, let SQL solve it. That is fantastic. That’s the easiest common denominator, it means that maybe you aren’t on the hook for a 3:00 AM call if something goes wrong, because somebody else can solve the SQL, but you need to be able to have the ability to dig in whether it’s with PySpark, Scala, Python, whatever it might be, there are going to be things where you have to dig in a little bit more. So spend your time on that, and then let the stuff that can be SQL or let the stuff that can be no code be no code. And really have the ability within your team to have that flexible pattern.

JA: And this is an interesting point I talk about in the book as well, that not everything needs to be met with the data engineering team. If the data engineering team has exposed the data products with, let’s say, SQL interfaces we’ve now lowered that bar quite a bit, because it’s becoming more common for BI people to know SQL, it’s becoming more common for data analysts to know this, so now, if you’ve done this right, what you should be looking for as management is a gradual burn-down rate of the data engineers having to do everything, this should be a goal.

JA: And what we call that citizen data scientist or whatever you wanna… In the book, I talk about them, but I also talk about some corollaries to that: should your data scientist be doing every single problem? That could be the same sort of issue where if you throw even, let’s say simple reporting at the data scientist, hey, maybe that’s not the greatest use of their time, maybe it would be significantly better done by someone else. But if you’re thinking, “Hey, we can’t do that.” It probably shows an underlying issue with our data engineering team, we don’t have either the right data engineers or something. So it’s incredibly important that we think about the right person for the job, and what will eventually happen is that those BI people, those data analysts, as they improve their skills, they could actually get to a level where they may be able to become a data scientist, that is a specific progression at some of the companies I’ve talked about.

LD: The Holy Grail of actually being able to truly democratize your data to the folks who need to use it.

JA: And that democratization, there’s a contribution that I got from Lars Albertsson where he talks about that, and I talk a little bit about data democratization. So what’s key with data democratization, in my opinion is one, that the right infrastructure has to be there, somebody has to be responsible for the curation of that data, because if everybody is responsible, no one’s responsible. And that’s going to be an issue. So what you have, in my opinion, what you have to do is you have to stand up the right infrastructure, you have to expose that the right way with the right technologies. And then you could have those citizen data scientists and they’ll have that… They can access that data, but I still think that there’s still the need for people being specifically responsible for certain parts of that, even on the data science side, you are responsible for X, but what I really see is that citizen data scientist is the person who doesn’t have that specific title of data scientist who has a hypothesis, who actually has a thing that they want to try or learn about, and by having that data there, they can actually start looking at those patterns themselves, and this is what we really want.

LD: We talked a lot about data teams and what’s happened over the course of the last several years. What do you think… What do you see is the next big leap or next big hurdle or next big monster that these guys are gonna have to overcome? What is it with some of the things that have come in to larger data engineering teams, like folks actually starting to figure out how to put these data engineering teams together, things are moving along, what is the next thing that folks should be on the look out for?

JA: I think the next thing is, I guess it would be the current next thing, real-time. So not everybody’s doing real-time, and you have some companies on one side saying everything should be real-time, and I disagree with that, and then you have some companies with some batch… So what I would be doing if I were those companies is I’d be looking at what sorts of things do need to be real-time, and here’s how you do… What should be real-time, you talk to the business, is the business crying that, “You aren’t getting me this result fast enough?” That, “You aren’t getting me this often enough?” That’s usually the indication that you need something in real-time, and so the data teams, they need to be looking at this, and as we talk about real-time, this isn’t just, “Hey, I’m going to operationalize Flink,” usually, there’s a bunch of other mess of this that they may not have hit before, and that is a much deeper need for a real-time database, a much deeper need for a NoSQL database, a need for all these other things that they didn’t have with batch, let’s say.

JA: And so the teams, the data teams, especially the management, need to really take a look at that and see what do we need to get from point A to point B? And if they just haul off and start doing that, more than likely they’ll fail because they won’t have thought through enough of the problems, enough of the technologies needed. And maybe that’s an interesting one, that’s maybe a question that you haven’t asked yet, about data architects. And this is one that I get from companies who are somewhat far along or about to start sometimes, and they say, “What should my data architect be?” And I have an entire section in the book talking about this, where I don’t think anybody has ever talked about it, and what you need is a strong data architect, in my opinion.

JA: And by data architect, I don’t mean you have a person who writes out your schema. That isn’t what a data architect in this case means, because there are data architects out there that do that, we’re not trying to do junior modeling, we actually need an architect to help us say, “Okay, the data product needs to do this, this, and this.” And that is a significantly deeper technical discussion than simply data modeling, and so what’s happening there is teams, data engineering teams may either have a specific titular architect or they may do the architecture as a group, and both of those are possibilities, both of those are… I’ve seen work. However, you have to have the right data architect there.

JA: What I have seen be a consistent issue though, is that they will take a person who is a data warehouse architect and then say, “I dub thee the data architect for this big data system,” and now they’re completely out of their element, they’re completely out of their abilities, and what they do is they generally do architecture by white paper because they can’t do it themselves. So they go out and they look, okay, what’s the most popular framework for this? And they’ll just choose that one instead of understanding the use case, and understanding, hey, actually that technology is not right for this, because of this, this and this. They just don’t understand that.

ML: I think you put your finger on a really important part of what’s changing in this landscape, which is the role of these data architects. It’s really… Their job description seems to be changing, to be… What I’m hearing from you is, it’s almost more of a data-centric enterprise architecture as opposed to a traditional data architect whose job it is to maintain the canonical data model or whatever. Is there something that executives can do, or managers can do, to either steer or find these people, if this role is just beginning to emerge?

JA: They can… So what I encourage managers to do, either if they’re getting started or you have a… Have a team already, is to do a gap analysis. And that gap analysis is looking at where you’re strong and where you’re weak. And I actually walk through that in the Data Teams book, of how to do that. So you look at the skills, you look for which skills are missing, and then once you figure out what skills are missing, then you can start to really make the case of, “I need an architect who knows this, who actually knows these technologies,” or what may happen is you may see that, hey, this data warehouse team is completely missing programming and as a direct result, we get this. So it’s the manager oftentimes having to make a case or make the point, or if they’re an… If they’re just starting, that’s one thing, but usually the more difficult one is the teams that are already in place, and it’s trying to fix those, and it’s trying to get them either the budget or the help, frankly, to fix those teams. That’s usually what’s really important.

JA: So that help is convincing HR, convincing their manager, convincing them that, hey, the people that are on the team aren’t the right people, and that we may have to find new homes for them, quite honestly, as well as bringing in others. So there are quite a few different things that are in play. For those listening, there’s an entire chapter in the book about how you start a team, and then there’s that other chapter I talked about, about debugging. So it’s really… What you want is honesty. It takes some real honesty to look at this and call a spade a spade.

LD: Well, Jesse, thank you so much for chatting today. As always, it is a pleasure and fun and very informative to chat with you. So we appreciate you coming on and talking for a little bit and talking through some of this stuff, and again, the book is at datateams.io.

JA: Yeah, this has been great. Thank you for having me. And best of luck, everybody.

LD: Thanks again to Jesse. It is always a pleasure to chat with him about what’s new in the world of data engineering and data science, and as you heard, data teams in general. If you’d like to check out his new book, you can find it at datateams.io. Also, as always, we wanna hear from you. If you have episode suggestions or feedback, feel free to reach out to us on Twitter, LinkedIn, or at dataeng@ascend.io. And if you wanna hear more about how Ascend helps these data teams do more with their data with less code and less maintenance, you can visit us at Ascend.io. Welcome to the new era of data engineering.
