June 22, 2021

Practices for Data Warehousing with Ascend—Early Query, Fast Iteration, Collaboration

Jon Saltzman

Data Engineering 101

Introduction

Welcome to the first in a series of posts regarding patterns and practices we’ve learned while working with customers using Ascend within the core of their data platform/stack to help them build and manage data warehouses.

The concept of data warehousing has a history that goes back to the 1980s, and it is no coincidence that the decade that ended with the invention of the data warehouse also brought us Pac-Man and the first version of Microsoft Windows, and saw “The Computer” named Time magazine’s “Machine of the Year” in place of its usual “Man of the Year”. This was a time of massive change and innovation, and it is easy to forget just how “young” this industry is!

All data warehouses (DW) generally share a common purpose: to integrate, aggregate, store and model current and historical data from multiple data sources, and to make it available to users and consumers, in an easy-to-understand manner, for decision support (analytics, BI, reporting, etc.). They often differ in their specific implementation details and techniques.

One of the largest differences between data warehouses usually comes down to the fundamental modeling technique, such as normalized (often thought of as the “OG” method and associated with Bill Inmon), dimensional (Ralph Kimball) or even data vault (Dan Linstedt). Ascend works equally well as a part of any data warehouse ecosystem, regardless of your preferences for those specific implementation details (including both ETL-based and ELT-based approaches).

Over time, traditional data warehousing techniques for data processing, modeling, etc. became seen as “monolithic” and “slow”. We observed many challenges within organizations, as their data warehousing and BI teams simply could not keep up with the pace of organizational and business demands for producing outcomes with data (analytics, dashboards, reports, KPIs, and insights, to name a few).

One simple cause for this was the implementation details: the specific tools and processes teams used to build and populate data warehouses before we even called it “data engineering”. The methods were often rigid and methodical, and they rightly took time, since they emphasized correctness and quality. These teams were often the only group of specialists enabled to do this kind of work. This created a natural bottleneck at a time when organizations were becoming ever more “hungry” for data-driven analytics, outcomes and insights. This approach could not scale to meet that demand.

Practice #1: Early query and data exploration, fast iteration and collaboration

Our first practice is to query and explore your data early and often, even within the pipelines you build during your transformation and modeling process, even before the final production data warehouse model is addressed. Ascend allows you to query individual sources and transformations using industry-standard SQL syntax. No more opaque data pipelines!
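To make the idea concrete, here is a minimal sketch of querying an intermediate pipeline stage with plain SQL. It is not Ascend's API: it uses Python's built-in sqlite3 module as a stand-in, and the table, view and column names (raw_orders, cleaned_orders and so on) are hypothetical.

```python
import sqlite3

# Stand-in for a pipeline's stages: an in-memory SQLite database.
# All table, view and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL, status TEXT);
    INSERT INTO raw_orders VALUES
        (1, 'acme',   120.0, 'shipped'),
        (2, 'acme',    80.0, 'cancelled'),
        (3, 'globex', 200.0, 'shipped'),
        (4, 'globex',  NULL, 'shipped');

    -- An intermediate transformation, exposed as a view so it can be
    -- queried before any final warehouse model exists.
    CREATE VIEW cleaned_orders AS
        SELECT order_id, customer, amount
        FROM raw_orders
        WHERE status = 'shipped' AND amount IS NOT NULL;
""")


def explore(sql):
    """Run an ad hoc query against any stage and return the rows."""
    return conn.execute(sql).fetchall()


# Early exploration: how many rows does the cleaning step drop, and why?
raw_count = explore("SELECT COUNT(*) FROM raw_orders")[0][0]
clean_count = explore("SELECT COUNT(*) FROM cleaned_orders")[0][0]
null_amounts = explore("SELECT COUNT(*) FROM raw_orders WHERE amount IS NULL")[0][0]

# An early insight taken straight from the intermediate stage.
revenue_by_customer = explore(
    "SELECT customer, SUM(amount) FROM cleaned_orders "
    "GROUP BY customer ORDER BY customer"
)

print(raw_count, clean_count, null_amounts)
print(revenue_by_customer)
```

The point of the sketch is that the raw source and the intermediate transformation are both ordinary query targets, so questions such as “what did the cleaning step drop?” can be answered long before a production model exists.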

This changes the paradigm significantly. What is going on in your data pipelines is no longer a hidden part of the process, visible to only a few specialists, but to anyone in the organization with the right permissions and tools. Your pipelines become part of the queryable ecosystem of data. This is a superpower when building and evolving both pipelines and data warehouses, especially in a more agile, demanding and ever-evolving environment!

Some of the early data warehousing approaches required waiting until the data was fully conformed, fully modeled and fully loaded into a data warehouse before end users and consumers could start to explore, analyze and use it. That wait included all of the data engineering work needed to enable the data warehouse. It was a reasonable starting point, but it was not fast or flexible enough, and we could always do better.

We have seen this ability to query and explore data early enable our customers to discover “how their data works” and tailor their data warehouse model to the organization much faster than before. Ascend compresses the time between building the pipeline and building the model: you can start mocking up your model and exploring how everything can work as part of building the data pipeline itself.
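One way to picture mocking up a model while the pipeline is still being built is to define candidate dimension and fact tables as views over a staged dataset and query them right away. The sketch below again uses Python's sqlite3 module as a stand-in rather than Ascend itself, and every name in it (staged_sales, dim_product, fact_sales) is a hypothetical illustration.

```python
import sqlite3

# In-memory stand-in for a staged pipeline output; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staged_sales (
        sale_id INTEGER, product TEXT, category TEXT, qty INTEGER, unit_price REAL
    );
    INSERT INTO staged_sales VALUES
        (1, 'widget', 'hardware', 2, 10.0),
        (2, 'gadget', 'hardware', 1, 25.0),
        (3, 'widget', 'hardware', 3, 10.0),
        (4, 'manual', 'docs',     5,  4.0);

    -- Mock dimension: one row per distinct product.
    CREATE VIEW dim_product AS
        SELECT DISTINCT product AS product_key, category
        FROM staged_sales;

    -- Mock fact: one row per sale with a derived revenue measure.
    CREATE VIEW fact_sales AS
        SELECT sale_id, product AS product_key, qty, qty * unit_price AS revenue
        FROM staged_sales;
""")

# Try out the mocked-up model with the kind of question a BI user would ask.
rows = conn.execute("""
    SELECT d.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

print(rows)
```

Because the mock model is only a pair of views, changing the grain of the fact or the attributes of the dimension is a one-line edit, which is what keeps iteration with users cheap.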

Our customers specifically tell us that this superpower makes a huge difference when iterating quickly on data pipeline development, especially for data warehouses, because getting the transformations and model right the first time can be difficult and time-consuming. Being able to shape the data interactively with their users through queries, or having users do this on their own as an input to the data engineering team, has enabled data engineers to move much more quickly and produce more desirable results.

Ascend gives individuals and teams within organizations a chance to “know their data”, and for that not to just be limited to specialists – all part of the democratization of IT functions across the organization (another topic altogether). Customers can just “get right in there” and have citizen data engineers working alongside the core data engineering and data warehouse teams, collaborating on data across the organization. It is this kind of organizational collaboration that produces all kinds of improved outcomes for the organization.

When people understand their own data, it makes it that much easier for them to collaborate on that data. We have also seen this approach generate early insights that start answering questions more quickly – producing outcomes for the business ahead of getting data into the production data warehouse!

What a difference this small change made to the velocity and ability for these data warehousing teams and organizations to scale! This best practice is all about taking full advantage of techniques supported by Ascend that enable early query exploration, fast iteration and collaboration.

When you enable people to work together and get to value sooner, while you are also incrementally evolving your technically sound, carefully modeled & tailored, and fully conformed data warehouse, you succeed!

Conclusion

In this post, we covered the first of many key patterns and practices that we have seen for teams working with data warehouses and Ascend. We encourage our readers to submit additional ideas for patterns and practices that they see while using Ascend with data warehouses and/or questions on how to implement a particular one within Ascend.

There is an important lesson here about what seems like radical change but may actually be constant incremental innovation when seen over a longer period of time. You can “stand on the shoulders of giants” by combining the data warehousing knowledge created by those who came before with the latest improvements that make those approaches successful more quickly.

You see the theme of agile, incremental, dynamic and adaptable strategies across our industry, for example within the newer concept of DataOps and other similar approaches. One could say that we have reached a new era of “agile data engineering”. Don’t get too caught up in “what does it mean to be agile” in any formal sense. Think about it from a pragmatic perspective instead: what does it mean to build data pipelines in an interactive, collaborative and incremental way in response to organizational needs, and to share that responsibility with an even larger group of individuals within the organization?

By combining tried and true data warehouse techniques and a cutting-edge data engineering platform such as Ascend, you truly get the best of both worlds. Come join us and build your data warehouse, or the next component of your data platform, with our unified data engineering platform, Ascend!