The Autonomous Pipeline Engine
Powered by Ascend’s DataAware™ intelligence, the Autonomous Pipeline Engine converts your data goals into self-optimizing pipelines.
Frameworks vs. Control Systems
The status quo for building data pipelines is painful and cumbersome. Sure, building the first couple isn't too bad; it's the many that follow, how they impact each other, and how they become harder and harder to maintain as dependencies grow and the codebase becomes increasingly unruly. For us, this was a terminal path for our happiness as engineers.
To solve this, there are two options: frameworks and control systems. Most other companies went the framework route. Frameworks are great at making it easier to build at a static point in time, but they lack a feedback loop, which is critical for dynamic systems.

Dynamic systems, like data pipelines, are where control planes shine. Instead of the framework mentality of "when X happens, do Y," a control plane takes the approach of "no matter what happens, make the system look like Z." As engineers, we wanted the latter. So we built the Dataflow Control Plane.
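To make that distinction concrete, here is a minimal sketch of the loop a control plane runs, assuming hypothetical blueprint and fabric objects; none of these names come from Ascend's implementation:

```python
import time

def reconcile(blueprint, fabric, interval_s=30):
    """Hypothetical control loop: converge the fabric toward the blueprint.

    Unlike a framework's "when X happens, do Y" triggers, this loop
    continuously compares desired state to observed state and closes the
    gap, no matter what caused it (new data, a failed job, a code change).
    """
    while True:
        desired = blueprint.desired_state()     # "make the system look like Z"
        observed = fabric.observed_state()      # what actually exists right now
        for action in plan(desired, observed):  # diff -> corrective actions
            fabric.apply(action)
        time.sleep(interval_s)

def plan(desired, observed):
    """Diff desired vs. observed state into a list of corrective actions."""
    missing = desired.keys() - observed.keys()
    stale = {k for k in desired.keys() & observed.keys()
             if desired[k] != observed[k]}
    return [("create", k) for k in missing] + [("rebuild", k) for k in stale]
```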
In architecting this, we solved for three key areas:
User-defined “blueprints” of pipelines
Translating blueprints into jobs and infrastructure
Persisting bidirectional feedback so the system always converges on that end-state
Architecting the Autonomous Pipeline Engine
Defining Blueprints
In Ascend, you build out declarative DAGs to define these blueprints. Just tell us the inputs, the transforms you want at each stage, and the outputs, and Ascend creates the blueprint from there. SQL was the first language we supported, since it lets us infer many patterns directly from the code provided, making it a great training ground for the Control Plane. We now also support PySpark, with more options on the way.
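To make that concrete, here is a minimal sketch of what a declarative blueprint could look like. This is not Ascend's actual SDK; every class, field, and path below is a hypothetical stand-in:

```python
from dataclasses import dataclass, field

# Hypothetical structures for a declarative pipeline blueprint. You state
# inputs, per-stage transforms, and outputs; the control plane derives
# everything else (scheduling, infrastructure, incrementality).

@dataclass
class Transform:
    name: str
    inputs: list[str]   # upstream stage names this transform reads from
    sql: str            # the transform logic, here expressed as SQL

@dataclass
class Blueprint:
    inputs: dict[str, str]                        # source name -> location
    transforms: list[Transform] = field(default_factory=list)
    outputs: dict[str, str] = field(default_factory=dict)  # stage -> sink

blueprint = Blueprint(
    inputs={"raw_events": "s3://bucket/events/"},
    transforms=[
        Transform(
            name="clean_events",
            inputs=["raw_events"],
            sql="SELECT user_id, ts FROM raw_events WHERE user_id IS NOT NULL",
        ),
        Transform(
            name="daily_counts",
            inputs=["clean_events"],
            sql="SELECT DATE(ts) AS day, COUNT(*) AS n FROM clean_events GROUP BY 1",
        ),
    ],
    outputs={"daily_counts": "warehouse.analytics.daily_counts"},
)
```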
Translating the Blueprints
With the blueprint in hand, the Control Plane understands the expected end-state of the pipelines and automates the Elastic Data Fabric to deliver on it. The current fabric utilizes Spark clusters deployed on Kubernetes, combined with the selected cloud provider’s persistent object store. By managing this underlying infrastructure as a service, the Control Plane has complete control to auto-scale and auto-orchestrate with a high degree of precision based on the data, code, concurrency, and SLAs.
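As a rough illustration of this translation step, building on the hypothetical blueprint sketch above, the Control Plane could topologically order the DAG and emit one job spec per transform. The to_job_specs function, the engine label, and the fixed executor count are all assumptions for the sketch; in practice, sizing would be derived from the data, code, concurrency, and SLAs:

```python
from graphlib import TopologicalSorter

def to_job_specs(blueprint):
    """Hypothetical translation: order the blueprint's DAG, then emit one
    job spec per transform for the underlying fabric to run."""
    # Map each transform to its upstream transforms (raw inputs excluded).
    dag = {t.name: set(t.inputs) - blueprint.inputs.keys()
           for t in blueprint.transforms}
    order = TopologicalSorter(dag).static_order()
    by_name = {t.name: t for t in blueprint.transforms}
    return [
        {
            "job": name,
            "engine": "spark-on-k8s",  # Spark clusters deployed on Kubernetes
            "sql": by_name[name].sql,
            "executors": 4,            # placeholder; auto-scaled in practice
        }
        for name in order if name in by_name
    ]
```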
Persisting Bidirectional Feedback
This is the piece frameworks are missing. The Control Plane continuously feeds what it observes in the Elastic Data Fabric, the state of the data, the jobs, and the infrastructure, back into its model of the pipelines, and persists that feedback. Because the feedback is durable, the Control Plane can always compare the blueprint's expected end-state against what actually exists and reconverge on it, no matter what changed: new data arriving, code being updated, or a job failing mid-flight.
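One plausible shape for that persisted feedback (purely illustrative, not Ascend's storage model) is a durable record of what was last built from which code and inputs, so the loop can tell unchanged work from work that must be redone:

```python
import hashlib
import json

# Hypothetical feedback store: records a fingerprint of the code and input
# state that produced each stage. On the next pass, the control plane
# recomputes fingerprints and rebuilds only stages whose fingerprint no
# longer matches what was persisted.

class FeedbackStore:
    def __init__(self):
        self._seen = {}  # stage name -> fingerprint of last successful build

    @staticmethod
    def fingerprint(sql, input_versions):
        payload = json.dumps({"sql": sql, "inputs": input_versions},
                             sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def needs_rebuild(self, stage, sql, input_versions):
        return self._seen.get(stage) != self.fingerprint(sql, input_versions)

    def record_success(self, stage, sql, input_versions):
        self._seen[stage] = self.fingerprint(sql, input_versions)

store = FeedbackStore()
print(store.needs_rebuild("daily_counts", "SELECT ...", {"clean_events": "v1"}))  # True
store.record_success("daily_counts", "SELECT ...", {"clean_events": "v1"})
print(store.needs_rebuild("daily_counts", "SELECT ...", {"clean_events": "v1"}))  # False
```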