The Autonomous Pipeline Engine

Powered by Ascend’s DataAware™ intelligence, the Autonomous Pipeline Engine converts your data goals into self-optimizing pipelines.

Under the Hood

Frameworks vs. Control Systems

The status quo for building data pipelines is painful and cumbersome. Sure, building the first couple isn’t too bad; it’s the many that follow, how they impact each other, and how they grow harder and harder to maintain as dependencies multiply and the codebase becomes increasingly unruly. For us, this was a terminal path for our happiness as engineers.

To solve this, there are two options: frameworks and control systems. Most other companies went the framework route. Frameworks are great at making it easier to build at a static point in time, but they lack a feedback loop, which is critical for dynamic systems.

Dynamic systems, like data pipelines, are where control planes shine. Instead of the “when X happens, do Y” framework mentality, a control plane takes the approach of “no matter what happens, make the system look like Z.” As engineers, we wanted the latter. So we built the Dataflow Control Plane.
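
To make the distinction concrete, here is a minimal sketch of the two mentalities in Python (hypothetical names, not anyone’s real API):

    # Framework mentality: "when X happens, do Y." Each event triggers a
    # fixed action; if a trigger is lost or an action fails, nothing
    # notices, and the system silently drifts from what you wanted.
    def on_file_arrived(path, processed):
        processed.add(path)  # do Y in response to X

    # Control-plane mentality: "no matter what happens, make the system
    # look like Z." A loop diffs desired state against observed state and
    # acts on the difference, so missed events and failures self-heal.
    def reconcile(desired, observed):
        for path in desired - observed:
            observed.add(path)  # redo the missing work
        return observed

    desired = {"a.csv", "b.csv", "c.csv"}
    observed = {"a.csv"}  # b.csv's trigger was lost; c.csv's job failed
    reconcile(desired, observed)
    assert observed == desired  # converges regardless of what went wrong

The first style encodes history; the second encodes intent, which is what makes it robust to whatever history throws at it.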

In architecting this, we solved for three key areas:

User-defined “blueprints” of pipelines

Translating blueprints into jobs and infrastructure

Persisting bidirectional feedback to always make it happen

Architecting the Autonomous Pipeline Engine

Defining Blueprints

In Ascend, you build out declarative DAGs to define these blueprints. Just tell us the inputs, the transforms you want to happen at each stage, and the outputs, and Ascend creates the blueprint from there. SQL was the first language supported, since that let us infer a lot of patterns from the code provided – making it a great training ground for the Control Plane. We now also support PySpark, with more options on the way.
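
At its core, a blueprint boils down to a declarative spec of inputs, per-stage transforms, and outputs. The structure below is a hypothetical sketch, not Ascend’s actual SDK:

    # Illustrative only: a declarative blueprint with inputs, transforms,
    # and outputs. The DAG is implied by the depends_on references.
    blueprint = {
        "inputs": {
            "raw_events": {"connector": "s3", "path": "s3://bucket/events/"},
        },
        "transforms": {
            "clean_events": {
                "language": "sql",
                "code": "SELECT user_id, ts FROM raw_events "
                        "WHERE ts IS NOT NULL",
                "depends_on": ["raw_events"],
            },
            "daily_counts": {
                "language": "sql",
                "code": "SELECT user_id, COUNT(*) AS n "
                        "FROM clean_events GROUP BY user_id",
                "depends_on": ["clean_events"],
            },
        },
        "outputs": {
            "user_activity": {"from": "daily_counts", "connector": "warehouse"},
        },
    }

Because SQL declares intent rather than procedure, a spec like this can be parsed to infer schemas, lineage, and dependency patterns automatically, which is why it made such a good training ground.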

Translating the Blueprints

With the blueprint in hand, the Control Plane understands the expected end-state of the pipelines and automates the Elastic Data Fabric to deliver on it. The current fabric utilizes Spark clusters deployed on Kubernetes, combined with the selected cloud provider’s persistent object store. By managing this underlying infrastructure as a service, the Control Plane has complete control to auto-scale and auto-orchestrate with a high degree of precision based on the data, code, concurrency, and SLAs.
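
In broad strokes, translation means walking the blueprint’s DAG in dependency order and emitting an appropriately sized job for each stage. The sketch below is hypothetical (the function name, sizing heuristic, and job shape are all illustrative), but it shows the idea:

    # Hypothetical translation pass: one Spark-on-Kubernetes job per
    # transform, sized from the signals mentioned above (data volume,
    # SLA). Real sizing logic would be far richer.
    from graphlib import TopologicalSorter  # Python 3.9+

    def plan_jobs(blueprint, input_bytes, sla_minutes):
        deps = {name: set(t.get("depends_on", []))
                for name, t in blueprint["transforms"].items()}
        jobs = []
        for node in TopologicalSorter(deps).static_order():
            if node not in blueprint["transforms"]:
                continue  # inputs are connectors, not Spark jobs
            executors = max(2, input_bytes // (10 * 2**30))  # ~1 per 10 GiB
            if sla_minutes <= 15:
                executors *= 2  # tight SLA: buy speed with parallelism
            jobs.append({"stage": node, "runtime": "spark-on-k8s",
                         "executors": executors})
        return jobs

Because the infrastructure is managed as a service rather than handed to the user, the planner is free to resize and reorchestrate on every pass.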

Persisting Bidirectional Feedback

This is the piece that closes the loop. The Control Plane continuously compares the observed state of every pipeline against the end-state its blueprint declares, and the feedback flows in both directions: the blueprint drives jobs and infrastructure downward, while run results flow back upward to inform the next decision. Persisting that feedback is what lets the system keep converging on the declared end-state, no matter what happens along the way.
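
A minimal sketch of what that loop might look like, again with entirely hypothetical names:

    # Hypothetical feedback loop: observe, diff against the blueprint,
    # act, and persist the outcome so future decisions can use it.
    import time

    def control_loop(blueprint, observe, act, history, interval_s=60):
        desired = set(blueprint["outputs"])          # the declared end-state
        while True:
            observed = observe()                     # what actually exists
            for missing in desired - observed:
                result = act(missing)                # re-run, repair, rescale
                history.append({"stage": missing, "result": result,
                                "ts": time.time()})  # persisted feedback
            time.sleep(interval_s)

The persisted history is what makes the feedback bidirectional: it doesn’t just repair the current run, it tunes how the next one is planned.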
