Orchestration System Capabilities Comparison

Ascend.io vs Apache Airflow

Advancing to a Declarative Orchestration Approach

The current state of the art in orchestration is the workflow automation system, exemplified by Airflow. Imperative by design, these systems make you responsible for designing the tasks, connecting them together via dependency relations, and passing them to a compute and infrastructure team/layer that manages the execution. This model puts the manual burden on you, the developer, to manage the innumerable details within (and between) each task and to ensure the data is correct, often resulting in brittle pipelines that are expensive to run and maintain.
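To make the imperative model concrete, here is a minimal Airflow DAG sketch; the pipeline name, schedule, and task bodies are illustrative, but the shape is the point: every task, every dependency, and every operational detail is hand-written by the developer, and Airflow executes exactly what it is told.

```python
# Imperative orchestration in Airflow: the developer defines every task and
# wires the dependencies by hand. Names, schedule, and task bodies are
# illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull raw data from a source -- connection handling is on you

def transform():
    ...  # clean and reshape -- correctness checks are on you

def load():
    ...  # write to the destination -- idempotency and retries are on you

with DAG(
    dag_id="orders_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # The dependency graph is hand-specified, as is everything about HOW it runs.
    t_extract >> t_transform >> t_load
```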

There is an alternative: dataflow automation systems that are declarative by design. These systems leverage a sophisticated control plane, non-existent in imperative systems, to translate high-level data specs into the myriad tasks required to achieve the specification, and then to schedule, size, execute, and manage those tasks on the developer's behalf. You are responsible only for creating and curating the high-level spec, not for HOW the dataflow automation system manages the legion of tasks required to implement it.
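For contrast, here is a hypothetical declarative spec, written as plain Python data purely for illustration. This is not Ascend's actual SDK or file format, and every connector, path, and dataset name below is invented; what matters is what the spec does not contain.

```python
# Hypothetical declarative spec (illustrative only -- NOT Ascend's actual API).
# You declare WHAT each dataset is; a control plane derives, schedules, sizes,
# and manages the tasks needed to keep every dataset up to date.
spec = {
    "sources": {
        "raw_orders": {"connector": "s3", "path": "s3://example-lake/orders/"},
    },
    "transforms": {
        "clean_orders": {
            "inputs": ["raw_orders"],
            "sql": "SELECT * FROM raw_orders WHERE status IS NOT NULL",
        },
        "daily_revenue": {
            "inputs": ["clean_orders"],
            "sql": (
                "SELECT order_date, SUM(amount) AS revenue "
                "FROM clean_orders GROUP BY order_date"
            ),
        },
    },
    "sinks": {
        "warehouse": {"input": "daily_revenue", "table": "DAILY_REVENUE"},
    },
}
# Note what is absent: no task graph, no retries, no partitioning, no Spark
# settings -- the control plane owns HOW.
```

The control plane's job is to reconcile a spec like this with the current state of the data and generate whatever tasks close the gap.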

95% Less Code with a Declarative Approach

Declarative

- Less code (95% less)
- Faster dev cycles
- Adaptive to changes
- Less maintenance
- Requires annotations to override automated behaviors
- Requires domain-specific control system

Imperative

- Assumptions (stale ones) in code
- Manual optimizations
- State integrity checks
- Failure management
- Flexible
- High levels of control

Capability Comparison by Category

INGEST CAPABILITIES

Any Data, Anywhere, Any Format
Connect to any lake, queue, warehouse, database, or API.
Ascend: Native to the Platform | Airflow: Custom Code

Change Detection
Detect and ingest new, updated, and deleted data automatically. Track where your data is located, how often it changes, and what has already been ingested.
Ascend: Fully Automated | Airflow: Not Available

Data Profiling
Auto-profile every piece of data being ingested.
Ascend: Fully Automated | Airflow: Not Available

Automated Data Reformatting
Aggregate small files into single partitions for processing efficiency, and automatically convert any incoming format to Snappy-compressed Parquet files (the manual alternative is sketched below).
Ascend: Fully Automated | Airflow: Not Available
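To ground the "Custom Code" and "Not Available" entries above: in Airflow, reformatting incoming data is code you write and maintain yourself. A minimal sketch of that custom code using pyarrow, with illustrative file paths:

```python
# Hand-rolled version of "Automated Data Reformatting": convert an incoming
# CSV to a Snappy-compressed Parquet file. Paths are illustrative.
import pyarrow.csv as pv
import pyarrow.parquet as pq

table = pv.read_csv("incoming/orders.csv")
pq.write_table(table, "staged/orders.snappy.parquet", compression="snappy")
# Aggregating many small files into one partition (e.g., via
# pyarrow.concat_tables) would be still more custom code -- plus the
# scheduling, retries, and change detection wrapped around all of it.
```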

TRANSFORM CAPABILITIES

Declarative Data Pipelines
Enable developers to focus code solely on WHAT they want done to the data. Zero code needed to orchestrate the underlying work of HOW to achieve the desired state.
Ascend: Native to the Platform | Airflow: Not Available

Interactive Pipeline Builder
Navigate live data pipelines, from source to sink and everything in between. Trace data lineage, preview data, and prototype changes in minutes instead of days.
Ascend: Fully Automated | Airflow: Basic DAG View

Queryable Pipeline
Query every component of the pipeline as a table to explore, validate, and manipulate data (see the sketch after this table).
Ascend: Fully Automated | Airflow: Not Available

Git & CI/CD Integration
Use any CI/CD solution, such as Jenkins or CircleCI.
Ascend: Supported | Airflow: Supported
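The "Queryable Pipeline" row is easiest to picture with an example. Because pipeline stages persist as Parquet, any SQL engine that reads Parquet can treat a stage as a table; a sketch with DuckDB, where the stage path is illustrative:

```python
# Query a persisted pipeline stage as a table. In Ascend this querying is
# built into the platform; shown here with DuckDB against a stage's Parquet
# output. The path is illustrative.
import duckdb

stage = "staged/clean_orders.snappy.parquet"
print(duckdb.sql(f"SELECT COUNT(*) AS row_count FROM '{stage}'"))
print(duckdb.sql(f"SELECT * FROM '{stage}' LIMIT 5"))
```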

ORCHESTRATE CAPABILITIES

Intelligent Persistence
Persist data at every stage of the pipeline to minimize compute cost, pinpoint defects, and massively reduce debug/restart time.
Ascend: Fully Automated | Airflow: Not Available

Data & Job Deduplication
Safely deduplicate work across all pipelines, ensuring your pipelines run fast, efficiently, and cost-effectively, while making branching and merging as easy as it is with code.
Ascend: Fully Automated | Airflow: Not Available

Dynamic Partitioning
Auto-partition data to optimize propagation of incremental changes in data.
Ascend: Fully Automated | Airflow: Not Available

Automated Backfill
Efficiently manage backfill and late-arriving data.
Ascend: Supported | Airflow: Not Available

Automated Spark Management
Optimize Spark parameters for every job, based on data and code profiles, and manage all aspects of jobs sent for processing on the Spark engine (the manual alternative is sketched below).
Ascend: Supported | Airflow: Not Available
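The "Automated Spark Management" row replaces the per-job tuning that an imperative setup leaves to you. A sketch of that manual version with PySpark; the parameter values are illustrative guesses, which is precisely the problem, since they get baked into code and go stale as data volumes change:

```python
# Manual Spark management, the imperative way: the developer guesses tuning
# parameters per job and hard-codes them. All values here are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("daily_revenue")
    .config("spark.sql.shuffle.partitions", "200")  # sized for last month's data
    .config("spark.executor.memory", "4g")          # revisit when jobs start failing
    .getOrCreate()
)

orders = spark.read.parquet("staged/clean_orders.snappy.parquet")
(orders.groupBy("order_date").sum("amount")
       .write.mode("overwrite")
       .parquet("out/daily_revenue"))
```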

DELIVER CAPABILITIES

Notebook Connectors
Connect Jupyter, Zeppelin, and more directly to data pipelines for access to data as it moves through your pipelines.
Ascend: Supported | Airflow: Supported

BI & Data Visualization
Feed data directly to your BI and data visualization tools.
Ascend: Available | Airflow: Not Available

File-Based Access
Get direct access to data at every stage of the pipeline as .snappy.parquet files for efficient processing by other big data systems (see the sketch after this table).
Ascend: Fully Automated | Airflow: Not Available

Record APIs & SDKs
Read records from any stage of any data pipeline via JDBC or a high-speed records API.
Ascend: Available | Airflow: Not Available
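A quick illustration of the "File-Based Access" row: because every stage lands as .snappy.parquet, downstream tools can read it with nothing more than a path (the path below is illustrative):

```python
# Reading a pipeline stage directly from a notebook or script.
# The path is illustrative.
import pandas as pd

df = pd.read_parquet("staged/daily_revenue.snappy.parquet")
print(df.head())
```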

OBSERVE CAPABILITIES

Automated Cataloging
Provide organized, searchable access to all code and data under the platform's management, with automated registration of new data sets and code.
Ascend: Fully Automated | Airflow: Not Available

Data Lineage Tracking
Instantly visualize the lineage of any column of data from sink to source, including all operations performed on it.
Ascend: Fully Automated | Airflow: Limited

Resource & Cost Reporting
For every piece of data in the system, report the resources required, historically and at present, to produce and maintain it.
Ascend: Fully Automated | Airflow: Not Available

Activity Monitoring & Reporting
Track all user, data, and system events, with integration into external platforms such as Splunk and PagerDuty.
Ascend: Fully Automated | Airflow: Limited

Secure Data Feeds
Reuse end-result data sets in other pipelines through a subscribe workflow. Provide external access to end-result data sets via API.
Ascend: Fully Automated | Airflow: Not Available

Data Garbage Crawl
Crawl data storage systems, automatically deleting data that has been abandoned and is no longer associated with active data pipelines (a hypothetical sketch follows this table).
Ascend: Fully Automated | Airflow: Not Available
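The "Data Garbage Crawl" row describes behavior that is easiest to understand as code. A hypothetical sketch, not Ascend's implementation; the function, arguments, and paths are invented for illustration:

```python
# Hypothetical data garbage crawl: delete stored files that no active
# pipeline references. Everything here is invented for illustration.
from pathlib import Path

def garbage_crawl(storage_root, active_paths):
    """Remove files under storage_root not referenced by any active pipeline."""
    deleted = []
    for f in Path(storage_root).rglob("*.parquet"):
        if str(f) not in active_paths:
            f.unlink()  # abandoned: no lineage entry points at it
            deleted.append(str(f))
    return deleted

# In a real platform, active_paths would come from the lineage catalog
# (see "Data Lineage Tracking" above).
```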

Ready for intelligent, declarative data orchestration?