Ep 9 – Streamlining Data Infrastructure with a new Single Cluster Footprint

About this Episode

by Joe Stevens, Lead Infrastructure Engineer

As you may be aware, the team here at Ascend.io recently made a few significant upgrades to the infrastructure footprint that supports the Ascend.io environments our customers use. These upgrades bring with them a number of benefits, including cost reductions, better resource utilization, and faster launch times for particular workloads.

I go into more detail in the podcast below, but here are some quick insights on what’s changing.

In short, we’re consolidating our primary infrastructure backbone from two Kubernetes clusters to one. This enables us to:

  • Reduce the minimum number of VMs from four (one On Demand, three Spot) to just two (one On Demand, one Spot)
  • Reduce the NAT gateways from six to three (AWS only)
  • Reduce Network Load Balancers from two to one (Azure only)

Because of this, our customers’ total baseline infrastructure cost will drop, which, as we know, is always appreciated!
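
To make the size of the change concrete, here is a small Python sketch that tallies the baseline counts from the list above. The dictionary keys and the `reduction` helper are names made up for this illustration, and no prices are assumed, so it shows resource counts only, not dollar savings.

```python
# Baseline resource counts taken from the list above.
# Key names are illustrative; no pricing data is assumed.
BEFORE = {
    "on_demand_vms": 1,
    "spot_vms": 3,
    "nat_gateways_aws": 6,
    "network_load_balancers_azure": 2,
}
AFTER = {
    "on_demand_vms": 1,
    "spot_vms": 1,
    "nat_gateways_aws": 3,
    "network_load_balancers_azure": 1,
}


def reduction(before, after):
    """Return {resource: (count_removed, fraction_removed)}."""
    result = {}
    for name, old in before.items():
        removed = old - after[name]
        result[name] = (removed, removed / old)
    return result


for name, (removed, fraction) in reduction(BEFORE, AFTER).items():
    print(f"{name}: {removed} fewer ({fraction:.0%} of the old baseline)")
```

The On Demand VM count stays the same, so the VM savings come entirely from dropping two of the three Spot instances.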

Why are we able to make these changes now?

Kubernetes as a platform has come a long way in the past few years, and both its security capabilities and its stability at scale have improved significantly. As a result, we can keep proper security boundaries in place while sharing resources more effectively, even at scale.

Additionally, we’ve introduced a number of Spark improvements, most notably dynamic reuse of Spark clusters. This has not only provided faster job launch times, but also significantly reduced load on the Kubernetes cluster managers, because reusing clusters means we are not creating and deleting as many pods on the cluster.
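
The effect of cluster reuse on pod churn can be illustrated with a toy pool. This is a minimal sketch of the general idea, not our actual implementation: `ClusterPool`, `acquire`, `release`, and the "small"/"large" spec names are all hypothetical, and a real system would also handle concurrency, idle timeouts, and failures.

```python
from collections import defaultdict


class ClusterPool:
    """Hypothetical pool that hands out an idle cluster before creating a new one."""

    def __init__(self):
        self._idle = defaultdict(list)  # spec -> list of idle cluster ids
        self.created = 0  # how many clusters (and their pods) had to be launched

    def acquire(self, spec):
        # Reuse an idle cluster with a matching spec when one exists.
        if self._idle[spec]:
            return self._idle[spec].pop()
        # Otherwise launch a new one; this is the expensive path.
        self.created += 1
        return f"{spec}-{self.created}"

    def release(self, spec, cluster_id):
        # Keep the cluster around for the next job instead of deleting it.
        self._idle[spec].append(cluster_id)


def run_jobs(pool, specs):
    """Run jobs one after another, returning each cluster to the pool."""
    for spec in specs:
        cluster_id = pool.acquire(spec)
        # A real job would execute on the cluster here.
        pool.release(spec, cluster_id)


pool = ClusterPool()
run_jobs(pool, ["small", "small", "large", "small"])
print(pool.created)  # 2 launches for 4 jobs
```

Without reuse, the same four jobs would each launch and tear down their own cluster, which is the extra create and delete traffic the cluster managers no longer have to absorb.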

What this enables next

With this new infrastructure foundation, we now have the ability to open up a number of new capabilities for you, including:

  • Auto-scaled JDBC, QDF, and more services
  • Fine-grained controls over scaling limits for each of the above services
  • Multi-dataplane deployments (i.e., run transforms on Spark and Snowflake)

If you have any questions, feel free to reach out at [email protected] or in our community Slack.