Data Lake ETL Tutorial: Using Ascend No- and Low-Code Connectors to Load Data

Now that we’ve extracted some data from S3 and cleaned it up using a SQL transform, we can start on the “L” of ETL and write our data back out to our data lake. Follow this guide to learn how.

Ascend.io

data-eng@ascend.io

1. Under the Build option, we can see the variety of write connectors available, with more coming soon. Here we’ll choose an S3 write connector.

2. Click the connector, give it a name, and select the upstream node whose output we want to write. We’re going to write to an S3 bucket, so we choose the prefix under which the data should live. We’ll also need to supply credentials, and we can test the connection before moving on. Lastly, we choose one of the standard output formats; for this tutorial, we’ll use Parquet. Save the connector, and the data will start writing out to S3.

3. Now that all the data is up to date, suppose we need to make a schema change. For example, say I want to edit the SQL transform because I no longer need one of its extra columns.

4. I can remove that column and update the transform. Ascend immediately starts rerunning from the node I changed, because it knows that everything downstream of that node needs to be recomputed. It won’t recompute all the way from the source, because nothing upstream has changed. Once the transform finishes, Ascend rewrites the data in the data lake using the new schema.
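
Ascend collects the settings from step 2 through its UI. To make those choices concrete, here is a minimal Python sketch that models them (upstream node, bucket, prefix, format) as a plain data structure. Everything in it is an assumption for illustration: the class name `S3WriteConfig`, the supported-format list, and the `part-00000` file-naming scheme are hypothetical and are not Ascend’s API. Credentials are deliberately left out; in practice they belong in a secrets store, not in code.

```python
from dataclasses import dataclass

# Hypothetical format list for illustration only; not Ascend's actual options.
SUPPORTED_FORMATS = {"parquet", "csv", "json"}


@dataclass(frozen=True)
class S3WriteConfig:
    """Hypothetical model of the step-2 settings (not Ascend's API)."""

    upstream_node: str  # node whose output we write out
    bucket: str  # target S3 bucket
    prefix: str  # key prefix under which the data lives
    file_format: str = "parquet"

    def __post_init__(self) -> None:
        # Reject bad settings up front, similar in spirit to testing
        # the connection before saving.
        if not self.bucket:
            raise ValueError("bucket must not be empty")
        if self.file_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.file_format}")

    def output_uri(self, part: int) -> str:
        """Build the S3 URI for one output file (naming scheme is assumed)."""
        filename = f"part-{part:05d}.{self.file_format}"
        prefix = self.prefix.strip("/")  # tolerate leading/trailing slashes
        key = f"{prefix}/{filename}" if prefix else filename
        return f"s3://{self.bucket}/{key}"
```

For example, `S3WriteConfig("cleaned_events", "my-data-lake", "curated/events/").output_uri(0)` yields `s3://my-data-lake/curated/events/part-00000.parquet`. Validating in `__post_init__` means an unsupported format is rejected as soon as the config is created rather than when the first write fails.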
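
Step 4 says Ascend reruns only from the edited node and everything downstream of it, skipping the source. The sketch below illustrates that idea with a plain dependency graph. It is a simplified model under our own assumptions, not a description of how Ascend tracks changes internally; the node names and the graph representation are hypothetical.

```python
# Hypothetical pipeline: each node maps to the nodes that read from it.
PIPELINE: dict[str, list[str]] = {
    "s3_read": ["sql_clean"],
    "sql_clean": ["s3_write"],
    "s3_write": [],
}


def nodes_to_recompute(graph: dict[str, list[str]], changed: str) -> list[str]:
    """Return the changed node plus everything downstream, in a valid run order.

    Upstream nodes are never visited, so they are never recomputed.
    Reverse DFS post-order guarantees each node appears after every
    affected node it depends on (assuming the graph is acyclic).
    """
    visited: set[str] = set()
    postorder: list[str] = []

    def visit(node: str) -> None:
        if node in visited:
            return
        visited.add(node)
        for child in graph.get(node, []):
            visit(child)
        postorder.append(node)  # appended only after all descendants

    visit(changed)
    return list(reversed(postorder))
```

Calling `nodes_to_recompute(PIPELINE, "sql_clean")` returns `["sql_clean", "s3_write"]`: the read connector is skipped, and the write connector reruns so the lake picks up the new schema. Reverse post-order is used instead of a plain breadth-first walk because, in graphs where one node has several parents, breadth-first order can list a node before one of its inputs.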

Try it out. Your future self will thank you :)