by Marin Gow, Senior Data Analyst at HNI Corporation
At HNI Corporation, our nine-person Decision Science team of data scientists, analysts, and data engineers uses Ascend daily to analyze and preprocess data. A key component of our workflow is outputting our processed data from Ascend to visualization tools so we can identify trends and expose our findings to business users. One visualization tool we frequently work with is Power BI.
Generally, to expose our data from Ascend to Power BI, we use Ascend’s built-in web connector functionality. However, on a recent project, we hit the size limit of the web connector and needed a way to push a larger dataset from Ascend to Power BI. This how-to walks through setting up an ODBC connection between Ascend and Power BI for cases when your dataset is too large to expose through the web connector.
When should I use an ODBC connection?
The Ascend web connector is easier to set up than the ODBC connector, so if your dataset is small (under a million rows or so), I would recommend using the web connector. You can always test the web connector first – if Power BI begins loading the data and then errors out after one or two million rows, it is likely you have hit the web connector’s size limit. In that case, an ODBC connection is the best option for a direct connection between Ascend and Power BI.
Another alternative is to write your Ascend data to a cloud storage location like S3 or Azure Blob Storage and then read it into Power BI. We also use this method, but the nice thing about ODBC is that it’s a direct connection, rather than the two-step process of a write followed by a read.
Caption: Ascend’s web connector functionality, with setup instructions, can be found under the Data Integrations tab of any datafeed component.
Before you begin:
If you don’t already have an Ascend access token to use for your Power BI connection, set one up before you begin. You’ll need these credentials in a later step.
You can do this by going to one of your Ascend dataflows and using the gear icon to open Data Service Settings. Go to the Service Accounts tab. You can either use an existing Service Account or create a new one with Read Only permissions. Click the + API TOKEN button and record the displayed credentials for use in the next step.
How to set up the ODBC connector:
- First, check whether you are running 32- or 64-bit Power BI by opening Power BI Desktop and going to File -> About. Your version will be listed there. These instructions assume the more common 64-bit, but they should still work for 32-bit. The important thing is to use the matching version for all the subsequent steps – so if you are running 32-bit, also use the 32-bit ODBC driver and create your ODBC Data Source under 32-bit data sources.
- Next, install the ODBC driver for Spark, which can be found here. Open the MSI and follow the installation prompts provided.
- In the Start menu, search for ODBC Data Sources (64-bit) and open it. Go to the System DSN tab. (Open the 32-bit ODBC Data Sources here if you are running 32-bit Power BI.)
- Click Add to add a new data source. Select Microsoft Spark ODBC Driver and click Finish.
- Configure the data source like the example below, using your Ascend host and port along with the username and password you generated above.
- Click Test to test the connection and make sure it succeeds. If it doesn’t, double-check your inputs and your credentials.
- Now go to Power BI Desktop. Select Get Data from the toolbar, select the ODBC connector, and click Connect.
- Select the Ascend data source you just created and click OK.
- From this screen, open the database dropdown, select a table, and click Load to load the data.
Your data connection is set up!
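If you'd rather sanity-check the connection outside of Power BI, the same driver can be exercised from Python. The sketch below assembles a DSN-less connection string; the key names follow the Simba-based Microsoft Spark ODBC driver, and the host, port, and token values are placeholder assumptions — substitute your own Ascend details.

```python
# Hedged sketch: build a DSN-less connection string for the
# Microsoft (Simba-based) Spark ODBC driver. Key names and the
# host/port/token values are assumptions -- verify against the
# driver documentation and your Ascend connection details.
def build_spark_odbc_conn_str(host, port, user, password):
    parts = {
        "Driver": "Microsoft Spark ODBC Driver",
        "Host": host,
        "Port": str(port),
        "SSL": "1",
        "AuthMech": "3",   # 3 = username/password authentication
        "UID": user,       # access token ID from your Service Account
        "PWD": password,   # access token secret
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())

conn_str = build_spark_odbc_conn_str(
    "example.ascend.io", 443, "token-id", "token-secret"
)
print(conn_str)

# To actually connect (requires the pyodbc package and a reachable host):
# import pyodbc
# conn = pyodbc.connect(conn_str, autocommit=True)
```

If this string connects from Python but the Power BI DSN fails, the problem is likely in the DSN configuration rather than your credentials.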
How we used the ODBC connection:
HNI Corporation is a furnishings and building products company, which means our Decision Science team often works with detailed catalog and order datasets. Recently, our team was tasked with an association analysis to determine which types of products are frequently sold together. To get to the answer, we went through several steps using different tools.
- A PySpark script to generate frequent itemsets using Spark ML’s FPGrowth class
- Ascend to store the results, filter for desired rows, and preprocess the data for visualization
- Power BI to visualize relationships and expose to end users
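Our production runs used Spark ML's FPGrowth, but the core idea — counting how often items co-occur across orders and keeping the combinations above a support threshold — can be shown with a toy pure-Python stand-in. The orders below are made-up examples, not HNI data.

```python
# Toy stand-in for the frequent-itemset counting that Spark ML's
# FPGrowth performs at scale. This brute-forces pair counts; FPGrowth
# uses a far more efficient tree-based algorithm for real datasets.
from collections import Counter
from itertools import combinations

# Made-up example orders (sets of items bought together).
orders = [
    {"chair", "desk", "lamp"},
    {"chair", "desk"},
    {"chair", "lamp"},
    {"desk", "file_cabinet"},
]

min_support = 0.5  # keep pairs appearing in at least half of all orders

# Count every two-item combination across all orders.
pair_counts = Counter(
    pair for order in orders for pair in combinations(sorted(order), 2)
)

# Keep only the pairs whose support clears the threshold.
frequent_pairs = {
    pair: count / len(orders)
    for pair, count in pair_counts.items()
    if count / len(orders) >= min_support
}
print(frequent_pairs)
```

Here both ("chair", "desk") and ("chair", "lamp") appear in two of the four orders, so both survive the 0.5 support cutoff.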
We wrote the raw output of our PySpark script to Azure Blob Storage, then read the files into Ascend using the Azure Blob Storage read connector. Coming into Ascend, our data looked like the table below, with each item from the itemset represented in a separate column. Some of our itemsets were as large as 19 items, but for our analysis, we focused on two- and three-item sets.
We used Ascend to filter down to itemsets containing two or three items as well as join in product information that would be useful for visualization.
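For illustration, here is the same filter-and-join shape in pandas. The column names (item_1 through item_4, support, series) and the sample values are hypothetical, not our actual Ascend schema.

```python
# Hedged pandas sketch of the filtering/joining we did in Ascend.
# Column names and sample data are hypothetical.
import pandas as pd

itemsets = pd.DataFrame({
    "item_1":  ["A100", "A100", "A100"],
    "item_2":  ["B200", "B200", "B200"],
    "item_3":  [None,   "C300", "C300"],
    "item_4":  [None,   None,   "D400"],  # a 4-item set we want to drop
    "support": [0.12,   0.05,   0.02],
})
products = pd.DataFrame({
    "item":   ["A100", "B200", "C300"],
    "series": ["Ignition 2", "10500 Series", "Preside"],
})

# Keep two- and three-item sets: item_2 present, item_4 absent.
filtered = itemsets[itemsets["item_2"].notna() & itemsets["item_4"].isna()].copy()

# Join product info (here, for the first item in each set) for visualization.
enriched = filtered.merge(products, left_on="item_1", right_on="item", how="left")
print(enriched[["item_1", "series", "support"]])
```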
We used an ODBC connection to load the data directly into Power BI. Below are a couple examples of visuals we were able to create and publish from the cleaned dataset.
This first chart allows us to select a series from our furniture offerings to see what products it is frequently paired with. The example below is a popular series of HON office chairs called Ignition 2. The most common pairing for this chair is with our 10500 Series desks, followed by desk accessories and a series called Preside Tables. This makes sense, as many buyers are outfitting a complete workstation and need a desk and a chair.
We can dig a bit deeper into the popular Ignition 2 chair and 10500 Series desk pairing to understand what additional items a buyer may choose after selecting these two items. In this case, buyers are most likely to add desk accessories and chairs from the original Ignition Seating line. The dark blue line, which shows lift, tells you how much more likely the buyer is to also purchase the item given they have already selected an Ignition 2 chair and a 10500 Series desk, compared to the baseline likelihood in all orders.
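The lift shown by the dark blue line reduces to a simple ratio: the share of antecedent orders that also contain the item, divided by the item's share of all orders. The toy calculation below uses made-up order data to show the arithmetic.

```python
# Toy lift calculation. Orders and item names are illustrative only.
orders = [
    {"ignition2_chair", "10500_desk", "desk_accessories"},
    {"ignition2_chair", "10500_desk", "desk_accessories"},
    {"ignition2_chair", "10500_desk"},
    {"desk_accessories"},
    {"10500_desk"},
]

antecedent = {"ignition2_chair", "10500_desk"}  # items already selected
item = "desk_accessories"                        # candidate add-on

# Confidence: P(item | antecedent) among orders containing the antecedent.
with_antecedent = [o for o in orders if antecedent <= o]
confidence = sum(item in o for o in with_antecedent) / len(with_antecedent)

# Baseline: P(item) across all orders.
baseline = sum(item in o for o in orders) / len(orders)

# Lift > 1 means the pairing makes the add-on more likely than baseline.
lift = confidence / baseline
print(round(lift, 2))
```

Here 2 of the 3 antecedent orders include accessories (confidence ≈ 0.67) against a baseline of 0.6, giving a lift of about 1.11: the chair-and-desk pairing makes accessories slightly more likely than average.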
We hope this example demonstrates how our team uses a combination of Ascend, Power BI, and other tools to quickly process and visualize data to get to insights!