*This guide is written for a legacy version of the Ascend platform. For current documentation please visit: docs.ascend.io*
At HNI Corporation, our nine-person Decision Science team of data scientists, analysts and data engineers uses Ascend daily to analyze and preprocess data. A key component of our workflow is outputting our processed Ascend data to visualization tools to identify trends and expose our findings to our business users. One visualization tool we frequently work with is Power BI.
Generally, to expose our data from Ascend to Power BI, we use Ascend's built-in web connector functionality. However, on a recent project, we hit the size limit of the web connector and needed a way to push a larger dataset from Ascend to Power BI. This how-to will walk through how to set up an ODBC connection between Ascend and Power BI in cases when your dataset is too large to expose through the web connector.
When should I use an ODBC connection?
The Ascend web connector is easier to set up than the ODBC connector, so if your dataset is small (under a million rows or so), I would recommend using the web connector. You can always test the web connector first — if Power BI begins loading the data and then errors out after one or two million rows, it is likely you have hit the web connector's size limit. In that case, an ODBC connection is the best option for a direct connection between Ascend and Power BI.
Another alternative is to write your Ascend data to a cloud storage location like S3 or Azure Blob Storage and then read it into Power BI. We also use this method, but the nice thing about ODBC is that it's a direct connection, rather than the two-step process of a write followed by a read.
Caption: Ascend's web connector functionality, with setup instructions, can be found under the Data Integrations tab of any datafeed component.
Before you begin:
If you don't already have an Ascend access token to use for your Power BI connection, set one up before you begin. You'll need these credentials in a later step.
You can do this by going to one of your Ascend dataflows and using the gear icon to open Data Service Settings. Go to the Service Accounts tab. You can either use an existing Service Account or create a new one with Read Only permissions. Click the + API TOKEN button and record the displayed credentials for use in the next step.
How to set up the ODBC connector:
- First, check whether you are running 32- or 64-bit Power BI by opening Power BI Desktop and going to File -> About. Your version will be listed there. These instructions assume the more common 64-bit, but they should still work for 32-bit. The important thing is to use the matching version for all the subsequent steps — so if you are running 32-bit, also use the 32-bit ODBC driver and create your ODBC Data Source under 32-bit data sources.
- Next, install the ODBC driver for Spark, which can be found here. Open the MSI and follow the installation prompts provided.
- In the Start menu, search for ODBC Data Sources (64-bit) and open it. Go to the System DSN tab. (Open the 32-bit ODBC Data Sources app instead if you are running 32-bit Power BI.)
- Click Add to add a new data source. Select Microsoft Spark ODBC Driver and click Finish.
- Configure the data source like the below example, using your host and port and the username and password you generated above.
- Click Test to test the connection and make sure it succeeds. If it doesn't, double-check your inputs and your credentials.
- Now go to Power BI Desktop. Select Get Data from the toolbar, select the ODBC connector, and click Connect.
- Select the Ascend data source you just created and click OK.
- From this screen, open the database dropdown, select a table, and click Load to load the data.
Your data connection is set up!
How we used the ODBC connection:
HNI Corporation is a furnishings and building products company, which means our Decision Science team often works with detailed catalog and order datasets. Recently, our team was tasked with an association analysis to determine which types of products are frequently sold together. To get to the answer, we went through several steps using different tools.
- A PySpark script to generate frequent itemsets using Spark ML's FPGrowth class
- Ascend to store the results, filter for desired rows, and preprocess the data for visualization
- Power BI to visualize relationships and expose to end users
We wrote the raw output of our PySpark script to Azure Blob storage, then read the files into Ascend using the Azure Blob storage read connector. Coming into Ascend, our data looked like the table below, with each item from the itemset represented in a separate column. Some of our itemsets were as large as 19 items, but for our analysis, we focused on two- and three-item sets.
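For context, here is a minimal sketch of the frequent-itemset step. It assumes the order data has already been grouped into one row per order with an array column of item IDs; the dataset paths and column names are illustrative placeholders, not our production code.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("association-analysis").getOrCreate()

# One row per order, with an array column ("items") of the items on that order.
# The storage path and column names here are illustrative.
orders = spark.read.parquet("wasbs://orders@<storage-account>.blob.core.windows.net/order-items/")

# Mine frequent itemsets with Spark ML's FPGrowth; thresholds are placeholders.
fp = FPGrowth(itemsCol="items", minSupport=0.001, minConfidence=0.1)
model = fp.fit(orders)

# freqItemsets is a DataFrame with columns: items (array), freq (count).
model.freqItemsets.write.mode("overwrite").parquet(
    "wasbs://results@<storage-account>.blob.core.windows.net/frequent-itemsets/"
)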
We used Ascend to filter down to itemsets containing two or three items as well as join in product information that would be useful for visualization.
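As an illustration of that transform, here is a hedged PySpark-style sketch of the filtering logic, assuming the flattened layout described above with columns item_1 through item_19 (the column and function names are illustrative, not our exact Ascend transform):
from pyspark.sql import functions as F

ITEM_COLS = [f"item_{i}" for i in range(1, 20)]  # item_1 ... item_19, illustrative names

def filter_and_join(itemsets, products):
    # Count how many item slots are populated on each row.
    itemset_size = sum(F.when(F.col(c).isNotNull(), 1).otherwise(0) for c in ITEM_COLS)
    small_sets = itemsets.withColumn("itemset_size", itemset_size).filter(
        F.col("itemset_size").isin(2, 3)
    )
    # Join in product attributes (series name, category, ...) useful for visualization.
    return small_sets.join(products, small_sets.item_1 == products.product_id, "left")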
We used an ODBC connection to load the data directly into Power BI. Below are a couple examples of visuals we were able to create and publish from the cleaned dataset.
This first chart allows us to select a series from our furniture offerings and see which products it is frequently paired with. The example below uses a popular series of HON office chairs called Ignition 2. The most common pairing for this chair is with our 10500 Series desks, followed by desk accessories and a series called Preside Tables. This makes sense, as many buyers are outfitting a complete workstation and need both a desk and a chair.
We can dig a bit deeper into the popular Ignition 2 chair and 10500 Series desk pairing to understand what additional items a buyer may choose after selecting these two items. In this case, buyers are most likely to add desk accessories and chairs from the original Ignition Seating line. The dark blue line, which shows lift, tells you how much more likely the buyer is to also purchase the item given they have already selected an Ignition 2 chair and a 10500 Series desk, compared to the baseline likelihood in all orders.
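For readers unfamiliar with the metric, lift for a rule A → B is the rule's confidence divided by the baseline support of B, i.e. how much more often B appears with A than it does overall. A small illustrative calculation (the numbers are made up, not our actual data):
def lift(support_ab: float, support_a: float, support_b: float) -> float:
    """Lift of the rule A -> B: P(A and B) / (P(A) * P(B))."""
    confidence = support_ab / support_a  # P(B | A)
    return confidence / support_b        # how much A boosts the likelihood of B

# Hypothetical numbers: 2% of orders contain both the chair and the desk,
# 5% contain the chair, and 8% contain the desk.
print(lift(support_ab=0.02, support_a=0.05, support_b=0.08))  # -> 5.0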
We hope this example demonstrates how our team uses a combination of Ascend, Power BI, and other tools to quickly process and visualize data to get to insights!
import json
import os
import pathlib
import subprocess
from typing import Any, Dict, List, Optional, Union


def fetch_commit_history(
    repos: Union[str, List[str], pathlib.Path],
    timeout_seconds: int = 120,
    since_date: Optional[str] = None,
    from_ref: Optional[str] = None,
    to_ref: Optional[str] = None,
) -> Dict[str, List[Dict[str, Any]]]:
    """
    Fetches commit history from one or multiple GitHub repositories using the GitHub CLI.
    Works with both public and private repositories, provided the authenticated user has access.
    """
    # Check GitHub CLI is installed
    subprocess.run(
        ["gh", "--version"],
        capture_output=True,
        check=True,
        timeout=timeout_seconds,
    )

    # Process the repos input to handle various formats:
    # a JSON file of repo names, a comma-separated string, or a list.
    if isinstance(repos, pathlib.Path) or (
        isinstance(repos, str) and os.path.exists(repos) and repos.endswith(".json")
    ):
        with open(repos, "r") as f:
            repos = json.load(f)
    elif isinstance(repos, str):
        repos = [repo.strip() for repo in repos.split(",")]

    results = {}
    for repo in repos:
        # Get repository info and default branch
        default_branch_cmd = subprocess.run(
            ["gh", "api", f"/repos/{repo}"],
            capture_output=True,
            text=True,
            check=True,
            timeout=timeout_seconds,
        )
        repo_info = json.loads(default_branch_cmd.stdout)
        default_branch = repo_info.get("default_branch", "main")

        # Build API query with parameters
        api_path = f"/repos/{repo}/commits"
        query_params = ["per_page=100"]
        if since_date:
            query_params.append(f"since={since_date}T00:00:00Z")
        target_ref = to_ref or default_branch
        query_params.append(f"sha={target_ref}")
        api_url = f"{api_path}?{'&'.join(query_params)}"

        # Fetch commits using GitHub CLI
        result = subprocess.run(
            ["gh", "api", api_url],
            capture_output=True,
            text=True,
            check=True,
            timeout=timeout_seconds,
        )
        commits = json.loads(result.stdout)
        results[repo] = commits
    return results
Key implementation details:
- GitHub CLI integration: Uses the `gh` command-line tool for authenticated API access to both public and private repositories
- Flexible input handling: Accepts single repos, comma-separated lists, or JSON files containing repository lists
- Robust error handling: Validates GitHub CLI installation and repository access before attempting to fetch commits
- Configurable date filtering: Supports both date-based and ref-based commit filtering
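A quick usage sketch of the function above. The repository names and date are placeholders, not a real configuration:
# Single repo, a comma-separated string, or a JSON file of repo names all work.
commits_by_repo = fetch_commit_history(
    repos="ascend-io/ascend-docs,ascend-io/ascend-ui",
    since_date="2025-01-01",
)

for repo, commits in commits_by_repo.items():
    print(repo, len(commits), "commits")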
AI-Powered Summarization
import os
from datetime import datetime
from typing import Optional

from openai import OpenAI


def summarize_text(content: str, api_key: Optional[str] = None) -> str:
    """
    Summarize provided text content (e.g., commit messages) using OpenAI API.
    """
    if not content.strip():
        return "No commit data found to summarize"

    # Get API key from parameter or environment
    api_key = api_key or os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OpenAI API key not found. Set the OPENAI_API_KEY environment variable.")

    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # categorization prompt shown below
            {"role": "user", "content": content},
        ],
        temperature=0.1,
        max_tokens=1000,
    )
    return response.choices[0].message.content.strip()


def summarize_commits(content: str, add_date_header: bool = True) -> str:
    """
    Summarize commit content and optionally add a date header.
    """
    summary_body = summarize_text(content)
    if add_date_header:
        # Add header with week date
        now_iso = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
        monday = get_monday_of_week(now_iso)
        return f"## 🗓️ Week of {monday}\n\n{summary_body}"
    return summary_body
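Tying the two pieces together: the commit objects returned by the GitHub API carry the message under commit.message, so building the summarizer's input looks roughly like the sketch below (a sketch, not our exact glue code; the repo name and date are placeholders).
commits_by_repo = fetch_commit_history("ascend-io/ascend-docs", since_date="2025-01-01")

# Flatten every commit message into one text blob for the model.
messages = [
    c["commit"]["message"]
    for commits in commits_by_repo.values()
    for c in commits
]
release_notes = summarize_commits("\n".join(messages))
print(release_notes)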
Our initial system prompt for consistent categorization:
You are a commit message organizer. Analyze the commit messages and organize them into a clear summary.
Group similar commits and format as bullet points under these categories:
- 🚀 Features
- ⚠️ Breaking changes
- 🌟 Improvements
- 🛠️ Bug fixes
- 📝 Additional changes
...
Within the Improvements section, do not simply say "Improved X" or "Fixed Y" or "Added Z" or "Removed W".
Instead, provide a more detailed and user-relevant description of the improvement or fix.
Convert technical commit messages to user-friendly descriptions and remove PR numbers and other technical IDs.
Focus on changes that would be relevant to users and skip internal technical changes.
Format specifications:
- Format entries as bullet points: "- [Feature description]"
- Use clear, user-friendly language while preserving technical terms
- For each item, convert technical commit messages to user-friendly descriptions:
- "add line" → "New line functionality has been added"
- "fix css overflow" → "CSS overflow issue has been fixed"
- Capitalize Ascend-specific terms in bullet points such as "Components"
Strictly exclude the following from your output:
- Any mentions of branches (main, master, develop, feature, etc.)
- Any mentions of AI rules such as "Added the ability to specify keywords for rules"
- Any references to branch integration or merges
- Any language about "added to branch" or "integrated into branch"
- Dependency upgrades and version bumps
…
Prompt engineering:
- Structured categorization: Our prompt enforces specific emoji-categorized sections for consistent output formatting
- User-focused translation: Explicitly instructs the AI to convert technical commits into user-friendly language
- Content filtering: Automatically excludes dependency updates, test changes, and internal technical modifications
- Low temperature setting: Uses 0.1 temperature for consistent, factual output rather than creative interpretation
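In code, the prompt lives in a module-level constant that summarize_text references. A sketch of how it is defined, with the text abbreviated to the excerpt shown above:
SYSTEM_PROMPT = """\
You are a commit message organizer. Analyze the commit messages and organize them into a clear summary.
Group similar commits and format as bullet points under these categories:
- 🚀 Features
- ⚠️ Breaking changes
- 🌟 Improvements
- 🛠️ Bug fixes
- 📝 Additional changes
...
"""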
Content Integration and File Management
from datetime import datetime, timedelta


def get_monday_of_week(date_str: str) -> str:
    """
    Get the Monday of the week containing the given date, except for Sunday which returns the next Monday.
    """
    date = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ")
    # For Sunday (weekday 6), get the following Monday
    if date.weekday() == 6:  # Sunday
        days_ahead = 1
    else:  # For all other days, get the Monday of the current week
        days_behind = date.weekday()
        days_ahead = -days_behind
    target_monday = date + timedelta(days=days_ahead)
    return target_monday.strftime("%Y-%m-%d")
File handling considerations:
- Consistent date formatting: Automatically calculates the Monday of the current week for consistent release note headers
- Encoding safety: Properly handles Unicode characters in commit messages from international contributors
- Atomic file operations: Uses temporary files during processing to prevent corruption if the process is interrupted
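As an illustration of the atomic-write pattern mentioned above, here is a minimal sketch using Python's tempfile and os.replace; the file path and content are illustrative, not our production code.
import os
import tempfile

def write_atomically(path: str, content: str) -> None:
    """Write content to a temp file first, then atomically swap it into place."""
    directory = os.path.dirname(path) or "."
    # delete=False so the file persists after the with-block; os.replace then moves it into place.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False, encoding="utf-8") as tmp:
        tmp.write(content)
        tmp_path = tmp.name
    os.replace(tmp_path, path)  # atomic on the same filesystem

write_atomically("website/docs/whats-new.mdx", "## 🗓️ Week of 2025-01-06\n\n- ...\n")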
GitHub Actions: Orchestrating the Automation
Our workflow ties everything together with robust automation that handles the complexities of CI/CD environments.
Workflow Triggers and Inputs
name: Weekly Release Notes Update
on:
  workflow_dispatch:
    inputs:
      year:
        description: 'Year (YYYY) of date to start collecting releases from'
        default: '2025'
      month:
        description: 'Month (MM) of date to start collecting releases from'
        default: '01'
      day:
        description: 'Day (DD) of date to start collecting releases from'
        default: '01'
      repo_filters:
        description: 'JSON string defining filters for specific repos'
        required: false
      timeout_seconds:
        description: 'Timeout in seconds for API calls'
        default: '45'
Flexible triggering options:
- Manual dispatch with granular date control: Separate year, month, day inputs for precise date filtering
- Repository-specific filtering: JSON configuration allows different filtering strategies per repository
- Configurable timeouts: Adjustable API timeout settings for different network conditions
Secure Authentication Flow
- uses: actions/create-github-app-token@v2
  id: app-token
  with:
    app-id: <YOUR-APP-ID>
    private-key: ${{ secrets.GHA_DOCS_PRIVATE_KEY }}
    owner: ascend-io
    repositories: ascend-docs,ascend-core,ascend-ui,ascend-backend
Security best practices:
- GitHub App with specific repository access: Explicitly lists only the repositories that need access
- Scoped permissions: App configured with minimal necessary permissions for the specific repositories
- Secret management: Private key stored securely in GitHub Secrets
Repository Configuration Processing
- name: Prepare repository filter configuration
  id: repo_config  # referenced later as steps.repo_config.outputs.config_json
  run: |
    CONFIG_FILE=$(mktemp)
    echo "{}" > "$CONFIG_FILE"
    if [ -n "${{ github.event.inputs.repo_filters }}" ]; then
      echo '${{ github.event.inputs.repo_filters }}' > "$CONFIG_FILE"
    else
      DATE_STRING="${{ github.event.inputs.year }}-${{ github.event.inputs.month }}-${{ github.event.inputs.day }}"
      jq -r '.[]' bin/release_notes/input_repos.json | while read -r REPO; do
        FILTER="since:$DATE_STRING"
        jq --arg repo "$REPO" --arg filter "$FILTER" '. + {($repo): $filter}' "$CONFIG_FILE" > "${CONFIG_FILE}.tmp" && mv "${CONFIG_FILE}.tmp" "$CONFIG_FILE"
      done
    fi
    CONFIG_JSON=$(cat "$CONFIG_FILE")
    echo "config_json<<EOF" >> $GITHUB_OUTPUT
    echo "$CONFIG_JSON" >> $GITHUB_OUTPUT
    echo "EOF" >> $GITHUB_OUTPUT
Data Processing and File Management
- name: Generate release notes
  id: generate_notes  # referenced later as steps.generate_notes.outputs.*
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    GITHUB_TOKEN: ${{ steps.app-token.outputs.token }}
  run: |
    CONFIG_JSON='${{ steps.repo_config.outputs.config_json }}'
    CONFIG_FILE=$(mktemp)
    echo "$CONFIG_JSON" > "$CONFIG_FILE"
    RAW_OUTPUT=$(python bin/release_notes/generate_release_notes.py \
      --repo-config-string "$(cat "$CONFIG_FILE")" \
      --timeout "${{ github.event.inputs.timeout_seconds }}")
    # Split summary and commits using delimiter
    SUMMARY=$(echo "$RAW_OUTPUT" | sed -n '1,/^### END SUMMARY ###$/p' | sed '$d')
    MONDAY_DATE=$(echo "$SUMMARY" | head -n1 | grep -oE "[0-9]{4}-[0-9]{2}-[0-9]{2}")
    echo "monday_date=$MONDAY_DATE" >> $GITHUB_OUTPUT
    echo 'summary<<EOF' >> $GITHUB_OUTPUT
    echo "$SUMMARY" >> $GITHUB_OUTPUT
    echo 'EOF' >> $GITHUB_OUTPUT
Key implementation lessons:
- Temporary file strategy: We learned the hard way that GitHub Actions environments can lose data between steps. Writing to temporary files solved reliability issues where data would appear blank in subsequent steps.
- Complex JSON handling: Uses `jq` for safe JSON manipulation and temporary files to avoid shell quoting issues with complex JSON strings
- Output parsing: Logic to split AI-generated summaries from raw commit data using delimiter markers
- Robust error handling: `set -euo pipefail` ensures the script fails fast on any error, preventing silent failures
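For the output parsing to work, the Python script simply prints the summary, a delimiter line, and then the raw commit data; the workflow's sed command splits on that delimiter. A minimal sketch of the print side, continuing from the earlier usage sketch (variable names are illustrative):
# Inside generate_release_notes.py (sketch): print the pieces the workflow expects.
raw_commit_text = "\n".join(messages)          # messages built as in the earlier sketch
summary = summarize_commits(raw_commit_text)   # date-headed, AI-generated summary

print(summary)
print("### END SUMMARY ###")   # delimiter the workflow's sed command splits on
print(raw_commit_text)         # raw commits kept after the delimiter for traceability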
File Integration and Pull Request Creation
- name: Update whats-new.mdx with release notes
  run: |
    FILE="website/docs/whats-new.mdx"
    BRANCH_NAME="notes-${{ steps.generate_notes.outputs.monday_date }}"
    git branch $BRANCH_NAME main
    git switch $BRANCH_NAME
    TEMP_SUMMARY_FILE=$(mktemp)
    echo '${{ steps.generate_notes.outputs.summary }}' > "$TEMP_SUMMARY_FILE"
    cat "$TEMP_SUMMARY_FILE" "$FILE" > "${FILE}.new"
    mv "${FILE}.new" "$FILE"
    rm -f "$TEMP_SUMMARY_FILE"
File management features:
- Atomic file operations: Uses temporary files and atomic moves to prevent file corruption
- Branch management: Creates date-based branches for organized PR tracking
- Content preservation: Carefully prepends new content while preserving existing documentation structure
Lessons Learned and Best Practices
Building this pipeline taught us valuable lessons about documentation automation that go beyond the technical implementation.
Technical Insights
File persistence matters in CI/CD environments. GitHub Actions environments can be unpredictable—always write important data to files rather than relying on environment variables or memory. We learned this the hard way when release notes would mysteriously appear blank in PRs.
API reliability requires defensive programming. Build retry logic and fallbacks for external API calls (OpenAI, GitHub). Network issues and rate limits are inevitable, especially as your usage scales.
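As an example of what that defensive layer can look like, here is a hedged sketch of a simple retry-with-backoff wrapper around the summarization call; the wrapper and its parameters are illustrative, not our production code.
import time

def with_retries(func, attempts: int = 3, base_delay: float = 2.0):
    """Call func(), retrying with exponential backoff on any exception."""
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return func(*args, **kwargs)
            except Exception as exc:  # broad on purpose for this sketch
                if attempt == attempts - 1:
                    raise
                delay = base_delay * (2 ** attempt)
                print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
                time.sleep(delay)
    return wrapper

safe_summarize = with_retries(summarize_text)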
Prompt engineering is crucial for consistent output. Spend time crafting prompts that consistently produce the format and tone you want. Small changes in wording can dramatically affect AI output quality and consistency.
Human review is essential, even with AI generation. Having team members review PRs catches edge cases, ensures quality, and builds confidence in the automated system. The goal isn't to eliminate human oversight—it's to make it more efficient and focused.
Historical tracking and product evolution insights. Automated generation creates a consistent record of product evolution that's valuable for retrospectives, planning, and onboarding new team members.
Results and Impact
The automation has fundamentally transformed our release process and team dynamics:
Quantifiable Improvements
Dramatic time savings: Reduced release note creation from 2-3 hours of writing time to 15 minutes of review time. That's a 90% reduction in effort while improving quality and consistency.
Perfect consistency: Every release now has properly formatted, comprehensive notes. No more missed releases or inconsistent formatting across different team members.
Increased frequency: We can now generate release notes weekly, providing users with more timely updates about product improvements.
Complete coverage: Captures changes across all repositories without manual coordination, eliminating the risk of missing important updates.
Next Steps and Future Enhancements
We're continuously improving the pipeline based on team feedback and evolving needs:
Immediate Roadmap
Slack integration: Building a Slackbot to automatically share release notes with our community channels, extending the reach beyond just documentation updates.
Repository tracing: Categorize the raw commits by repository and add links so it's easy to (literally) double-click into each PR for additional context.
Future Possibilities
Multi-language support: Generating release notes in different languages for global audiences as we expand internationally.
Ready to automate your own release notes? Start with the requirements above and build incrementally. Begin with a single repository, get the basic workflow running, then expand to multiple repos and add advanced features. Your future self (and your team) will thank you for eliminating this manual drudgery and creating a more consistent, professional release process.
The investment in automation pays dividends immediately, not just in time saved, but in the improved quality and consistency of your user communication. At a time when software moves fast, automated release notes ensure your documentation keeps pace.