Tiger Global, Shasta Ventures, and Accel Back Ascend.io's Advanced Data Automation Cloud to Dramatically Increase Data Team Productivity
MENLO PARK, Calif. — April 6, 2022 Ascend.io, the Data Automation Cloud, today announced $31 million in Series B funding led by Tiger Global with participation from Shasta Ventures and existing investor Accel. The additional capital will be used by Ascend.io to scale go-to-market efforts and expansion into new geographies, as well as extend Ascend's Data Automation Cloud to support full multi-cloud data mesh automation. "Backing Ascend.io is an exciting move for us," said Issac Roth, Managing Director at Shasta Ventures. "We are bullish on the data ecosystem as a whole and it is increasingly clear that Ascend.io's innovative approach to automation addresses the rapidly growing and presently unmet needs of the market. Their growth numbers and customer satisfaction are inspiring, and we are excited to partner with them on this journey."The steady rise of AI, machine learning, and analytics powering today's hyperconnected world has spiked demand for data workloads and has progressively strained data team bandwidth and productivity. To meet this demand, companies are increasingly looking to data automation and adopting the Ascend Data Automation Cloud to accelerate the speed and scale at which data initiatives are delivered. Ascend.io's automation platform has gained significant favor amongst data engineering and analytics teams with its ability to dramatically enhance the full spectrum of data engineering capabilities, bolstering productivity and impact of the entire data team.Data automation needs span a diverse and growing data ecosystem and companies require a solution that ensures they can keep pace with data's rapidly expanding landscape. The Ascend Data Automation Cloud unifies the core data engineering capabilities—ingestion, transformation, delivery, orchestration, and observability—into a seamless experience. Powered by Ascend's DataAware™ intelligence, the Ascend Data Automation Cloud eliminates more than 90% of custom code, empowering data teams to focus on what matters the most: data. The Ascend platform works continuously in the background monitoring data integrity and optimizing data workloads, resulting in drastically reduced time spent on management and maintenance.
Rapid Adoption of Ascend Data Automation Cloud Results in 4x User Growth, 7x Revenue Growth
Ascend.io's leadership in data engineering and automation is highlighted by key milestones from the last twelve months, including 4x growth in users and 7x growth in revenue. Additionally, with the first annual Data Automation Summit happening April 13-14, 2022, Ascend.io is leading the charge to advance the data community's conversation around the importance of data automation. "At Ascend.io, our focus on data engineering productivity has truly resonated with customers, and we're seeing faster growth across our user base than ever before," said Sean Knapp, CEO and Founder of Ascend.io. "This latest round of funding enables us to help even more data teams around the globe take advantage of this new wave of automation, ensuring they have a leg up as they drive innovation in their respective industries."
Company Previews Platform Expansion and Next-generation Data Mesh Automation
As the modern data stack expands further to encompass data lakes, data warehouses, and clouds, achieving unified access and continuity for robust data workloads requires increasingly advanced data automation to guarantee data integrity and confidence in the data systems. Ascend.io is excited to preview the development of data mesh automation for workloads spanning all clouds, lakes, and warehouses—offering not only a single pane of glass view, but also unified metadata, lineage, observability, and more. More information on this functionality and availability is expected later this year. Additionally, en route to full data mesh automation, Ascend.io will expand support for lakes and warehouses from currently supported platforms Spark, Snowflake, and Google BigQuery to also include Amazon Redshift, AWS Glue, Microsoft Azure Synapse, and more. "At Ascend.io, our focus on data engineering productivity has truly resonated with customers, and we're seeing faster growth across our user base than ever before," said Sean Knapp, CEO and Founder of Ascend.io. "This latest round of funding enables us to help even more data teams around the globe take advantage of this new wave of automation, ensuring they have a leg up as they drive innovation in their respective industries."
Tom Weeks Joins the Board of Directors
Ascend.io's Chief Business Officer Tom Weeks has joined the Board of Directors and has spearheaded the company's go-to-market efforts since he joined the executive team in mid-2018. Prior to his role at Ascend.io, Weeks held a variety of executive roles at companies including ThoughtSpot, Apigee (IPO), Skire (acquired by Oracle), Business Objects (acquired by SAP), and more. "I've had the opportunity to see multiple waves of data technology innovation throughout my career, and the groundswell around data automation is especially exciting for many reasons," said Weeks. "During my tenure at Ascend.io, I've spoken daily with companies looking for ways to improve their data engineering processes and productivity, and the conversation always comes down to the performance gains and ROI benefits Ascend.io's advanced automation capabilities deliver. I'm thrilled to be an ever bigger part of the Ascend.io team at the forefront of advancing automation for data teams." To learn more about Ascend.io and the Data Automation Cloud, please visit www.ascend.io/.
def fetch_commit_history(
repos: Union[str, List[str], pathlib.Path],
timeout_seconds: int = 120,
since_date: Optional[str] = None,
from_ref: Optional[str] = None,
to_ref: Optional[str] = None,
) -> Dict[str, List[Dict[str, Any]]]:
"""
Fetches commit history from one or multiple GitHub repositories using the GitHub CLI.
Works with both public and private repositories, provided the authenticated user has access.
"""
# Check GitHub CLI is installed
subprocess.run(
["gh", "--version"],
capture_output=True,
check=True,
timeout=timeout_seconds,
)
# Process the repos input to handle various formats
if isinstance(repos, pathlib.Path) or (isinstance(repos, str) and os.path.exists(repos) and repos.endswith(".json")):
with open(repos, "r") as f:
repos = json.load(f)
elif isinstance(repos, str):
repos = [repo.strip() for repo in repos.split(",")]
results = {}
for repo in repos:
# Get repository info and default branch
default_branch_cmd = subprocess.run(
["gh", "api", f"/repos/{repo}"],
capture_output=True,
text=True,
check=True,
timeout=timeout_seconds,
)
repo_info = json.loads(default_branch_cmd.stdout)
default_branch = repo_info.get("default_branch", "main")
# Build API query with parameters
api_path = f"/repos/{repo}/commits"
query_params = ["per_page=100"]
if since_date:
query_params.append(f"since={since_date}T00:00:00Z")
target_ref = to_ref or default_branch
query_params.append(f"sha={target_ref}")
api_url = f"{api_path}?{'&'.join(query_params)}"
# Fetch commits using GitHub CLI
result = subprocess.run(
["gh", "api", api_url],
capture_output=True,
text=True,
check=True,
timeout=timeout_seconds,
)
commits = json.loads(result.stdout)
results[repo] = commits
return results
Key implementation details:
- GitHub CLI integration: Uses the `
gh
` command-line tool for authenticated API access to both public and private repositories
- Flexible input handling: Accepts single repos, comma-separated lists, or JSON files containing repository lists
- Robust error handling: Validates GitHub CLI installation and repository access before attempting to fetch commits
- Configurable date filtering: Supports both date-based and ref-based commit filtering
AI-Powered Summarization
def summarize_text(content: str, api_key: Optional[str] = None) -> str:
"""
Summarize provided text content (e.g., commit messages) using OpenAI API.
"""
if not content.strip():
return "No commit data found to summarize"
# Get API key from parameter or environment
api_key = api_key or os.getenv("OPENAI_API_KEY")
if not api_key:
raise RuntimeError("OpenAI API key not found. Set the OPENAI_API_KEY environment variable.")
client = OpenAI(api_key=api_key)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": content},
],
temperature=0.1,
max_tokens=1000,
)
return response.choices[0].message.content.strip()
def summarize_commits(content: str, add_date_header: bool = True) -> str:
"""
Summarize commit content and optionally add a date header.
"""
summary_body = summarize_text(content)
if add_date_header:
# Add header with week date
now_iso = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
monday = get_monday_of_week(now_iso)
return f"## 🗓️ Week of {monday}\n\n{summary_body}"
return summary_body
Our initial system prompt for consistent categorization:
You are a commit message organizer. Analyze the commit messages and organize them into a clear summary.
Group similar commits and format as bullet points under these categories:
- 🚀 Features
- ⚠️ Breaking changes
- 🌟 Improvements
- 🛠️ Bug fixes
- 📝 Additional changes
...
Within the Improvements section, do not simply say "Improved X" or "Fixed Y" or "Added Z" or "Removed W".
Instead, provide a more detailed and user-relevant description of the improvement or fix.
Convert technical commit messages to user-friendly descriptions and remove PR numbers and other technical IDs.
Focus on changes that would be relevant to users and skip internal technical changes.
Format specifications:
- Format entries as bullet points: "- [Feature description]"
- Use clear, user-friendly language while preserving technical terms
- For each item, convert technical commit messages to user-friendly descriptions:
- "add line" → "New line functionality has been added"
- "fix css overflow" → "CSS overflow issue has been fixed"
- Capitalize Ascend-specific terms in bullet points such as "Components"
Strictly exclude the following from your output:
- Any mentions of branches (main, master, develop, feature, etc.)
- Any mentions of AI rules such as "Added the ability to specify keywords for rules"
- Any references to branch integration or merges
- Any language about "added to branch" or "integrated into branch"
- Dependency upgrades and version bumps
…
Prompt engineering:
- Structured categorization: Our prompt enforces specific emoji-categorized sections for consistent output formatting
- User-focused translation: Explicitly instructs the AI to convert technical commits into user-friendly language
- Content filtering: Automatically excludes dependency updates, test changes, and internal technical modifications
- Low temperature setting: Uses 0.1 temperature for consistent, factual output rather than creative interpretation
Content Integration and File Management
def get_monday_of_week(date_str: str) -> str:
"""
Get the Monday of the week containing the given date, except for Sunday which returns the next Monday.
"""
date = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ")
# For Sunday (weekday 6), get the following Monday
if date.weekday() == 6: # Sunday
days_ahead = 1
else: # For all other days, get the Monday of the current week
days_behind = date.weekday()
days_ahead = -days_behind
target_monday = date + timedelta(days=days_ahead)
return target_monday.strftime("%Y-%m-%d")
File handling considerations:
- Consistent date formatting: Automatically calculates the Monday of the current week for consistent release note headers
- Encoding safety: Properly handles Unicode characters in commit messages from international contributors
- Atomic file operations: Uses temporary files during processing to prevent corruption if the process is interrupted
GitHub Actions: Orchestrating the Automation
Our workflow ties everything together with robust automation that handles the complexities of CI/CD environments.
Workflow Triggers and Inputs
name: Weekly Release Notes Update
on:
workflow_dispatch:
inputs:
year:
description: 'Year (YYYY) of date to start collecting releases from'
default: '2025'
month:
description: 'Month (MM) of date to start collecting releases from'
default: '01'
day:
description: 'Day (DD) of date to start collecting releases from'
default: '01'
repo_filters:
description: 'JSON string defining filters for specific repos'
required: false
timeout_seconds:
description: 'Timeout in seconds for API calls'
default: '45'
Flexible triggering options:
- Manual dispatch with granular date control: Separate year, month, day inputs for precise date filtering
- Repository-specific filtering: JSON configuration allows different filtering strategies per repository
- Configurable timeouts: Adjustable API timeout settings for different network conditions
Secure Authentication Flow
- uses: actions/create-github-app-token@v2
id: app-token
with:
app-id: <YOUR-APP-ID>
private-key: ${{ secrets.GHA_DOCS_PRIVATE_KEY }}
owner: ascend-io
repositories: ascend-docs,ascend-core,ascend-ui,ascend-backend
Security best practices:
- GitHub App with specific repository access: Explicitly lists only the repositories that need access
- Scoped permissions: App configured with minimal necessary permissions for the specific repositories
- Secret management: Private key stored securely in GitHub Secrets
Repository Configuration Processing
- name: Prepare repository filter configuration
run: |
CONFIG_FILE=$(mktemp)
echo "{}" > "$CONFIG_FILE"
if [ -n "${{ github.event.inputs.repo_filters }}" ]; then
echo '${{ github.event.inputs.repo_filters }}' > "$CONFIG_FILE"
else
DATE_STRING="${{ github.event.inputs.year }}-${{ github.event.inputs.month }}-${{ github.event.inputs.day }}"
jq -r '.[]' bin/release_notes/input_repos.json | while read -r REPO; do
FILTER="since:$DATE_STRING"
jq --arg repo "$REPO" --arg filter "$FILTER" '. + {($repo): $filter}' "$CONFIG_FILE" > "${CONFIG_FILE}.tmp" && mv "${CONFIG_FILE}.tmp" "$CONFIG_FILE"
done
fi
CONFIG_JSON=$(cat "$CONFIG_FILE")
echo "config_json<<EOF" >> $GITHUB_OUTPUT
echo "$CONFIG_JSON" >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
Data Processing and File Management
- name: Generate release notes
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
GITHUB_TOKEN: ${{ steps.app-token.outputs.token }}
run: |
CONFIG_JSON='${{ steps.repo_config.outputs.config_json }}'
CONFIG_FILE=$(mktemp)
echo "$CONFIG_JSON" > "$CONFIG_FILE"
RAW_OUTPUT=$(python bin/release_notes/generate_release_notes.py \
--repo-config-string "$(cat "$CONFIG_FILE")" \
--timeout "${{ github.event.inputs.timeout_seconds }}")
# Split summary and commits using delimiter
SUMMARY=$(echo "$RAW_OUTPUT" | sed -n '1,/^### END SUMMARY ###$/p' | sed '$d')
MONDAY_DATE=$(echo "$SUMMARY" | head -n1 | grep -oE "[0-9]{4}-[0-9]{2}-[0-9]{2}")
echo "monday_date=$MONDAY_DATE" >> $GITHUB_OUTPUT
echo 'summary<<EOF' >> $GITHUB_OUTPUT
echo "$SUMMARY" >> $GITHUB_OUTPUT
echo 'EOF' >> $GITHUB_OUTPUT
Key implementation lessons:
- Temporary file strategy: We learned the hard way that GitHub Actions environments can lose data between steps. Writing to temporary files solved reliability issues where data would appear blank in subsequent steps.
- Complex JSON handling: Uses `
jq
` for safe JSON manipulation and temporary files to avoid shell quoting issues with complex JSON strings
- Output parsing: Logic to split AI-generated summaries from raw commit data using delimiter markers
- Robust error handling: `
set -euo pipefail
` ensures the script fails fast on any error, preventing silent failures
File Integration and Pull Request Creation
- name: Update whats-new.mdx with release notes
run: |
FILE="website/docs/whats-new.mdx"
BRANCH_NAME="notes-${{ steps.generate_notes.outputs.monday_date }}"
git branch $BRANCH_NAME main
git switch $BRANCH_NAME
TEMP_SUMMARY_FILE=$(mktemp)
echo '${{ steps.generate_notes.outputs.summary }}' > "$TEMP_SUMMARY_FILE"
cat "$TEMP_SUMMARY_FILE" "$FILE" > "${FILE}.new"
mv "${FILE}.new" "$FILE"
rm -f "$TEMP_SUMMARY_FILE"
File management features:
- Atomic file operations: Uses temporary files and atomic moves to prevent file corruption
- Branch management: Creates date-based branches for organized PR tracking
- Content preservation: Carefully prepends new content while preserving existing documentation structure
Lessons Learned and Best Practices
Building this pipeline taught us valuable lessons about documentation automation that go beyond the technical implementation.
Technical Insights
File persistence matters in CI/CD environments. GitHub Actions environments can be unpredictable—always write important data to files rather than relying on environment variables or memory. We learned this the hard way when release notes would mysteriously appear blank in PRs.
API reliability requires defensive programming. Build retry logic and fallbacks for external API calls (OpenAI, GitHub). Network issues and rate limits are inevitable, especially as your usage scales.
Prompt engineering is crucial for consistent output. Spend time crafting prompts that consistently produce the format and tone you want. Small changes in wording can dramatically affect AI output quality and consistency.
Human review is essential, even with AI generation. Having team members review PRs catches edge cases, ensures quality, and builds confidence in the automated system. The goal isn't to eliminate human oversight—it's to make it more efficient and focused.
Historical tracking and product evolution insights. Automated generation creates a consistent record of product evolution that's valuable for retrospectives, planning, and onboarding new team members.
Results and Impact
The automation has fundamentally transformed our release process and team dynamics:
Quantifiable Improvements
Dramatic time savings: Reduced release note creation from 2-3 hours of writing time to 15 minutes of review time. That's a 90% reduction in effort while improving quality and consistency.
Perfect consistency: Every release now has properly formatted, comprehensive notes. No more missed releases or inconsistent formatting across different team members.
Increased frequency: We can now generate release notes weekly, providing users with more timely updates about product improvements.
Complete coverage: Captures changes across all repositories without manual coordination, eliminating the risk of missing important updates.
Next Steps and Future Enhancements
We're continuously improving the pipeline based on team feedback and evolving needs:
Immediate Roadmap
Slack integration: Building a Slackbot to automatically share release notes with our community channels, extending the reach beyond just documentation updates.
Repository tracing: Categorize the raw commits by repository and add links so it's easy to (literally) double-click into each PR for additional context.
Future Possibilities
Multi-language support: Generating release notes in different languages for global audiences as we expand internationally.
Ready to automate your own release notes? Start with the requirements above and build incrementally. Begin with a single repository, get the basic workflow running, then expand to multiple repos and add advanced features. Your future self (and your team) will thank you for eliminating this manual drudgery and creating a more consistent, professional release process.
The investment in automation pays dividends immediately—not just in time saved, but in the improved quality and consistency of your user communication. In a time where software moves fast, automated release notes ensure your documentation keeps pace.