Getting data from where it’s created to where it can be put to use for analytics and AI isn’t always a straight line. It’s the role of data orchestration technologies like the open source Apache Airflow project to help build data pipelines that get data where it needs to be.
Today, the Apache Airflow project is set to release its 2.10 update, the first major update since Airflow 2.9 arrived in April. Airflow 2.10 introduces hybrid execution, enabling organizations to optimize resource allocation across diverse workloads, from simple SQL queries to compute-intensive machine learning (ML) tasks. Enhanced lineage capabilities provide greater visibility into data flows, which is critical for governance and compliance.
Going a step further, Astronomer, the leading commercial vendor behind Apache Airflow, is updating its Astro platform to incorporate the open source dbt-core (Data Build Tool) technology, unifying data orchestration and transformation workflows on a single platform.
These enhancements aim to streamline data operations and bridge the gap between traditional data flows and emerging AI applications. The updates provide enterprises with a more flexible approach to data orchestration, addressing the challenges of managing diverse data environments and AI processes.
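The announcement doesn’t spell out the mechanics, but Astronomer’s open source Cosmos library (not named here) already illustrates the pattern of running a dbt-core project as a first-class Airflow DAG. Below is a minimal sketch assuming that pattern; the project path, profile names and dag_id are illustrative placeholders.

```python
# Minimal sketch: rendering a dbt-core project as an Airflow DAG with
# Astronomer's open source Cosmos library. All paths, profile names and
# the dag_id below are illustrative placeholders.
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig

dbt_orders_dag = DbtDag(
    dag_id="dbt_orders",
    # Points at a dbt project checked into the Airflow deployment.
    project_config=ProjectConfig("/usr/local/airflow/dbt/orders_project"),
    # Reuses an existing dbt profiles.yml for warehouse credentials.
    profile_config=ProfileConfig(
        profile_name="orders_project",
        target_name="dev",
        profiles_yml_filepath="/usr/local/airflow/dbt/profiles.yml",
    ),
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```

The appeal of this pattern is that each dbt model becomes a visible, retryable Airflow task, so transformation and orchestration share one scheduler and one monitoring surface.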
“If you think about why you adopt orchestration in the first place, it’s because you want to coordinate things across the entire data supply chain, you want that central pane of glass,” Julian LaNeve, CTO of Astronomer, told VentureBeat.
How Airflow 2.10 Improves Data Orchestration with Hybrid Execution
One of the big updates in Airflow 2.10 is the introduction of a feature called hybrid execution.
Prior to this update, Airflow users had to select a single execution mode for an entire deployment, such as running everything on a Kubernetes cluster or using Airflow’s Celery executor. Kubernetes is better suited to heavier compute tasks that require granular, per-task control, while Celery is lighter and more efficient for simpler tasks.
However, as LaNeve explained, real-world data pipelines often mix workload types. In a single Airflow deployment, an organization might only need to run a simple SQL query somewhere to get data, while a machine learning workflow connected to that same pipeline requires a heavier Kubernetes deployment to run. With hybrid execution, both can now coexist.
The hybrid execution capability differs significantly from previous versions of Airflow, which forced a single choice for the entire deployment. Now users can optimize each component of a data pipeline for the appropriate level of compute resources and control.
“Being able to choose at the pipeline and task level, instead of having everything use the same execution mode, I think opens up a whole new level of flexibility and efficiency for Airflow users,” LaNeve said.
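In practice, this means the executor can be chosen per task rather than per deployment. The following is a minimal sketch of what that looks like in Airflow 2.10, assuming a deployment configured with both the Celery and Kubernetes executors; the DAG and task names are illustrative.

```python
# Minimal sketch of Airflow 2.10 hybrid execution. It assumes both
# executors are enabled in the deployment's config, e.g.:
#   [core]
#   executor = CeleryExecutor,KubernetesExecutor
# The first executor listed acts as the default. DAG and task names
# here are illustrative.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def hybrid_pipeline():
    # Lightweight task: runs on the default (Celery) executor.
    @task
    def extract_rows():
        # Imagine a simple SQL query against a warehouse here.
        return [1, 2, 3]

    # Compute-heavy ML task: routed to Kubernetes for per-task
    # resource isolation and control.
    @task(executor="KubernetesExecutor")
    def train_model(rows: list):
        print(f"training on {len(rows)} rows")

    train_model(extract_rows())


hybrid_pipeline()
```

Only the tasks that need Kubernetes-level isolation pay its scheduling overhead, while everything else stays on the lighter Celery path.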
Why Data Lineage in Data Orchestration Matters for AI
Understanding where data comes from and how it flows is the domain of data lineage. It’s a critical capability for traditional data analytics as well as for emerging AI workloads, where organizations need to trace data provenance.
Prior to Airflow 2.10, data lineage tracking had some limitations. LaNeve said that with the new lineage capabilities, Airflow will be able to better capture dependencies and data flow within pipelines, even for custom Python code. This improved lineage tracking is crucial for AI and machine learning workflows, where data quality and provenance are paramount.
“A key element of any next-generation AI application that people are building today is trust,” LaNeve said.
If an AI system provides an incorrect or unreliable result, users won’t continue to rely on it. Trustworthy lineage information helps solve that problem by providing a clear, verifiable trail showing how data was obtained, transformed, and used to train a model. Strong lineage capabilities also enable more comprehensive data governance and security controls around the sensitive information used in AI applications.
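Airflow already exposes lineage through task-level inlets and outlets, which the OpenLineage integration can forward to a lineage backend. Below is a minimal sketch of declaring lineage on a custom Python task; the dataset URIs and names are illustrative.

```python
# Minimal sketch of declaring lineage on an Airflow task with the
# inlets/outlets API and Dataset objects. URIs below are illustrative;
# with the OpenLineage provider installed, Airflow can emit these
# relationships to a lineage backend.
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task

raw_orders = Dataset("s3://example-bucket/raw/orders.parquet")
clean_orders = Dataset("s3://example-bucket/clean/orders.parquet")


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def orders_pipeline():
    # Declaring inlets/outlets records which datasets this task reads
    # and writes, even though the transformation is custom Python code.
    @task(inlets=[raw_orders], outlets=[clean_orders])
    def transform():
        print("read raw orders, write cleaned orders")

    transform()


orders_pipeline()
```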
Waiting for Airflow 3.0
“Data governance, security and privacy are becoming more important than ever because you want to make sure you have complete control over how your data is used,” LaNeve said.
While Airflow 2.10 brings several notable improvements, LaNeve is already looking forward to Airflow 3.0.
According to LaNeve, the goal of Airflow 3.0 is to modernize the technology for the era of next-generation AI. The main priorities are to make the platform more language-agnostic, so users can write tasks in any language, and to make Airflow more data-aware, shifting the focus from orchestrating processes to managing data flows.
“We want to make sure Airflow is the standard for orchestration for the next 10 to 15 years,” he said.