Airflow setup

This guide walks through the Beta pathway in which Apache Airflow sends OpenLineage events to Marquez so that Catalog can consume DAG lineage for warehouse tables. Marquez acts as the HTTP backend for those events. If you also push curated DAG URLs with the Catalog Public API, read the Catalog Airflow hub first so that automation and lineage stay aligned on naming and Workspace boundaries.

Beta Feature

DAG lineage ingestion through Catalog is in Beta. Contact Coalesce Support before you activate the feature or extend it to more environments or tables.

Configure Airflow

Connect Airflow to Marquez using OpenLineage, then attach the credentials and metadata Catalog expects.

Install or Enable OpenLineage in Airflow

The OpenLineage project documents version-specific configuration. Choose the setup that matches your Airflow version:

  • Airflow 2.3 through 2.6: install the openlineage-airflow package on Airflow workers and schedulers using your usual dependency management.
  • Airflow 2.7 and later: follow current OpenLineage guidance for bundled or package-based setup.

Consult OpenLineage when you upgrade Airflow versions or rebuild your container images.
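
As a rough illustration of the version split above, the following Python sketch reports which OpenLineage package an environment would be expected to carry. The openlineage-airflow name comes from this page. The apache-airflow-providers-openlineage name for Airflow 2.7 and later is an assumption not stated here, so confirm it against current OpenLineage guidance. The helper names are hypothetical.

```python
from importlib import metadata

# Package named on this page for Airflow 2.3 through 2.6.
LEGACY_PACKAGE = "openlineage-airflow"
# Assumption: provider package used on Airflow 2.7 and later.
PROVIDER_PACKAGE = "apache-airflow-providers-openlineage"


def expected_package(airflow_version: str) -> str:
    """Return the package this guide's version split points to."""
    major, minor = (int(part) for part in airflow_version.split(".")[:2])
    if (major, minor) < (2, 3):
        raise ValueError("this guide only covers Airflow 2.3 and later")
    return LEGACY_PACKAGE if (major, minor) < (2, 7) else PROVIDER_PACKAGE


def installed_version(package: str):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    try:
        airflow_version = metadata.version("apache-airflow")
    except metadata.PackageNotFoundError:
        print("apache-airflow is not installed in this environment")
    else:
        package = expected_package(airflow_version)
        found = installed_version(package)
        print(f"Airflow {airflow_version}: expecting {package}, found {found}")
```

Running it inside the same image or virtual environment that your workers and schedulers use gives the most meaningful answer, because that is where the package has to be present.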

Set Environment Variables Airflow Uses to Reach Marquez

Set the OpenLineage variables that point Airflow at Marquez so each DAG run can post lineage reliably:

  • OPENLINEAGE_URL - Base URL of the lineage HTTP backend, which is your Marquez instance.
  • OPENLINEAGE_API_KEY - API key Airflow presents so Marquez accepts lineage payloads.
  • OPENLINEAGE_NAMESPACE - Namespace that identifies your Airflow instance in lineage graphs.

Tune values with guidance from OpenLineage, Marquez, and your infrastructure team until test DAG runs send events without errors.
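
Before triggering a test DAG, it can help to confirm that the three variables listed above are visible to the Airflow process. The sketch below is one possible preflight check, not an official tool. The function name is hypothetical, and the URL check only verifies that the value looks like an http(s) address.

```python
import os
from urllib.parse import urlparse

# The three variables described in this section.
REQUIRED_VARS = ("OPENLINEAGE_URL", "OPENLINEAGE_API_KEY", "OPENLINEAGE_NAMESPACE")


def check_openlineage_env(env=None):
    """Return a list of human-readable problems; an empty list means no issue found."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is not set")
    url = env.get("OPENLINEAGE_URL", "").strip()
    if url:
        parsed = urlparse(url)
        # Only a shape check: it does not contact Marquez.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("OPENLINEAGE_URL is not an http(s) URL")
    return problems


if __name__ == "__main__":
    for problem in check_openlineage_env():
        print(problem)
```

An empty result only means the variables are present and well formed. Whether Marquez accepts the events still has to be confirmed with a test DAG run.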

Share Credentials Catalog Needs From Marquez

Prepare the following details for onboarding with Catalog and Support:

  • Marquez API URL - Mirrors the effective OPENLINEAGE_URL or a comparable API entry point Catalog should call.
  • Marquez API key - Mirrors OPENLINEAGE_API_KEY or a credential Catalog accepts for read access.
  • OPENLINEAGE_NAMESPACE - The namespace configured on Airflow so Catalog correlates lineage with the correct cluster of DAG identifiers.
  • Warehouse namespace Catalog expects - The namespace OpenLineage uses for your warehouse source so lineage maps onto the right warehouse Source > Database > Schema assets.

Send secrets only through secured channels during Beta onboarding.
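
Before sharing the Marquez API URL and key, you may want to confirm that the pair grants read access. The sketch below builds a read request against the Marquez namespaces endpoint. It assumes your deployment serves the standard /api/v1/namespaces path, accepts the key as a Bearer token, and returns a JSON object with a "namespaces" list. All three depend on how Marquez is fronted in your environment, so verify them first. The function names are hypothetical.

```python
import json
import urllib.request


def namespaces_request(api_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Marquez namespace list."""
    # Assumption: standard Marquez REST path and Bearer-token authentication.
    url = api_url.rstrip("/") + "/api/v1/namespaces"
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def list_namespaces(api_url: str, api_key: str, timeout: float = 10.0):
    """Send the request and return namespace names; raises on HTTP or network errors."""
    request = namespaces_request(api_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the response has the shape {"namespaces": [{"name": ...}, ...]}.
    return [item["name"] for item in body.get("namespaces", [])]
```

If the call succeeds, the returned names should include both the Airflow namespace and the warehouse namespace you plan to share with Support.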

Need a Marquez Server?

If Marquez is not deployed in your environment yet, contact Coalesce Support for options before configuring Airflow emission.

Lineage ingestion does not replace automation that pushes External Links through the Catalog Public API unless you consolidate responsibilities. Decide whether curated URLs or lineage-derived shortcuts are authoritative for collaborators, especially when names or namespace conventions might diverge across systems.

Read Marquez and OpenLineage Guidance When Debugging

Upstream projects publish the canonical steps for backends, extractor configuration, and release-specific behavior. Use their documentation to supplement this Catalog page when you troubleshoot connectivity or namespace setup in your environment:

  • Use Marquez for foundational terminology and server URL configuration that Airflow targets.
  • Use OpenLineage for Airflow-focused steps that validate lineage emission before Catalog consumes events.
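
When the problem looks like a namespace mismatch, a small comparison can narrow it down. The sketch below is a hypothetical helper that compares the namespaces you expect (for example the Airflow and warehouse namespaces) with a list of names you have already retrieved from Marquez. It flags values that differ only by letter case or surrounding whitespace, which is an assumed but common cause of lineage not lining up.

```python
def diagnose_namespaces(seen, expected):
    """Compare expected namespaces with those seen in Marquez.

    seen: list of namespace names retrieved from Marquez.
    expected: mapping of a label (e.g. "airflow") to the namespace you configured.
    Returns a mapping of label to "ok", "missing", or a mismatch message.
    """
    # Index seen names by a normalized form to catch case or whitespace drift.
    normalized = {name.strip().lower(): name for name in seen}
    report = {}
    for label, namespace in expected.items():
        if namespace in seen:
            report[label] = "ok"
        elif namespace.strip().lower() in normalized:
            actual = normalized[namespace.strip().lower()]
            report[label] = f"mismatch: Marquez has {actual!r}"
        else:
            report[label] = "missing"
    return report


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    seen = ["airflow-prod", "Snowflake://ACME"]
    expected = {"airflow": "airflow-prod", "warehouse": "snowflake://acme"}
    print(diagnose_namespaces(seen, expected))
```

A "missing" result for the Airflow namespace usually points back to emission, so check the OpenLineage setup first. A "mismatch" suggests the configured value and the emitted value have drifted apart.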

What's Next?

  • Return to the Catalog Airflow hub when you weigh API-managed links against lineage ingestion or plan to use both together.
  • Read Links on warehouse tables to see how Catalog shows External Links to collaborators after API updates.
  • Use the Catalog Public API whenever automation manages table link metadata programmatically alongside lineage.