Understanding Deploy, Refresh, and Jobs in Coalesce

This guide explains what Deploy, Refresh, and Jobs mean in Coalesce and how they relate to one another.

Deploy and Create

When you deploy your pipeline, you're executing the Data Definition Language (DDL) operations that establish the physical database structures needed for your transformation logic.

During a deployment, Coalesce:

  • Creates or modifies database tables and columns.
  • Executes Data Definition Language (DDL) statements.
  • Performs CREATE, ALTER, and DROP operations on your database objects.
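
As a rough illustration of what DDL does, the sketch below runs a CREATE, an ALTER, and a DROP against an in-memory SQLite database. SQLite only stands in for your data platform here, and the table and column names (`stg_customer`, `email`) are hypothetical. The SQL that Coalesce generates for a real deployment will differ.

```python
import sqlite3

# In-memory SQLite stands in for the target database (illustration only).
conn = sqlite3.connect(":memory:")


def column_names(table):
    """Return the column names of a table using SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# CREATE: establish a new table (hypothetical name and columns).
conn.execute("CREATE TABLE stg_customer (customer_id INTEGER, name TEXT)")
created_columns = column_names("stg_customer")

# ALTER: a structural change, here adding a column.
conn.execute("ALTER TABLE stg_customer ADD COLUMN email TEXT")
altered_columns = column_names("stg_customer")

# DROP: remove an object that is no longer part of the pipeline.
conn.execute("DROP TABLE stg_customer")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(created_columns)
print(altered_columns)
print(remaining_tables)
```

None of these statements loads any rows. They only change the shape of the database, which is why structural changes call for a deploy rather than a refresh.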

When to Deploy

  • You've created a new data pipeline.
  • You've made structural changes to your existing pipeline.
  • You need to add new tables or columns to support additional data.

Refresh and Run

A refresh operation executes the transformations defined in your pipeline, taking data from source tables and applying your business logic.

During a refresh, Coalesce:

  • Runs the data transformations you have defined.
  • Executes Data Manipulation Language (DML) statements.
  • Performs MERGE, INSERT, UPDATE, and TRUNCATE operations.

When to Refresh

  • New data has arrived in your source systems.
  • You want to update existing data with the latest transformations.

Jobs

Jobs are how you organize and manage your refresh operations. Rather than refreshing your entire pipeline every time, jobs allow you to refresh specific parts of your pipeline based on your needs.

A job in Coalesce is a defined subset of nodes (transformation steps) that can be run together. Jobs are created using selector queries that identify which parts of your pipeline to include or exclude.
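
The include/exclude idea can be sketched in a few lines of Python. This is a conceptual model only: it does not use Coalesce's selector query syntax, and the `Node` class, the labels, and the `select_nodes` helper are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A pipeline node reduced to a name and a set of labels (hypothetical model)."""
    name: str
    labels: frozenset


def select_nodes(nodes, include, exclude=frozenset()):
    """Return names of nodes with at least one `include` label and no `exclude` label."""
    return [
        node.name
        for node in nodes
        if node.labels & include and not (node.labels & exclude)
    ]


pipeline = [
    Node("STG_CUSTOMER", frozenset({"staging", "customer"})),
    Node("STG_ORDERS", frozenset({"staging", "orders"})),
    Node("DIM_CUSTOMER", frozenset({"mart", "customer"})),
    Node("FCT_ORDERS", frozenset({"mart", "orders"})),
]

# A job covering everything related to customers.
customer_job = select_nodes(pipeline, include={"customer"})

# A job covering mart nodes, except those related to orders.
mart_without_orders = select_nodes(pipeline, include={"mart"}, exclude={"orders"})

print(customer_job)
print(mart_without_orders)
```

Each job resolves to a subset of the pipeline, so refreshing `customer_job` in this model would leave the two orders nodes untouched.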

Types of Jobs

Coalesce offers three main ways to refresh your data:

  • Predefined Jobs: Created and saved in the Coalesce app with specific IDs, these can be scheduled and run repeatedly.
  • Ad-Hoc Jobs: One-time jobs defined using include/exclude selectors, useful for testing or specific data processing needs.
  • Full Pipeline Refresh: Refreshes all nodes in your pipeline, typically used for initial loads or complete refreshes.

Deploy Before Refresh

You must deploy before you can refresh. A deploy creates the tables, and a refresh fills them with data, so the containers have to exist first.
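
The ordering can be demonstrated with a small sketch, again using in-memory SQLite as a stand-in for your data platform. The table names and the cents-to-dollars transformation are hypothetical. The point is only that the DML of a refresh fails until the DDL of a deploy has created the target table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A source table that already exists and holds raw data (hypothetical names).
conn.execute("CREATE TABLE src_orders (order_id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?)", [(1, 1250), (2, 990)])

# The "refresh" step: DML that loads transformed rows into the target table.
REFRESH_SQL = """
    INSERT INTO stg_orders (order_id, amount_dollars)
    SELECT order_id, amount_cents / 100.0 FROM src_orders
"""

# Refresh before deploy: the target table does not exist yet, so the DML fails.
try:
    conn.execute(REFRESH_SQL)
    refresh_failed = False
except sqlite3.OperationalError as exc:
    refresh_failed = True
    print(f"refresh failed: {exc}")

# The "deploy" step: DDL creates the target table.
conn.execute("CREATE TABLE stg_orders (order_id INTEGER, amount_dollars REAL)")

# The same refresh now succeeds.
conn.execute(REFRESH_SQL)
loaded_rows = conn.execute(
    "SELECT order_id, amount_dollars FROM stg_orders ORDER BY order_id"
).fetchall()
print(loaded_rows)
```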

Ways to Deploy and Refresh in Coalesce

  • Coalesce App: Use the web interface for manual deploys and refreshes.
  • Coalesce Scheduler: Schedule jobs directly within the Coalesce app.
  • Command Line Interface (CLI): Automate deploys and refreshes with command-line tools.
  • API: Integrate with other systems using Coalesce's REST API.
  • Third-Party Tools: Connect with orchestration tools like Apache Airflow, Azure Data Factory, or GitHub Actions.
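
For the API route, an orchestration script typically sends an authenticated HTTP request naming the environment and, optionally, the job to refresh. The sketch below only builds such a request and never sends it. The URL is a placeholder, and the JSON field names (`runDetails`, `environmentID`, `jobID`) and the bearer-token header are assumptions to be checked against the Coalesce API reference. A real request will likely need additional fields, such as credentials for your data platform.

```python
import json
import urllib.request


def build_refresh_request(start_run_url, token, environment_id, job_id=None):
    """Build (but do not send) a POST request asking for a refresh run.

    The JSON field names are assumptions; confirm them in the API reference.
    """
    run_details = {"environmentID": environment_id}
    if job_id is not None:
        # Assumed behavior: leaving the job out requests a full pipeline refresh.
        run_details["jobID"] = job_id
    body = json.dumps({"runDetails": run_details}).encode("utf-8")
    return urllib.request.Request(
        start_run_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_refresh_request(
    "https://coalesce.example.invalid/start-run",  # placeholder, not a real endpoint
    token="PLACEHOLDER_TOKEN",
    environment_id="3",
    job_id="12",
)
payload = json.loads(request.data)
print(request.get_method(), payload)
```

Keeping the URL and token as arguments means the same helper could be called from Airflow, GitHub Actions, or any other scheduler once the real values are filled in.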

Best Practices

  • Create targeted jobs: Break your pipeline into logical jobs that can be run independently.
  • Schedule wisely: Consider data dependencies and processing windows when scheduling jobs.
  • Use parameters: Leverage parameters to make your pipeline flexible across environments.

What's Next?

  • Create an Environment - Step-by-step instructions for setting up your Environment for deployments.
  • Coalesce Scheduler - Learn how to schedule Jobs right in Coalesce.
  • Deployment - Review deployment and its requirements before deploying in Coalesce.
  • Refresh - Learn how to create Jobs and refresh your Environment.