Refreshing Your Data Pipeline Jobs

Learn how to refresh your Environment and run Jobs.

What Is Refresh?

Refresh is responsible for running the data transformations defined in your data warehouse metadata. This typically involves DML (Data Manipulation Language) SQL statements such as MERGE, INSERT, UPDATE, and TRUNCATE, which perform the transformations on the actual data. Use refresh when you want to update your pipeline with any new changes from your data warehouse. To refresh only a subset of data, use Jobs.

What Are Jobs?

A Job is a subset of nodes, defined by a selector query, that runs during a refresh.
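For example, a Job with an include selector like the one below would run only the two named nodes during a refresh. This is a minimal sketch; the locations and node names are illustrative.

{ location: SAMPLE name: CUSTOMER } OR { location: SAMPLE name: ORDERS }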

Before You Begin

You'll need an authentication token to run refreshes. Refreshes can be run with the Trigger Job to Run API or with the refresh command in the CLI. Review the steps in Connecting to the API.

📘

Deploy Before Refreshing

You can only refresh if you've deployed your pipeline.

Step 1: Create a Job

  1. Go to Jobs in the Build sidebar.
  2. Select the + sign to create a new Job.
  3. Select Edit to add Include and Exclude Selectors.
  4. Take note of your Job ID; in this example, it's jobID: 3. You'll need the Job ID to run the Job as part of a refresh.


Step 2: Commit Your Job

Jobs must be committed to git and deployed to an environment before they can be used. You can read more about making commits in our Git Integration article.


Step 3: Configure Your Environment

Go to Build Settings > Environments and check that the environment you want to refresh is fully configured.

Step 4: Run Your Jobs

Use the API or CLI to run a Job.

API

Jobs can be triggered with the Trigger Job to Run endpoint. By passing only the environmentID and leaving out the jobID, you can refresh the entire environment. You can also use the excludeNodesSelector and includeNodesSelector parameters to override a Job's saved selectors. To avoid setting the selectors manually on each run, we recommend using Jobs to save and manage node selections.

# Run an existing Job for the environment
curl --request POST \
     --url https://app.coalescesoftware.io/scheduler/startRun \
     --header 'Authorization: Bearer YOUR-API-TOKEN' \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "runDetails": {
    "parallelism": 16,
    "jobID": "4",
    "environmentID": "10"
  },
  "userCredentials": {
    "snowflakeAuthType": "Basic",
    "snowflakeRole": "ACCOUNTADMIN",
    "snowflakeWarehouse": "COMPUTE_WH",
    "snowflakeUsername": "SOMEUSER",
    "snowflakePassword": "SOMEPASS"
  }
}
'
# You can also use the `excludeNodesSelector` and `includeNodesSelector` to run a one-time Job. Leave out the `jobID`.
curl --request POST \
     --url https://app.coalescesoftware.io/scheduler/startRun \
     --header 'Authorization: Bearer YOUR-API-TOKEN' \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "runDetails": {
    "parallelism": 16,
    "includeNodesSelector": "{ location: SAMPLE name: CUSTOMER } OR { location: SAMPLE name: LINEITEM } OR { location: SAMPLE name: NATION } OR { location: SAMPLE name: ORDERS } OR { location: SAMPLE name: PART } OR { location: SAMPLE name: PARTSUPP } OR { location: SAMPLE name: REGION } OR { location: SAMPLE name: SUPPLIER } OR { location: QA name: STG_PARTSUPP } OR { location: PROD name: STG_PARTSUPP }",
    "environmentID": "10"
  },
  "userCredentials": {
    "snowflakeAuthType": "Basic",
    "snowflakeRole": "ACCOUNTADMIN",
    "snowflakeWarehouse": "COMPUTE_WH",
    "snowflakeUsername": "SOMEUSER",
    "snowflakePassword": "SOMEPASS"
  }
}
'
# Leave out the `jobID` and selectors to refresh an entire environment
curl --request POST \
     --url https://app.coalescesoftware.io/scheduler/startRun \
     --header 'Authorization: Bearer YOUR-API-TOKEN' \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "runDetails": {
    "parallelism": 16,
    "environmentID": "10"
  },
  "userCredentials": {
    "snowflakeAuthType": "Basic",
    "snowflakeRole": "ACCOUNTADMIN",
    "snowflakeWarehouse": "COMPUTE_WH",
    "snowflakeUsername": "SOMEUSER",
    "snowflakePassword": "SOMEPASS"
  }
}
'
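A successful request returns the run's runID, which you can use to check status (see Refresh Status below). Here is a minimal sketch for capturing it with jq; the top-level runID field name in the response body is an assumption:

# Capture the run ID from the response (the "runID" field name is an assumption)
runID=$(curl --silent --request POST \
     --url https://app.coalescesoftware.io/scheduler/startRun \
     --header 'Authorization: Bearer YOUR-API-TOKEN' \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '{"runDetails": {"environmentID": "10"}, "userCredentials": {"snowflakeAuthType": "Basic", "snowflakeRole": "ACCOUNTADMIN", "snowflakeWarehouse": "COMPUTE_WH", "snowflakeUsername": "SOMEUSER", "snowflakePassword": "SOMEPASS"}}' \
     | jq -r '.runID')
echo "$runID"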

CLI

Refreshes can be triggered with our CLI tool, coa, using the coa refresh command. Learn more in our CLI Commands documentation. By passing only the --environmentID and leaving out the --jobID, you can refresh the entire environment. You can also use include and exclude selectors to override a Job's saved selectors. To avoid setting the selectors manually on each run, we recommend using Jobs to save and manage node selections.


📘

CLI Set Up

Make sure to set up your CLI. Review CLI Setup.


# This example assumes you are using a `coa` config file.
coa refresh --environmentID 1 --jobID 4
# You can also use include and exclude selectors to run a one-time Job. Leave out the --jobID.
coa refresh --environmentID 1 --include '{ location: SAMPLE name: CUSTOMER } OR { location: SAMPLE name: LINEITEM } OR { location: SAMPLE name: NATION } OR { location: SAMPLE name: ORDERS } OR { location: SAMPLE name: PART } OR { location: SAMPLE name: PARTSUPP } OR { location: SAMPLE name: REGION } OR { location: SAMPLE name: SUPPLIER } OR { location: QA name: STG_PARTSUPP } OR { location: PROD name: STG_PARTSUPP }'
# Leave out the --jobID and selectors to refresh an entire environment
coa refresh --environmentID 1

Edit Jobs

  • Jobs can also be modified by dragging and dropping Nodes or Subgraphs into the include/exclude text boxes while on the Graph, Node Grid, or Column Grid of a Job.
  • Nodes can only be removed from a Job by modifying the include/exclude query.
  • Right-click a Job for more options.

Refresh Status

You can review the refresh status by:

  • Going to the Build page and clicking on the refresh.
  • If you're using the CLI, adding the --out flag to print the results in JSON format (see the sketch after this list).
  • If you're using the API, calling Get Job Status with the runID. The runID is returned when the job is run.
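For example, a minimal CLI sketch that prints the run results as JSON (assuming --out takes no arguments, per the note above):

# Refresh Job 4 in environment 1 and print the results in JSON format
coa refresh --environmentID 1 --jobID 4 --out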

Job Scheduling

Jobs can be run at set time intervals using a scheduler. See our article on Scheduling for details and examples.
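For example, a minimal sketch using cron as the scheduler, assuming coa is installed and configured on the machine running cron (the schedule and IDs are illustrative):

# Refresh Job 4 in environment 1 every day at 02:00
0 2 * * * coa refresh --environmentID 1 --jobID 4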