
Coalesce Refresh Scheduling with Azure Data Factory

Azure Data Factory is a cloud-based data integration service for creating and orchestrating data movement and transformation workflows at scale. Data analysts and engineers can visually construct pipelines that extract data from disparate, siloed sources, apply cleansing and enrichment flows using services like Fivetran, and publish to centralized data warehouses and lakehouses.

Built-in triggers, coupled with enterprise-grade reliability, availability, and scalability, have made Azure Data Factory a popular choice for end-to-end orchestration of data pipelines. When working with a column-aware data transformation solution like Coalesce on Azure, Data Factory’s scheduling and automation capabilities let you deploy existing data transformation pipelines behind a feature-rich logging and orchestration UI, helping to ensure a trustworthy data ecosystem.

In this article, we will walk through setting up a basic Coalesce pipeline Refresh Trigger using the Azure Data Factory web UI.

Set up a Coalesce Environment

In order to execute jobs within Coalesce, we first need to create an Environment to deploy our job to, separate from our development Workspace. To do this, follow these steps.

  1. Go to Build > Build Settings > Environments > New Environment.

    The image shows a user interface of a web application named Coalesce, specifically in the Environments section under Build Settings. The left panel contains a search bar and a list of job nodes, while the main panel displays environment details and options. Key elements highlighted in red boxes include the Build button, Build Settings tab, Environments menu item, and the New Environment button.
  2. Specify the necessary information in each of the tabs to create your Environment. Review Environments to learn more.

  3. Once your environment is set up, navigate to the Deploy tab in the top menu.

  4. Select Deploy next to the Environment that you created.

  5. Select the branch from the Workspace or job that you wish to deploy into your new Environment. We recommend deploying the latest commit of your main branch into the production Workspace, but you can choose any branch or job that you wish to deploy.

  6. Once you’ve deployed into your selected Environment, click Generate access token and save it somewhere; you will need it to authenticate the API call made from Azure Data Factory.

Generate access token in Deploy tab

Set up KeyPair Auth in Snowflake

Although the Coalesce API supports Snowflake username/password authentication, you should not use it for automated pipeline scheduling; to conform to security best practices, configure KeyPair Authentication in Snowflake instead and format the key so it can be used as a secret in ADF. Save the private key generated during this process somewhere safe, as it will be needed in subsequent sections.
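Snowflake's documentation typically walks through key-pair setup with openssl; purely as an illustration, here is a minimal Python sketch (assuming the cryptography package) that generates an unencrypted 2048-bit RSA key pair in the PKCS#8/PEM form used by Snowflake key-pair authentication and flattens the private key into a single line for the ADF secret. Adjust it to your own key-management standards.

    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import rsa

    # Generate an unencrypted 2048-bit RSA key pair.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    # PKCS#8 PEM private key: keep this safe, it becomes the ADF secret value.
    private_pem = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    ).decode()

    # Public key to register with the Snowflake user, e.g.
    # ALTER USER <user> SET RSA_PUBLIC_KEY='<key body without BEGIN/END lines>';
    public_pem = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    ).decode()

    # Flatten the private key into a single line (line breaks removed), the shape
    # described in the "Formatting the KeyPairKey" note further below. Confirm the
    # exact expected format against that note before storing the secret.
    single_line_key = "".join(private_pem.strip().splitlines())
    print(single_line_key)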

Configure the Azure Key Vault

To authorize requests from ADF to Coalesce and Snowflake, you need to store the required credentials (access tokens, keys, etc.) securely in Azure Key Vault and then link the Key Vault to your Data Factory.

Even though you can store secrets less securely at the ADF pipeline level, this is not recommended; use the secure Key Vault flow detailed below instead.

Azure Administrator

For the next set of steps, you must be an Azure Administrator with the permissions needed to create key vaults and secrets and to edit permissions.

  1. Create a key vault by logging into the Azure portal and selecting Key vaults from the Azure services.

    The image shows the Microsoft Azure dashboard with various Azure services and resources. Key vaults are highlighted under Azure services. The Resources section lists recent items, including data factories, key vaults, and resource groups, along with their last viewed times.
  2. Create the Key vault by selecting your subscription and proceeding through the menus. In this example, our Key vault is called coalesceCreds.

    The image shows the Create a key vault page in the Azure portal, detailing the basics for setting up a key vault. It includes fields for subscription, resource group, key vault name, region, and pricing tier. Additionally, it outlines options for soft-delete and purge protection settings.
  3. Go to the new Key vault and select Objects > Secrets.

    The image shows the Microsoft Azure interface for a key vault named coalesceCreds, focusing on the Secrets section. The left-hand menu lists various options, including Overview, Activity log, Access control (IAM), Tags, and Secrets. The Secrets section is highlighted, indicating it is currently selected.
  4. Select Generate/Import.

    The image shows a toolbar from the Microsoft Azure interface for managing secrets. The toolbar includes options such as Generate/Import, Refresh, Restore Backup, View sample code, and Manage deleted secrets. The Generate/Import option is highlighted.
  5. Add two secrets: one holding the Coalesce access token (COA-API-KEY in this example) and one holding the formatted Snowflake private key (snowflakeKeyPairKey). Once you finish this list, you can verify both secrets programmatically; see the sketch that follows step 11.

    The image shows a section of the Microsoft Azure interface for managing secrets. A notification indicates that the secret snowflakeKeyPairKey has been successfully created. Below the notification, a table lists secrets with their names, types, and statuses, including snowflakeKeyPairKey and COA-API-KEY, both marked as enabled.
    Formatting the KeyPairKey

    The KeyPairKey must be entered in a very specific format: all newlines must be stated explicitly and all extra whitespace removed.

    For example, if your key looks like this:

    Unformatted key. Each line has a new line between them

    It must be inputted like this:

    Formatted key with new lines removed
  6. Go to Access Control (IAM) to configure the access policy for the Key Vault, and click Add Role Assignment.

    The image shows the Access control (IAM) section for the coalesceCreds key vault in the Microsoft Azure portal. It provides options for checking access, managing role assignments, and adding new role assignments. The Access control (IAM) option and the Add role assignment button are highlighted.
  7. Select the Key Vault Secrets User role.

    The image shows the Add role assignment page in the Microsoft Azure portal for a key vault. The search results list various roles related to key vaults, with the Key Vault Secrets User role highlighted. The page is divided into tabs for Role, Members, Conditions, and Review + assign, indicating steps for assigning roles.
  8. Click Next, then select Managed identity on the next screen and choose the name of your Data Factory. In this case, our data factory is called JJ-Data-Factory-Coalesce.

    The image shows the Add role assignment page in the Microsoft Azure portal, focusing on assigning the Key Vault Secrets User role. The page is set to assign access to a managed identity, with the selection pane on the right listing available managed identities. The selected member, JJ-Data-Factory-Coalesce, is highlighted at the bottom of the selection pane.
  9. Click Review + assign.

  10. Navigate to your Azure Data Factory and link the key vault to it by going to Linked services > New.

    The image shows a section of the Microsoft Azure interface with a focus on the menu options under the Connections category. The Linked services option is highlighted. Other visible categories include General, Source control, and Author.
  11. Create the Linked Service.

The image shows the New linked service configuration page in the Microsoft Azure portal for an Azure Key Vault. The fields include the name of the linked service, the Azure subscription, and the key vault name, with an authentication method set to System Assigned Managed Identity. The managed identity name and object ID are displayed, with options to test the connection and add annotations.
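Before wiring these secrets into pipeline activities, you can optionally confirm that they were stored correctly and that the key formatting survived. Below is a minimal sketch, assuming the azure-identity and azure-keyvault-secrets packages and that the identity you run it under has its own read access to the vault (the role assignment above only covers the Data Factory's managed identity); the vault and secret names are the ones used in this example.

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # Vault name as used in this example.
    VAULT_URL = "https://coalescecreds.vault.azure.net"

    client = SecretClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())

    for name in ("COA-API-KEY", "snowflakeKeyPairKey"):
        secret = client.get_secret(name)
        # Print only the length so the secret value never reaches the console.
        print(f"{name}: {len(secret.value or '')} characters, enabled={secret.properties.enabled}")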

Using Azure Key Vault Secrets in Pipeline Activities

Follow the steps outlined in Microsoft's Use Azure Key Vault secrets in pipeline activities guide to add the secrets to the ADF access policy, import them, and use them in your pipeline.

  1. Create a Web Activity to extract the access token from the Key vault.
  2. Repeat the steps to add a Web Activity that retrieves the Snowflake Key Pair credential. Both activities must have the Secure output box checked; the sketch after this list shows the underlying Key Vault call they make.
The image shows two web activities in a Microsoft Azure pipeline. The first activity is labeled Get Access Token, and the second is labeled Get Snowflake Key Pair Key, both marked with checkmarks indicating successful completion. Each activity includes an icon representing web operations.

The image shows the configuration settings for a web activity named Get Snowflake Key Pair Key in a Microsoft Azure pipeline. The settings include activity state, timeout, retry, and retry interval. The Secure output option is checked, indicating that the output of this activity will be handled securely.
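For context on the expressions used in the next section: each of these Web Activities issues a GET against the Key Vault secrets REST endpoint using the factory's managed identity, and the JSON response returns the secret in a value field. That is why the later pipeline expressions reference activity('...').output.value. A rough local equivalent, sketched with the requests and azure-identity packages (the API version shown is an assumption):

    import requests
    from azure.identity import DefaultAzureCredential

    VAULT_URL = "https://coalescecreds.vault.azure.net"
    SECRET_NAME = "COA-API-KEY"
    API_VERSION = "7.4"  # assumed; use any current Key Vault REST API version

    # Inside ADF the token comes from the factory's system-assigned managed
    # identity; locally, DefaultAzureCredential stands in for it.
    token = DefaultAzureCredential().get_token("https://vault.azure.net/.default").token

    resp = requests.get(
        f"{VAULT_URL}/secrets/{SECRET_NAME}?api-version={API_VERSION}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()

    # The response JSON carries the secret in its "value" field, which is why the
    # pipeline expressions reference activity('...').output.value.
    print(sorted(resp.json().keys()))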

Create the Coalesce Refresh Activity

You can now begin assembling the Web Activity that will trigger a refresh of the Coalesce Environment of your choosing. For this, you'll use the Coalesce API to issue a startRun call that triggers the pipeline; a sketch of the equivalent API call appears below. See Step 4 for what the finished activity should look like.
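For reference, the sketch below shows roughly the same call made directly with Python's requests package: a POST to the startRun endpoint with the Bearer access token and a body matching the one configured in step 2. The URL, environment ID, and parallelism values are assumptions; confirm them against the Coalesce API documentation for your account.

    import requests

    # Assumed endpoint; confirm the startRun URL in the Coalesce API documentation.
    COALESCE_START_RUN_URL = "https://app.coalescesoftware.io/scheduler/startRun"
    COALESCE_ACCESS_TOKEN = "<access token generated in the Deploy tab>"
    SNOWFLAKE_KEY_PAIR_KEY = "<single-line private key stored in the Key Vault>"

    body = {
        "runDetails": {"parallelism": 16, "environmentID": "2"},
        "userCredentials": {
            "snowflakeAuthType": "KeyPair",
            "snowflakeKeyPairKey": SNOWFLAKE_KEY_PAIR_KEY,
        },
    }

    resp = requests.post(
        COALESCE_START_RUN_URL,
        json=body,
        headers={"Authorization": f"Bearer {COALESCE_ACCESS_TOKEN}"},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json())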

  1. Create a new Web Activity, name it Coalesce Refresh Trigger, and drag lines from the other Web Activity modules to it.

    The image shows a sequence of web activities in a Microsoft Azure pipeline. The activities include Get Access Token, Get Snowflake Key Pair Key, and Coalesce Refresh Trigger. Arrows indicate the flow of the process, with all activities marked as successfully completed.
  2. To input the Key Pair Key into the Coalesce Refresh Trigger's API body, click Body to bring up the Pipeline Expression Builder for that input. Enter the body for the Coalesce startRun endpoint as described in the API documentation.

    The image shows the Pipeline expression builder in the Microsoft Azure portal. It includes a JSON snippet defining run details, user credentials, and a Snowflake key pair key. The key pair key's value is dynamically obtained from the output of the Get Snowflake Key Pair Key activity.
    API Body
    {
      "runDetails": { "parallelism": 16, "environmentID": "2" },
      "userCredentials": {
        "snowflakeAuthType": "KeyPair",
        "snowflakeKeyPairKey": "@{activity('Get Snowflake Key Pair Key').output.value}"
      }
    }
    // Remember to update the body text for your use.
  3. To input the Coalesce Access Token, navigate to the Authorization header, click the Value input to bring up the Pipeline Expression Builder, and enter the following expression.

    The image shows the Pipeline expression builder in the Microsoft Azure portal. It contains a dynamic content expression to include a Bearer token retrieved from the output of the Get Access Token activity. The expression syntax ensures secure handling of the access token.
    Authorization
    Bearer @{activity('Get Access Token').output.value}
  4. Your final configuration should match the following image. Enter your app's URL, the method (POST), and the Headers.

    The image shows the settings for the Coalesce Refresh Trigger web activity in a Microsoft Azure pipeline. It details the connection type, URL, method (POST), and body content, which includes dynamic content expressions. The headers section specifies content-type and authorization, with the authorization header dynamically fetching the Bearer token from the Get Access Token activity.
  5. Click OK to save the changes.

  6. Check the Secure input box under General to ensure that none of the secure credentials are exposed.

    The image displays the configuration settings for a web activity named Coalesce Refresh Trigger in a Microsoft Azure pipeline. The settings include activity state, timeout, retry, and retry interval. The Secure input option is checked, indicating that the input of this activity will be handled securely.

Scheduling Your Coalesce Trigger Pipeline

Finally, you'll schedule your ADF pipeline so the Coalesce refresh runs automatically.

  1. Go to the Edit tab in Azure Data Factory. Click Add Trigger > New/Edit.

    The image shows the toolbar of a Microsoft Azure Data Factory interface. The Add trigger option is highlighted, with a dropdown menu displaying Trigger now and New/Edit options. This section allows users to add and manage triggers for data factory pipelines.
  2. Enter the cadence and times at which you want the refresh to run, or select from any previously created triggers. See the example below; a programmatic alternative is sketched after this list.

    The image shows the New trigger configuration page in Microsoft Azure Data Factory. The settings include the trigger name, type (schedule), start date, time zone, recurrence, and advanced recurrence options. The trigger is set to execute at 12:30 PM every day, with an end date specified and the option to start the trigger on creation checked.
  3. Click OK.
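If you prefer to manage the trigger as code rather than through the UI, a rough sketch using the azure-identity and azure-mgmt-datafactory packages might look like the following; the subscription, resource group, pipeline, and trigger names are placeholders, and the recurrence mirrors the 12:30 PM daily example above.

    import datetime

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        PipelineReference,
        RecurrenceSchedule,
        ScheduleTrigger,
        ScheduleTriggerRecurrence,
        TriggerPipelineReference,
        TriggerResource,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Matches the UI example above: run every day at 12:30 PM.
    recurrence = ScheduleTriggerRecurrence(
        frequency="Day",
        interval=1,
        start_time=datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
        time_zone="UTC",
        schedule=RecurrenceSchedule(hours=[12], minutes=[30]),
    )

    trigger = ScheduleTrigger(
        recurrence=recurrence,
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(reference_name="<pipeline-name>")
            )
        ],
    )

    client.triggers.create_or_update(
        "<resource-group>",
        "JJ-Data-Factory-Coalesce",  # the data factory name used in this example
        "CoalesceDailyRefresh",      # hypothetical trigger name
        TriggerResource(properties=trigger),
    )
    # The trigger still needs to be started (and the factory published) before it fires.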

You have now successfully created and scheduled a Coalesce Environment Refresh with Azure Data Factory.