Quick Start Guide

In this example, you’ll learn to connect your database, set up a staging node with a data transformation, and create a dimension and fact table.

Before You Start

  • Sign up for a Coalese account.
  • Make sure you're using Google's Chrome browser, as other browsers are not officially supported.
  • Have your Snowflake login available. Don’t have a Snowflake account yet or not ready to connect your company’s Snowflake to Coalesce? Sign up for a free trial and use Snowflake’s provided sample data.

Video Overview

The following video provides a brief overview of what this guide covers.

Interface

When you first sign in on a new account, you'll be presented with the Projects Dashboard, that includes a default Project and Development Workspace. Click the Launch button to open the workspace to continue. You can ignore the version control warning since it won't be used in this guide.

Projects Dashboard

Projects Dashboard

In the Development Workspace is the Build interface and Deploy interface.

The Build interface is where you'll create nodes, build node graphs, and transform your data. The Deploy interface is used to push your pipeline to other environments such as QA and production.

📘

The Problem Scanner

For new accounts, the Problem Scanner will show a few action items. You can disregard them for this guide.

The Problem Scanner

The Problem Scanner

Connect to Snowflake

Add your Snowflake credentials in the build settings

Add your Snowflake credentials in the build settings

  1. Click on Build Settings, which is represented by a cogwheel icon.

  2. Go to Development Workspaces and edit your current Workspace by clicking the pencil icon to the right of it.

  3. On the Settings page, enter your Snowflake Account.

    1. Obtain your Snowflake URL, by opening the account selector in Snowflake.

  4. Go to User Credentials and fill out the form with your Snowflake login.

  5. Click Test Connection to ensure your credentials work as expected.

  6. Click Save.

You've now connected Coalesce to your Snowflake instance!

Configure Storage Locations and Workspace

A storage location is a logical name you provide to represent a database and schema (in Snowflake) and you will need them to make use of the Workspace you configured in Connect to Snowflake. Depending on your setup you may need to create a storage location in Snowflake.

  1. Go to Build Settings >Storage Mappings. You'll have a SAMPLE and WORK locations. The Snowflake Sample Data will be mapped to SAMPLE and the WORK is where you'll write new tables.
  2. Set the SAMPLE database to use the Snowflake sample data, and use the schema TPCH_SF1.
  3. Set the WORK to a second storage location and schema in that storage location.
Example Configuration of Storage Mappings

Example configuration of Storage Mappings

📘

Snowflake Storage Locations

If you don't have two storage locations, you'll need to create them. You can do this from Snowsight > Databases or using SnowQL. You can also get the sample data from Snowflake.

Build a Transformation Pipeline

Now it's time to add Sources to the graph. The DAG editor is where you'll configure Nodes that will transform your data. Below is an example of a graph with several nodes. Each box is considered a Node.

Example Graph

Example graph with multiple nodes

Add Graph Sources

Add source to SQL Pipeline

Add source to SQL Pipeline

  1. Expand Nodes from the sidebar.
  2. Click on the + next to Search > Add Sources.
  3. Choose your source tables and click Add Sources.
  4. You'll now see your graph populated with some Source Nodes.

Make a Stage Node to Transform Your Data

Now that you have Source Nodes on your graph, it's time to add a Stage Node. Stage Nodes are intermediate nodes in the graph where you can preview your changes.

  1. Add one or more Stage Nodes by right clicking your Nation Source Node>Add Node>Stage Node. You can select multiple Source Nodes by Shift+clicking them and then add multiple Stage Nodes simultaneously.
  2. Double click on the Stage Node or right-click and select Edit to open up the Node Editor for the Nation Stage Node you created.
  3. Open Node Properties on the right and ensure the Storage Location is set to the WORK you configured earlier. You'll be writing to this location in Snowflake.
  4. Click Create to create a table in Snowflake. You'll see the status in the lower window.
  5. Click Run to populate the table. You'll see the status in the lower window. You haven't transformed the data yet.
  6. Edit the Transform field in the Mapping grid by double clicking in the transform field of the N_NAME column. Try a simple transform like LOWER() and the name of your column, or you can use the syntax LOWER("NATION"."N_NAME") to edit the Snowflake sample data.
  7. Click Run again to transform the data. You'll see that the nation names are in lowercase.

📘

On Transforms

Any Snowflake SQL transform can be used to transform your data.

Example of Transformed Data.  The entries in N_NAME are now lowercase.

Example of Transformed Data. The entries in N_NAME are now lowercase.

Congratulations! You've connected your database and applied a basic transformation to your data. Feel free to continue experimenting with some of the other node types below.

Create a Dimension Table

In this guide we will be making a Type 2 Dimension to track historical data of a column. A slowly changing dimension(Type 2) is a industry standard for tracking historical data by creating multiple records for a given natural key.

By default, Coalesce creates a Type 1 Dimension, you'll add a Type 2 Dimension.

  1. Right click on the CUSTOMER node and create a new Stage Node.
  2. In the Node editor for STG_CUSTOMER, click Create and then Run.
  3. Return to the main graph and create a Dimension node from the STG_CUSTOMER node.
A customer Stage node, then a Dimension node

A customer Stage node, then a Dimension node

  1. Go into the new DIM_CUSTOMER Node editor.
  2. Open up Options and select C_CUSTKEY as a business key by selecting it and clicking the right arrow.
  3. In the same Options area, go to Change Tracking and select C_ADDRESS and C_PHONE, then click the right arrow to add them.
  4. Now Create and Run the DIM_CUSTOMER node. Congratulations! You've created a Type 2 Dimension table.
1437

Choosing C_CUSTKEY as the Business Key

📘

Type 1 vs Type 2

In the Dimension Node, if no Change Tracking columns are selected, the node will act as a Type 1 Dimension. If Change Tracking columns are selected, it will act as a Type 2.

Create a Fact Table

Now let's create a fact table. You're joining the Orders (STG_ORDERS) with the Dimension table on the DIM_CUSTOMER_KEY and ORDERS Customer key. Then you'll create a Fact table of the customers and orders. Because you're specifying the O_ORDERKEY, it will merge instead of inserting the data

  1. Create a new Stage Node from the ORDERS Source Node.
  2. Open the new STG_ORDERS node and delete all the columns except for O_ORDERKEY, O_CUSTKEY, and O_TOTALPRICE. You can drag and drop rows to group the items for deletion.
  3. Select the DIM_CUSTOMER node on the left side, then select DIM_CUSTOMER_KEY and drag it into your STG_ORDERS mapping grid.
`DIM_CUSTOMER_KEY` added to the Mapping grid

DIM_CUSTOMER_KEY added to the Mapping grid

  1. Go to Join, next to Mapping in the STG_ORDERS Node Editor.
  2. Delete any existing text.
  3. Click Generate Join and then Copy to Editor.
  4. Replace the /_COLUMN_/ text with O_CUSTKEY. It should look similar to the following:
    1. FROM {{ ref('SAMPLE', 'ORDERS') }} "ORDERS" LEFT JOIN {{ ref('WORK', 'DIM_CUSTOMER') }} "DIM_CUSTOMER" ON "ORDERS"."O_CUSTKEY" = "DIM_CUSTOMER"."C_CUSTKEY"
  5. Create and Run the STG_ORDERS node.
  6. Create a Fact Node from STG_ORDERS.
  7. Open the new FCT_ORDERS Node Editor.
  8. Open Options > Business Key > add the O_ORDERKEY
  9. Create and Run the FCT_ORDERS node.

Congratulations! You have now made a fact table! You can also run this query in Snowflake. Make sure to adjust your schema(MY_SCHEMA) and database(MY_DB) to your enviroment.

select DIM.C_NAME CUSTOMER_NAME,
sum(FCT.O_TOTALPRICE) TOTAL_PRICE
from "MY_DB"."MY_SCHEMA"."FCT_ORDERS" FCT
inner join "MY_DB"."MY_SCHEMA"."DIM_CUSTOMER" DIM
on FCT.DIM_CUSTOMER_KEY = DIM.DIM_CUSTOMER_KEY
group by DIM.C_NAME;

What’s Next

Now that you've finished the Quick Start Guide, you can try using other node types, backup your work using our git integration, or try deploying your pipeline to an environment.