Out-of-the-Box Node Types
Overview
Node types control the actual creation and data manipulation of all database objects within your data pipeline. Node types are defined by Jinja/SQL templates and a YAML Node definition which has numerous options for customization in the Node Type Editor. Templates are split into Create and Run. Create templates define the structure of a table such as columns and data types while Run templates define how the data is to be loaded into the table after creation. Information on the node definition configuration options can be found in the Node Type Editor section.
Source Nodes
Source nodes represent the tables, external tables, or views where raw data is queried to be processed by subsequent nodes. They are not manually editable other than to change their location or add a description.
Stage Nodes
Stage nodes are intermediate nodes in the graph where you prepare the data by applying business logic.
By default, stage nodes truncate data before every run and always contain only the current batch of data. These nodes can be materialized into either a table or a view, but default to a table for easier troubleshooting.
Tip
Divide and conquer complex business logic by splitting it into multiple Stage Nodes.
Persistent Stage Nodes
Persistent stage nodes are similar to Stage Nodes, except that they:
- Do not truncate the data by default
- Persist data across multiple runs
- Support tracking change history of columns based on business key
Tip
Use Persistent Stage nodes when you want to keep the raw data for extended periods rather than just for the current batch load.
Fact Nodes
Fact nodes represent Coalesce's implementation of a Kimball Fact Table. A fact table consists of the measurements, metrics, or facts of a business process and is typically located at the center of a star schema or a snowflake schema surrounded by dimension tables.
A fact table can be configured to use a business key to MERGE data at the appropriate grain of the data. Alternatively, if no business key is selected, the Fact Node will use an INSERT statement to populate the data.
You can read more about Fact Tables here.
Dimension Nodes
Dimension nodes are generally descriptive in nature and describe Facts (see Fact Nodes above). For example time and location. They can also be used to store history.
A business key at the appropriate grain of the data is required for Dimension Nodes.
Coalesce currently supports Type 1 and Type 2 slowly changing dimensions:
To create a Type 1 dimension (current state of the data) - do not choose a change tracking column.
To create a Type 2 dimension (current and historical data) - do choose a change tracking column.
You can read more about Dimension Tables here
View Nodes (disabled by default)
View nodes are used for creating a generic SQL VIEW. Many other node types can also be materialized as SQL VIEWs. The purpose of having this node type is for improved readability when looking at a DAG.
They are disabled by default but can be toggled in your Build Settings Node Types
Updated 8 months ago