Node types control the actual creation and data manipulation of all database objects within your data pipeline. Node types are defined by Jinja/SQL templates and a YAML Node definition which has numerous options for customization in the Node Type Editor. Templates are split into Create and Run. Create templates define the structure of a table such as columns and data types while Run templates define how the data is to be loaded into the table after creation. Information on the node definition configuration options can be found in the Node Type Editor section.
Source nodes represent the tables, external tables, or views where raw data is queried to be processed by subsequent nodes. They are not manually editable other than to change their location or add a description.
Stage nodes are intermediate nodes in the graph where you prepare the data by applying business logic.
By default, stage nodes truncate data before every run and always contain only the current batch of data. These nodes can be materialized into either a table or a view, but default to a table for easier troubleshooting.
Divide and conquer complex business logic by splitting it into multiple Stage Nodes.
Persistent stage nodes are similar to Stage Nodes, except that they:
- Do not truncate the data by default
- Persist data across multiple runs
- Support tracking change history of columns based on business key
Use Persistent Stage nodes when you want to keep the raw data for extended periods rather than just for the current batch load.
Fact nodes represent Coalesce's implementation of a Kimball Fact Table. A fact table consists of the measurements, metrics, or facts of a business process and is typically located at the center of a star schema or a snowflake schema surrounded by dimension tables.
A fact table can be configured to use a business key to MERGE data at the appropriate grain of the data. Alternatively, if no business key is selected, the Fact Node will use an INSERT statement to populate the data.
You can read more about Fact Tables here.
Dimension nodes are generally descriptive in nature and describe Facts (see Fact Nodes above). For example time and location. They can also be used to store history.
A business key at the appropriate grain of the data is required for Dimension Nodes.
Coalesce currently supports Type 1 and Type 2 slowly changing dimensions:
To create a Type 1 dimension (current state of the data) - do not choose a change tracking column.
To create a Type 2 dimension (current and historical data) - do choose a change tracking column.
You can read more about Dimension Tables here
View nodes are used for creating a generic SQL VIEW. Many other node types can also be materialized as SQL VIEWs. The purpose of having this node type is for improved readability when looking at a DAG.
They are disabled by default but can be toggled in your Build Settings Node Types
Updated 11 months ago