Streams And Tasks
Overview
This package provides Snowflake streams and tasks out of the box.
Installation
- In Coalesce, open the Workspace where you wish to install the package.
- Go to the Build Settings of the Workspace, select the Packages tab, and click the Install button at the top right of the page.
- Paste the Package ID and proceed with the installation process.
Description
Streams and Tasks Package
The Coalesce Stream and Task Node Types Package includes:
- Work with Task
- Dimension with Task
- Fact with Task
- Task DAG Create Root
- Task DAG Resume Root
- Stream
- Stream and Insert or Merge
- Stream for Directory Table
- Delta Stream Merge
- Insert Or Merge with Task
- Code
Work with Task
The Coalesce Work with Task UDN is a work node that wraps the standard Coalesce Work node with a Task.
Tasks can be combined with the Coalesce Stream node (table streams) for continuous ELT workflows that process recently changed table rows. Streams ensure exactly-once semantics for new or changed data in a table.
Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.
More information about Tasks can be found in Snowflake's Introduction to tasks.
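As a rough illustration of the pattern this node automates, the sketch below pairs a stream with a scheduled task that only runs when the stream has captured changes. This is a minimal sketch in plain Snowflake SQL; all object names (raw_orders, orders_stream, orders_work, compute_wh) are hypothetical, and the DDL Coalesce actually generates may differ.

```sql
-- Capture changes on a source table (all names are illustrative).
CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

-- A task that wakes on a schedule but only runs when the stream has
-- captured new or changed rows, so each change is processed exactly once.
CREATE OR REPLACE TASK orders_work_task
  WAREHOUSE = compute_wh
  SCHEDULE  = '15 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO orders_work (order_id, order_total)
  SELECT order_id, order_total
  FROM orders_stream;

-- Tasks are created suspended; resuming starts the schedule.
ALTER TASK orders_work_task RESUME;
```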
Work with Task Node Configuration
The Work with Task node has up to four configuration groups, depending on the config options selected:
- Node Properties
- Options
- General Options
- Scheduling Options - Visible when Development Mode is false
Work with Task Properties
| Property | Description |
|---|---|
| Storage Location | Storage Location where the node's table and task will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment |
Work with Task Options
| Option | Description |
|---|---|
| Development Mode | True/False toggle that determines whether a task will be created or whether the SQL to be used in the task will execute as DML in a Run action. True: a table will be created and the SQL will execute as a Run action. False: after testing the SQL as a Run action, setting this to False will wrap the SQL in a task with the specified Scheduling Options |
| Multi Source | True/False toggle for Coalesce's implementation of SQL UNIONs. True: multiple sources can be combined using UNION (combines with duplicate elimination) or UNION ALL (combines without duplicate elimination). False: a single source node, or multiple sources combined using a join |
| Cluster key | True/False toggle that determines whether clustering is enabled. True: specify the clustering column and optionally allow expressions. False: no clustering |
| Truncate Before | True/False toggle that determines whether the table should be truncated before insert. True: uses INSERT OVERWRITE. False: uses INSERT to append data |
Work with Task General Options
| Option | Description |
|---|---|
| Distinct | True/False toggle that determines whether to add DISTINCT to the SQL query. True: Group by All is invisible; DISTINCT data is chosen. False: Group by All is visible |
| Group by All | True/False toggle that determines whether to add GROUP BY ALL to the SQL query. True: DISTINCT is invisible; data is grouped by all columns. False: DISTINCT is visible |
| Order By | True/False toggle that determines whether to add ORDER BY to the SQL query. True: sort column and order options are visible. False: sort options are invisible |
Work with Task Scheduling Options
| Option | Description |
|---|---|
| Scheduling Mode | Choose compute type: - Warehouse Task: User managed warehouse executes tasks - Serverless Task: Uses serverless compute |
| When Source Stream has Data Flag | True/False toggle to check for stream data. True: only run the task if the source stream has captured change data. False: run the task on schedule regardless of whether the source stream has data. If the source is not a stream, set this to False |
| Multiple Stream has Data Logic | AND/OR logic applied when there are multiple streams (visible if When Source Stream has Data Flag is true). AND: the task will run only if all streams have data. OR: the task will run if one or more streams have data |
| Select Warehouse | Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on without quotes. |
| Select initial serverless size | Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task. Snowflake will adjust size from there based on target schedule and task run times. |
| Task Schedule | Choose schedule type: - Minutes - Specify interval in minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs. - Cron - Uses Cron expressions. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax. - Predecessor - Specify dependent tasks |
| Enter predecessor tasks separated by a comma | Visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case sensitive, should not be quoted, and must exist in the same schema in which the current task is being created. If there are multiple predecessor tasks, separate the task names with a comma and no spaces |
| Root task name | Visible when Task Schedule is set to Predecessor. Name of the root task that controls scheduling for the DAG of tasks. The task name is case sensitive, should not be quoted, and must exist in the same schema in which the current task is being created |
Example of Serverless Task with Multiple Predecessors and Root Task
Example of Warehouse Task With 60 Minute Task Schedule
Example of Warehouse Task With Cron Schedule Not Using a Stream
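As a textual counterpart to the screenshot examples above, here is a sketch of the DDL shapes these scheduling options correspond to in plain Snowflake SQL. All names are illustrative, and the statements Coalesce generates may differ.

```sql
-- Warehouse task on a cron schedule: 09:00 UTC on weekdays (1-5 = Mon-Fri).
CREATE OR REPLACE TASK report_task
  WAREHOUSE = compute_wh
  SCHEDULE  = 'USING CRON 0 9 * * 1-5 UTC'
AS
  INSERT INTO daily_report SELECT CURRENT_DATE, COUNT(*) FROM raw_orders;

-- Serverless task on a 60-minute interval: Snowflake manages compute,
-- starting from the initial size and adjusting it over time.
CREATE OR REPLACE TASK hourly_task
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO hourly_report SELECT CURRENT_TIMESTAMP, COUNT(*) FROM raw_orders;

-- Predecessor-scheduled task: no SCHEDULE of its own; it runs after the
-- named tasks (comma-separated, no spaces, same schema) complete.
CREATE OR REPLACE TASK child_task
  WAREHOUSE = compute_wh
  AFTER load_dim_task,load_fact_task
AS
  INSERT INTO summary_report SELECT CURRENT_DATE, COUNT(*) FROM orders_work;
```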
Work With Task Deployment
Work With Task Deployment Parameters
The Work with Task includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.
The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.
When set to DEV ENVIRONMENT, the warehouse entered in the Scheduling Options config Select Warehouse will be used when creating the task.
```json
{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}
```
When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.
For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.
```json
{
    "targetTaskWarehouse": "compute_wh"
}
```
Work With Task Initial Deployment
When deployed for the first time into an environment the Work with Task node will execute the following stages:
For tasks without predecessors:
| Stage | Description |
|---|---|
| Create Target Table | Creates table that will be loaded by the task |
| Create Task | Creates task that will load the target table on schedule |
| Resume Task | Resumes the task so it runs on schedule |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Create Target Table | Creates table that will be loaded by the task |
| Suspend Root Task | Suspends root task for DAG modification |
| Create Task | Creates task that will load the target table on schedule |
If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root task once all the dependent tasks have been created as part of a deployment.
The task node has no ALTER capabilities. All task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.
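A minimal sketch of what these stages amount to in plain Snowflake SQL for a task without predecessors (illustrative names; the generated DDL may differ):

```sql
-- Stage 1: Create Target Table.
CREATE OR REPLACE TABLE orders_work (order_id NUMBER, order_total NUMBER);

-- Stage 2: Create Task. The task definition is always CREATE OR REPLACE'd;
-- there is no ALTER path for the task itself.
CREATE OR REPLACE TASK orders_work_task
  WAREHOUSE = compute_wh
  SCHEDULE  = '60 MINUTE'
AS
  INSERT INTO orders_work SELECT order_id, order_total FROM raw_orders;

-- Stage 3: Resume Task. Newly created tasks start suspended.
ALTER TASK orders_work_task RESUME;
```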
Work With Task Redeployment
After initial deployment, changes in task schedule, warehouse, or scheduling options will result in CREATE TASK and RESUME TASK operations:
For tasks without predecessors:
| Stage | Description |
|---|---|
| Create Task | Recreates task with new schedule |
| Resume Task | Resumes task with new schedule |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Suspend Root Task | Suspends root task for DAG modification |
| Create Task | Recreates task with new schedule |
Work With Task Altering Tables
Subsequent deployments with table changes, such as adding or dropping a column or changing a data type, will result in an ALTER TABLE statement followed by CREATE TASK and RESUME TASK statements.
| Stage | Description |
|---|---|
| Change Column Attributes / Delete Column / Add Column / Change table description | An ALTER TABLE statement is executed to perform the operation |
| Create Task | Recreates task with new schedule |
| Resume Task | Resumes task with new schedule |
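Sketched in plain Snowflake SQL, a column addition under this flow might look like the following (illustrative names; the generated statements may differ):

```sql
-- The table is altered in place...
ALTER TABLE orders_work ADD COLUMN order_status VARCHAR;

-- ...but the task has no ALTER path: it is recreated, then resumed.
CREATE OR REPLACE TASK orders_work_task
  WAREHOUSE = compute_wh
  SCHEDULE  = '60 MINUTE'
AS
  INSERT INTO orders_work
  SELECT order_id, order_total, order_status FROM raw_orders;

ALTER TASK orders_work_task RESUME;
```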
Redeployment with only metadata changes
Sometimes, changes to the config can result in metadata changes from node edits, DML changes, or storage updates when Development Mode is ON. A few cases are listed below:
- Changes in business keys
- Changes to change tracking keys
- Changes in join clauses
- Transformations made at column level
- Changing DML options like DISTINCT, ORDER BY, GROUP BY ALL
There are many more such cases. Most of the time, specific ‘is’ and ‘was’ values are displayed to show exactly what changed.
The following stages are executed:
| Stage | Description |
|---|---|
| Metadata Update (Business Keys, Change Tracking, Distinct, Transformation, Join) | A dummy statement executes, with the specific changes listed in comments |
Work With Task Undeployment
If a Work with Task node is deleted from a Workspace, that Workspace is committed to Git, and that commit is deployed to a higher-level environment, then all objects created by the node in the target environment will be dropped.
For tasks without predecessors:
| Stage | Description |
|---|---|
| Drop Table | Drop the table originally created to be loaded by the task. |
| Drop Current Task | Drop the task |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Drop Table | Drop the table |
| Suspend Root Task | To drop a task from a DAG of tasks, the root task needs to be put into a suspended state |
| Drop Task | Drops the task |
If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root task once all the dependent tasks have been created as part of a deployment.
Dimension With Task
The Coalesce Dimension with Task UDN is a node that wraps the standard Coalesce Dimension node with a Task.
Tasks can be combined with the Coalesce Stream node (table streams) for continuous ELT workflows that process recently changed table rows. Streams ensure exactly-once semantics for new or changed data in a table.
Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.
More information about Tasks can be found in Snowflake's Introduction to tasks.
Dimension With Task Node Configuration
The Dimension with Task node has up to four configuration groups, depending on the config options selected:
- Node Properties
- Options
- General Options
- Scheduling Options - Visible when Development Mode is false
Dimension With Task Node Properties
| Property | Description |
|---|---|
| Storage Location | Storage Location where the node's table and task will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment |
Dimension With Task Options
| Option | Description |
|---|---|
| Development Mode | True/False toggle that determines whether a task will be created or whether the SQL to be used in the task will execute as DML in a Run action. True: a table will be created and the SQL will execute as a Run action. False: after testing the SQL as a Run action, setting this to False will wrap the SQL in a task with the specified Scheduling Options |
| Multi Source | True/False toggle for Coalesce's implementation of SQL UNIONs. True: multiple sources can be combined using UNION (combines with duplicate elimination) or UNION ALL (combines without duplicate elimination). False: a single source node, or multiple sources combined using a join |
| Business key | Required column for both Type 1 and Type 2 Dimensions |
| Change tracking | Required column for Type 2 Dimensions |
| Cluster key | True/False toggle that determines whether clustering is enabled. True: specify the clustering column and optionally allow expressions. False: no clustering |
Dimension With Task General Options
| Option | Description |
|---|---|
| Distinct | True/False toggle that determines whether to add DISTINCT to the SQL query. True: Group by All is invisible; DISTINCT data is chosen. False: Group by All is visible |
| Group by All | True/False toggle that determines whether to add GROUP BY ALL to the SQL query. True: DISTINCT is invisible; data is grouped by all columns. False: DISTINCT is visible |
| Order By | True/False toggle that determines whether to add ORDER BY to the SQL query. True: sort column and order options are visible. False: sort options are invisible |
Dimension With Task Scheduling Options
If Development Mode is set to false, use Scheduling Options to configure how and when the task will run.
| Option | Description |
|---|---|
| Scheduling Mode | Choose compute type: - Warehouse Task: User managed warehouse executes tasks - Serverless Task: Uses serverless compute |
| When Source Stream has Data Flag | True/False toggle to check for stream data. True: only run the task if the source stream has captured change data. False: run the task on schedule regardless of whether the source stream has data. If the source is not a stream, set this to False |
| Multiple Stream has Data Logic | AND/OR logic applied when there are multiple streams (visible if When Source Stream has Data Flag is true). AND: the task will run only if all streams have data. OR: the task will run if one or more streams have data |
| Select Warehouse | Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on without quotes. |
| Select initial serverless size | Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task. Snowflake will adjust size from there based on target schedule and task run times. |
| Task Schedule | Choose schedule type: - Minutes - Specify interval in minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs. - Cron - Uses Cron expressions. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax. - Predecessor - Specify dependent tasks |
| Enter predecessor tasks separated by a comma | Visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case sensitive, should not be quoted, and must exist in the same schema in which the current task is being created. If there are multiple predecessor tasks, separate the task names with a comma and no spaces |
| Root task name | Visible when Task Schedule is set to Predecessor. Name of the root task that controls scheduling for the DAG of tasks. The task name is case sensitive, should not be quoted, and must exist in the same schema in which the current task is being created |
Dimension With Task Deployment
Dimension With Task Deployment Parameters
The Dimension with Task includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.
The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.
When set to DEV ENVIRONMENT, the warehouse entered in the Scheduling Options config Select Warehouse will be used when creating the task.
```json
{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}
```
When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.
For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.
```json
{
    "targetTaskWarehouse": "compute_wh"
}
```
Dimension With Task Initial Deployment
When deployed for the first time into an environment, the Dimension with Task node will execute three stages, which depend on whether the task schedule relies on a predecessor task.
For tasks without predecessors:
| Stage | Description |
|---|---|
| Create Target Table | Creates table that will be loaded by the task |
| Create Task | Creates task that will load the target table on schedule |
| Resume Task | Resumes the task so it runs on schedule |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Create Target Table | Creates table that will be loaded by the task |
| Suspend Root Task | Suspends root task for DAG modification |
| Create Task | Creates task that will load the target table on schedule |
If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root task once all the dependent tasks have been created as part of a deployment.
The task node has no ALTER capabilities. All task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.
Dimension With Task Redeployment
After initial deployment, changes in task schedule, warehouse, or scheduling options will result in CREATE TASK and RESUME TASK operations:
For tasks without predecessors:
| Stage | Description |
|---|---|
| Create Task | Recreates task with a new schedule |
| Resume Task | Resumes task with a new schedule |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Suspend Root Task | Suspends root task for DAG modification |
| Create Task | Recreates task with new schedule |
Dimension With Task Altering Tables
Subsequent deployments with table changes, such as adding or dropping a column or changing a data type, will result in an ALTER TABLE statement followed by CREATE TASK and RESUME TASK statements.
| Stage | Description |
|---|---|
| Change Column Attributes / Delete Column / Add Column / Change table description | An ALTER TABLE statement is executed to perform the operation |
| Create Task | Recreates task with new schedule |
| Resume Task | Resumes task with new schedule |
Redeployment with only metadata changes
Sometimes, changes to the config can result in metadata changes from node edits, DML changes, or storage updates when Development Mode is ON. A few cases are listed below:
- Changes in business keys
- Changes to change tracking keys
- Changes in join clauses
- Transformations made at column level
- Changing DML options like DISTINCT, ORDER BY, GROUP BY ALL
There are many more such cases. Most of the time, specific ‘is’ and ‘was’ values are displayed to show exactly what changed.
The following stages are executed:
| Stage | Description |
|---|---|
| Metadata Update (Business Keys, Change Tracking, Distinct, Transformation, Join) | A dummy statement executes, with the specific changes listed in comments |
Dimension With Task Undeployment
If a Dimension with Task node is deleted from a Workspace, that Workspace is committed to Git, and that commit is deployed to a higher-level environment, then all objects created by the node in the target environment will be dropped.
For tasks without predecessors:
| Stage | Description |
|---|---|
| Drop Table | Drop the table originally created to be loaded by the task. |
| Drop Current Task | Drops the task |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Drop Table | Drop the table originally created to be loaded by the task. |
| Suspend Root Task | To drop a task from a DAG of tasks, the root task needs to be put into a suspended state |
| Drop Task | Drop the task |
If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root task once all the dependent tasks have been created as part of a deployment.
Fact With Task
The Coalesce Fact with Task UDN is a node that wraps the standard Coalesce Fact node with a Task.
Tasks can be combined with the Coalesce Stream node (table streams) for continuous ELT workflows that process recently changed table rows. Streams ensure exactly-once semantics for new or changed data in a table.
Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.
More information about Tasks can be found in Snowflake's Introduction to tasks.
Fact With Task Node Configuration
The Fact with Task node has up to four configuration groups, depending on the config options selected:
- Node Properties
- Options
- General Options
- Scheduling Options - Visible when Development Mode is false
Fact With Task Node Properties
| Property | Description |
|---|---|
| Storage Location | Storage Location where the node's table and task will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment |
Fact With Task Options
| Option | Description |
|---|---|
| Development Mode | True/False toggle that determines whether a task will be created or whether the SQL to be used in the task will execute as DML in a Run action. True: a table will be created and the SQL will execute as a Run action. False: after testing the SQL as a Run action, setting this to False will wrap the SQL in a task with the specified Scheduling Options |
| Multi Source | True/False toggle for Coalesce's implementation of SQL UNIONs. True: multiple sources can be combined using UNION (combines with duplicate elimination) or UNION ALL (combines without duplicate elimination). False: a single source node, or multiple sources combined using a join |
| Cluster key | True/False toggle that determines whether clustering is enabled. True: specify the clustering column and optionally allow expressions. False: no clustering |
| Truncate Before | Specifies that the target table should be truncated before inserting the values into the table. |
Fact With Task General Options
| Option | Description |
|---|---|
| Distinct | True/False toggle that determines whether to add DISTINCT to the SQL query. True: Group by All is invisible; DISTINCT data is chosen. False: Group by All is visible |
| Group by All | True/False toggle that determines whether to add GROUP BY ALL to the SQL query. True: DISTINCT is invisible; data is grouped by all columns. False: DISTINCT is visible |
| Order By | True/False toggle that determines whether to add ORDER BY to the SQL query. True: sort column and order options are visible. False: sort options are invisible |
Fact With Task Scheduling Options
If Development Mode is set to false, use Scheduling Options to configure how and when the task will run.
| Option | Description |
|---|---|
| Scheduling Mode | Choose compute type: - Warehouse Task: User managed warehouse executes tasks - Serverless Task: Uses serverless compute |
| When Source Stream has Data Flag | True/False toggle that checks whether source streams have data before executing a task. True: only run the task if the source stream has captured change data. False: run the task on schedule regardless of whether the source stream has data. If the source is not a stream, set this to False |
| Multiple Stream has Data Logic | AND/OR logic applied when there are multiple streams (visible if When Source Stream has Data Flag is true). AND: the task will run only if all streams have data. OR: the task will run if one or more streams have data |
| Select Warehouse | Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on without quotes. |
| Select initial serverless size | Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task. Snowflake will adjust size from there based on target schedule and task run times. |
| Task Schedule | Choose schedule type: - Minutes - Specify interval in minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs. - Cron - Uses Cron expressions. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax. - Predecessor - Specify dependent tasks |
| Enter predecessor tasks separated by a comma | Visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case sensitive, should not be quoted, and must exist in the same schema in which the current task is being created. If there are multiple predecessor tasks, separate the task names with a comma and no spaces |
| Root task name | Visible when Task Schedule is set to Predecessor. Name of the root task that controls scheduling for the DAG of tasks. The task name is case sensitive, should not be quoted, and must exist in the same schema in which the current task is being created |
Fact With Task Deployment
Fact With Task Deployment Parameters
The Fact with Task includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.
The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.
When set to DEV ENVIRONMENT, the warehouse entered in the Scheduling Options config Select Warehouse will be used when creating the task.
```json
{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}
```
When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.
For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.
```json
{
    "targetTaskWarehouse": "compute_wh"
}
```
Fact With Task Initial Deployment
When deployed for the first time into an environment, the Fact with Task node will execute three stages, which depend on whether the task schedule relies on a predecessor task.
For tasks without predecessors:
| Stage | Description |
|---|---|
| Create Target Table | Creates table that will be loaded by the task |
| Create Task | Creates task that will load the target table on schedule |
| Resume Task | Resumes the task so it runs on schedule |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Create Target Table | Creates table that will be loaded by the task |
| Suspend Root Task | Suspends root task for DAG modification |
| Create Task | Creates task that will load the target table on schedule |
If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root task once all the dependent tasks have been created as part of a deployment.
The task node has no ALTER capabilities. All task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.
Fact With Task Redeployment
After initial deployment, changes in task schedule, warehouse, or scheduling options will result in CREATE TASK and RESUME TASK operations:
For tasks without predecessors:
| Stage | Description |
|---|---|
| Create Task | Recreates task with new schedule |
| Resume Task | Resumes task with new schedule |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Suspend Root Task | Suspends root task for DAG modification |
| Create Task | Recreates task with new schedule |
Fact With Task Altering Tables
Changes such as adding or dropping a column, or changing a data type, will result in ALTER TABLE, CREATE TASK, and RESUME TASK statements.
| Stage | Description |
|---|---|
| Alter Table | Modifies table structure |
| Create Task | Recreates task |
| Resume Task | Resumes updated task |
Redeployment with only metadata changes
Sometimes, changes to the config can result in metadata changes from node edits, DML changes, or storage updates when Development Mode is ON. A few cases are listed below:
- Changes in business keys
- Changes to change tracking keys
- Changes in join clauses
- Transformations made at column level
- Changing DML options like DISTINCT, ORDER BY, GROUP BY ALL
There are many more such cases. Most of the time, specific ‘is’ and ‘was’ values are displayed to show exactly what changed.
The following stages are executed:
| Stage | Description |
|---|---|
| Metadata Update (Business Keys, Change Tracking, Distinct, Transformation, Join) | A dummy statement executes, with the specific changes listed in comments |
Fact With Task Undeployment
If a Fact with Task node is deleted from a Workspace, and that Workspace is committed to Git and that commit is deployed to a higher-level environment, then all objects created by the node in the target environment will be dropped.
For tasks without predecessors:
| Stage | Description |
|---|---|
| Drop Table | Drop the table created to be loaded by the task |
| Drop Current Task | Removes the task |
For tasks with predecessors:
| Stage | Description |
|---|---|
| Drop Table | Drop the table created to be loaded by the task |
| Suspend Root Task | Suspends root task |
| Drop Task | Removes the task |
If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root task once all the dependent tasks have been created as part of a deployment.
Task DAG Create Root
The Coalesce Task DAG Create Root UDN is a node that helps to create a standalone root task.
The root task should have a defined schedule that initiates a run of the DAG. Each of the other tasks has at least one defined predecessor to link the tasks in the DAG. A child task runs only after all of its predecessor tasks run successfully to completion.
Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.
More information about Tasks can be found in Snowflake's Introduction to tasks.
Task DAG Create Root Node Configuration
The Task DAG Create Root node has two configuration groups:
- Node Properties
- Scheduling Options
Task DAG Create Root Node Properties
| Property | Description |
|---|---|
| Storage Location | Storage Location where the root task will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment |
Task DAG Create Root Scheduling Options
| Option | Description |
|---|---|
| Scheduling Mode | Choose compute type: - Warehouse Task - User managed warehouse executes tasks - Serverless Task - Uses serverless compute |
| Select Warehouse | Visible if Scheduling Mode is set to Warehouse Task. Name of the warehouse the task runs on, without quotes |
| Select initial serverless size | Visible when Scheduling Mode is set to Serverless Task. Initial compute size on which to run the task. Snowflake will adjust the size from there based on the target schedule and task run times |
| Task Schedule | Choose schedule type: - Minutes - Specify interval in minutes - Cron - Use a cron expression - Triggered Task - Create a triggered task that runs when the source streams have data |
| Multiple source streams (if disabled, considered a single source stream) | Visible only for Triggered Task. Toggle: when enabled, denotes that multiple streams are connected |
| Multiple Stream has Data Logic | AND/OR logic applied when there are multiple streams (visible only for Triggered Task when Multiple source streams is enabled). AND: the task will run only if all streams have data. OR: the task will run if one or more streams have data |
| Enter root task SQL | The SQL statement to be run when a standalone root task executes |
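For orientation, a standalone root task in plain Snowflake SQL might look like the sketch below. Its body can be trivial, since its role is to anchor the schedule that the rest of the DAG hangs off via AFTER; all names are illustrative.

```sql
-- A standalone root task: the only task in the DAG with its own schedule.
-- Child tasks attach to it with AFTER root_task in their definitions.
CREATE OR REPLACE TASK root_task
  WAREHOUSE = compute_wh
  SCHEDULE  = 'USING CRON 0 2 * * * UTC'
AS
  SELECT CURRENT_TIMESTAMP;
```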
Task DAG Create Root Deployment
Task DAG Create Root Deployment Parameters
The Task DAG Create Root node includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments. The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.
When set to DEV ENVIRONMENT, the warehouse entered in the Scheduling Options config Select Warehouse will be used when creating the task.
```json
{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}
```
When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.
For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.
```json
{
    "targetTaskWarehouse": "compute_wh"
}
```
Task DAG Create Root Initial Deployment
When deployed for the first time into an environment, the following stages execute:
| Stage | Description |
|---|---|
| Suspend Root Task | Suspends root task for creation |
| Create Root Task | Creates task that will execute Root SQL on schedule |
If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root task once all the dependent tasks have been created as part of a deployment.
Task DAG Create Root Redeployment
After the Task has deployed for the first time into a target environment, subsequent deployments will execute two stages:
| Stage | Description |
|---|---|
| Suspend Root Task | Suspends root task |
| Create Root Task | Recreates root task |
Task DAG Create Root Undeployment
When a Task DAG Create Root node is deleted, two stages are executed:
| Stage | Description |
|---|---|
| Suspend Root task | Suspends root task |
| Drop current task | Removes the task |
Task DAG Resume Root
The Coalesce Task DAG Resume Root UDN is a node type that helps resume the root task and its dependent (child) tasks. It recursively resumes all dependent tasks tied to the specified root task in a DAG.
The root task should have a defined schedule that initiates a run of the DAG. Each of the other tasks has at least one defined predecessor to link the tasks in the DAG. A child task runs only after all of its predecessor tasks run successfully to completion.
Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.
More information about Tasks can be found in Snowflake's Introduction to tasks.
Task DAG Resume Root Node Configuration
The Task DAG Resume Root node has two configuration groups:
- Node Properties
- Scheduling Options
Task DAG Resume Root Node Properties
| Property | Description |
|---|---|
| Storage Location | Storage Location associated with the node |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment |
Task DAG Resume Root Scheduling Options
| Option | Description |
|---|---|
| Enter root task name | Name of the root task to be resumed. All dependent tasks tied to this root task are resumed recursively |
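In plain Snowflake SQL, resuming a root task together with all of its dependents is done with the SYSTEM$TASK_DEPENDENTS_ENABLE system function. Whether the node's template uses exactly this call is not stated here, but the sketch shows the underlying pattern (the task name is illustrative):

```sql
-- Resume the named root task and, recursively, every dependent task in its
-- DAG. Use a fully qualified name, or the current database/schema applies.
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('my_db.my_schema.root_task');
```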
Task DAG Resume Root Deployment
Task DAG Resume Root Initial Deployment
When deployed for the first time into an environment the following stage executes:
| Stage | Description |
|---|---|
| Try Enable Root Task | Resumes root task and all its dependents |
If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root task once all the dependent tasks have been created as part of a deployment.
Task DAG Resume Root Undeployment
If a Task DAG Resume Root node is deleted from a Workspace, that Workspace is committed to Git, and that commit is deployed to a higher-level environment, then two stages are executed.
| Stage | Description |
|---|---|
| Drop Table | Removes the table |
| Drop Current Task | Removes the task |
Stream
The Coalesce Stream UDN is a node that allows you to develop and deploy a stream on top of a table, view or external table.
A stream logically takes an initial snapshot of every row in the source object (e.g. table, external table, or the underlying tables for a view) by initializing a point in time (called an offset) as the current transactional version of the object. The change tracking system utilized by the stream then records information about the DML changes after this snapshot was taken. Change records provide the state of a row before and after the change. Change information mirrors the column structure of the tracked source object and includes additional metadata columns that describe each change event.
More information about Streams can be found in the official Snowflake Introduction to Streams.
Stream Node Configuration
The Stream node has two configuration groups:
- Node Properties
- Options
Stream Node Properties
| Property | Description |
|---|---|
| Storage Location | Storage Location where the Stream will be created |
| Node Type | Name of template used to create node objects |
| Description | A description of the node's purpose |
| Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment |
Stream Options
| Option | Description |
|---|---|
| Source Object | Type of object on which the stream is created: Table, Dynamic Table, View, External table, or External iceberg table |
| Append Only Stream | Visible for Table and View sources. True: append-only stream. False: standard stream |
| Show Initial Rows | Visible for Table, Dynamic Table, and View sources. Specifies the records to return the first time the stream is consumed. True: the stream returns only the rows that existed in the source object at the moment the stream was created; subsequently, the stream returns any DML changes to the source object since the most recent offset (the normal stream behavior). False: the stream returns any DML changes to the source object since the most recent offset |
| Redeployment Behavior | Options for redeployment, available for all source object types |
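These option combinations map onto standard Snowflake stream DDL roughly as follows. This is a sketch with illustrative names, not the node's exact output.

```sql
-- Standard stream on a table: tracks inserts, updates, and deletes.
CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

-- Append-only stream: records inserted rows only.
CREATE OR REPLACE STREAM orders_append_stream ON TABLE raw_orders
  APPEND_ONLY = TRUE;

-- Show initial rows: the first consumption returns the rows that existed
-- when the stream was created; after that, normal offset-based tracking.
CREATE OR REPLACE STREAM orders_initial_stream ON TABLE raw_orders
  SHOW_INITIAL_ROWS = TRUE;

-- Views and external tables use the same shape, e.g. ON VIEW my_view, or
-- ON EXTERNAL TABLE my_ext_table INSERT_ONLY = TRUE.
```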
Stream System Columns
A Stream UDN adds three system columns to the output of the node. These columns can be used together to track INSERT, UPDATE and DELETE operations against a source object.
| Column | Description |
|---|---|
| METADATA$ACTION | Indicates the DML operation (INSERT, DELETE) recorded |
| METADATA$ISUPDATE | Indicates whether the operation was part of an UPDATE statement. Updates to rows in the source object are represented as a pair of DELETE and INSERT records in the stream, with the METADATA$ISUPDATE metadata column set to TRUE |
| METADATA$ROW_ID | Specifies the unique and immutable ID for the row, used to track changes over time |
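A small illustrative query showing how the three columns combine to classify changes (it assumes the hypothetical orders_stream from earlier; an UPDATE surfaces as a DELETE/INSERT pair with METADATA$ISUPDATE = TRUE):

```sql
-- Classify each captured change using the stream's metadata columns.
SELECT
  order_id,
  CASE
    WHEN METADATA$ISUPDATE AND METADATA$ACTION = 'INSERT' THEN 'UPDATE (new image)'
    WHEN METADATA$ISUPDATE AND METADATA$ACTION = 'DELETE' THEN 'UPDATE (old image)'
    WHEN METADATA$ACTION = 'INSERT' THEN 'INSERT'
    ELSE 'DELETE'
  END AS change_type,
  METADATA$ROW_ID  -- stable row ID that ties the two halves of an update together
FROM orders_stream;
```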
Stream Deployment
Stream Deployment Parameters
No deployment parameters are required.
Stream Initial Deployment
When deployed for the first time into an environment the Stream node executes:
| Stage | Description |
|---|---|
| Create Stream | Executes a CREATE OR REPLACE statement to create a Stream in the target environment |
Stream Redeployment
After initial deployment, subsequent deployments execute a stage based on the selected Redeployment Behavior:
| Redeployment Behavior | Stage Executed |
|---|---|
| Create Stream if not exists | Re-Create Stream at existing offset |
| Create or Replace | Create Stream |
| Create at existing stream | Re-Create Stream at existing offset |
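In plain Snowflake SQL, the difference between the offset-preserving and offset-resetting behaviors is roughly the following (a sketch with illustrative names):

```sql
-- "Create Stream if not exists" / "Create at existing stream": an existing
-- stream and its offset are left untouched.
CREATE STREAM IF NOT EXISTS orders_stream ON TABLE raw_orders;

-- "Create or Replace": the stream is rebuilt and its offset reset to now,
-- so captured-but-unconsumed changes are discarded.
CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;
```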
Stream Undeployment
When a Stream node is deleted from a Workspace and that change is committed and deployed, the following stage executes:
| Stage | Description |
|---|---|
| Drop Stream | Removes the stream from the target environment |
Stream and Insert or Merge
The Coalesce Stream and Insert or Merge UDN is a node that allows you to develop and deploy a stream on top of a table, view, or external table. It also provides the option to create a target table into which data from the source is inserted or merged, with a task on top of it.
A stream logically takes an initial snapshot of every row in the source object (e.g. table, external table, or the underlying tables for a view) by initializing a point in time (called an offset) as the current transactional version of the object. The change tracking system utilized by the stream then records information about the DML changes after this snapshot was taken. Change records provide the state of a row before and after the change. Change information mirrors the column structure of the tracked source object and includes additional metadata columns that describe each change event.
More information about Streams can be found in Snowflake's Introduction to Streams.
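As an illustration of the insert-or-merge half of this node, the sketch below merges stream changes into a target table; consuming the stream inside a successful DML statement advances its offset. All names are hypothetical, and the MERGE the node generates may differ.

```sql
-- Merge captured changes into a target table (illustrative names).
MERGE INTO orders_target t
USING (
    -- Drop the DELETE half of each UPDATE pair; what remains is true
    -- inserts, update new-images, and true deletes.
    SELECT *
    FROM orders_stream
    WHERE NOT (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE)
) s
ON t.order_id = s.order_id
WHEN MATCHED AND s.METADATA$ACTION = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.order_total = s.order_total
WHEN NOT MATCHED THEN
  INSERT (order_id, order_total) VALUES (s.order_id, s.order_total);
```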