Streams And Tasks

‹ View all packages

Latest Version:: 1.1.5

Release Date:: November 08, 2024

Package ID:: @coalesce/streams-and-tasks

Coalesce Certified Package

Overview

Snowflake streams and tasks out of the box.

Installation

Copy the Package ID @coalesce/streams-and-tasks
In Coalesce, open the Workspace where you wish to install the package.
Go to the Build Setting of the Workspace, tab Packages, and click the Install button on the top right of the page.
Paste the Package ID, and proceed with the installation process.

Description

Streams and Tasks Package

The Coalesce Stream and Task Node Types Package includes:

Work with Task
Dimension with Task
Fact with Task
Task DAG Create Root
Task DAG Resume Root
Stream
Stream and Insert or Merge

Work with Task

The Coalesce Work with Task UDN is a work node that wraps the standard Coalesce Work node with a Task.

Tasks can be combined with Coalesce Stream node (table streams) for continuous ELT workflows to process recently changed table rows. Streams ensure exactly once semantics for new or changed data in a table.

Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.

More information about Tasks can be found in Snowflake Introduction to tasks.

Work with Task Node Configuration

The Work with Task node has two or three configuration groups depending on config options selected:

Node Properties
Options
Scheduling Options - Visible when Development Mode is false

Work with Task Properties

Property	Description
Storage Location	Storage Location where the Stream will be created
Node Type	Name of template used to create node objects
Description	A description of the node's purpose
Deploy Enabled	If TRUE the node will be deployed / redeployed when changes are detected If FALSE the node will not be deployed or will be dropped during redeployment

Work with Task Options

Worktask-opt

Option	Description
Development Mode	True / False toggle that determines whether a task will be created or if the SQL to be used in the task will execute as DML as a Run action. True - A table will be created and SQL will execute as a Run action False - After testing the SQL as a Run action, setting to false will wrap SQL in a task with specified Scheduling Options
Multi Source	True / False toggle that is Coalesce implementation of SQL UNIONs True - Multiple sources can be combined using: - UNION - Combines with duplicate elimination - UNION ALL - Combines without duplicate elimination False - Single source node or multiple sources combined using a join
Cluster key	True/False toggle that determines if clustering is enabled True - Specify clustering column and optionally allow expressions False - No clustering
Truncate Before	True / False toggle that determines if table should be truncated before insert True - Uses INSERT OVERWRITE False - Uses INSERT to append data

Work with Task General Options

Work with Task-GO

Option	Description
Distinct	True/False toggle that determines whether to add DISTINCT to SQL Query True - Group by All is invisible. DISTINCT data is chosen False - Group by All is visible
Group by All	True/False toggle that determines whether to add GROUP BY ALL to SQL Query True - DISTINCT is invisible. Data grouped by all columns False - DISTINCT is visible
Order By	True/False toggle that determines whether to add ORDER BY to SQL Query True - Sort column and order options visible False - Sort options invisible

Work with Task Scheduling Options

Task Scheduling Options

Option	Description
Scheduling Mode	Choose compute type: - Warehouse Task: User managed warehouse executes tasks - Serverless Task: Uses serverless compute
When Source Stream has Data Flag	True/False toggle to check for stream data True - Only run task if source stream has capture change data False - Run task on schedule regardless of whether the source stream has data. If the source is not a stream should set this to false.
Multiple Stream has Data Logic	AND/OR logic when multiple streams (visible if Stream has Data Flag is true) AND - If there are multiple streams task will run if all streams have data OR - If there are multiple streams task will run if one or more streams has data
Select Warehouse	Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on without quotes.
Select initial serverless size	Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task. Snowflake will adjust size from there based on target schedule and task run times.
Task Schedule	Choose schedule type: - Minutes - Specify interval in minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs. - Cron - Uses Cron expressions. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax. - Predecessor - Specify dependent tasks
Enter predecessor tasks separated by a comma	Visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.
Root task name	Visible when Task Schedule is set to Predecessor. Name of the root task that controls scheduling for the DAG of tasks. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.

Example of Serverless Task with Multiple Predecessors and Root Task

Example_of_Serverless_Task

Example of Warehouse Task With 60 Minute Task Schedule

Example_of_Warehouse_Task

Example of Warehouse Task With Cron Schedule Not Using a Stream

Example_of_Warehouse_Task_with_Cron

Work With Task Deployment

Work With Task Deployment Parameters

The Work with Task includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.

The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.

When set to DEV ENVIRONMENT the value entered in the Scheduling Options config Select Warehouse on which to run the task will be used when creating the task.

{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}

When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.

For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.

{
    "targetTaskWarehouse": "compute_wh"
}

Work With Task Initial Deployment

When deployed for the first time into an environment the Work with Task node will execute the following stages:

For tasks without predecessors:

Stage	Description
Create Target Table	Creates table that will be loaded by the task
Create Task	Creates task that will load the target table on schedule
Resume Task	Resumes the task so it runs on schedule

For tasks with predecessors:

Stage	Description
Create Target Table	Creates table that will be loaded by the task
Suspend Root Task	Suspends root task for DAG modification
Create Task	Creates task that will load the target table on schedule

If a task is part of a DAG of tasks, the DAG needs to include a node type called "Task DAG Resume Root." This node will resume the root node once all the dependent tasks have been created as part of a deployment.

The task node has no ALTER capabilities. All task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.

Work With Task Redeployment

After initial deployment, changes in task schedule, warehouse, or scheduling options will result in a CREATE or RESUME

For tasks without predecessors:

Stage	Description
Create Task	Recreates task with new schedule
Resume Task	Resumes task with new schedule

For tasks with predecessors:

Stage	Description
Suspend Root Task	Suspends root task for DAG modification
Create Task	Recreates task with new schedule

Work With Task Altering Tables

Subsequent deployments with changes in table like add or drop column and change in data type will result in an ALTER table statement followed by CREATE TASK AND RESUME TASK statements being issued.

Stage	Description
Change Column Attributes/Delete Column/Add Column/Change table description	Alter table statement is executed to perform the alter operation.
Create Task	Recreates task with new schedule
Resume Task	Resumes task with new schedule

Work With Task Undeployment

If a Work with Task node is deleted from a Workspace, that Workspace is committed to Git and that commit deployed to a higher level environment then all objects created by the node in the target environment will be dropped.

For tasks without predecessors:

Stage	Description
Drop Table	Drop the table originally created to be loaded by the task.
Drop Current Task	Drop the task

For tasks with predecessors:

Stage	Description
Drop Table	Drop the table
Suspend Root Task	Drop a task from a DAG of task the root task needs to be put into a suspended state.
Drop Task	Drops the task

If a task is part of a DAG of tasks the DAG needs to include a node type called Task Dag Resume Root. This node will resume the root node once all the dependent tasks have been created as part of a deployment.

Dimension With Task

The Coalesce Dimension with Task UDN is a node that wraps the standard Coalesce Dimension node with a Task.

Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.

More information about Tasks can be found in Snowflake Introduction to tasks.

Dimension With Task Node Configuration

The Dimension with Task node has two or three configuration groups depending on config options selected:

Node Properties
Options
General Options
Scheduling Options - Visible when Development Mode is false

Dimension With Task Node Properties

Property	Description
Storage Location	Storage Location where the Stream will be created
Node Type	Name of template used to create node objects
Description	A description of the node's purpose
Deploy Enabled	If TRUE the node will be deployed / redeployed when changes are detected If FALSE the node will not be deployed or will be dropped during redeployment

Dimension With Task Options

Dimension_task_opt

Option	Description
Development Mode	True / False toggle that determines whether a task will be created or if the SQL to be used in the task will execute as DML as a Run action True - A table will be created and SQL will execute as a Run action False - After testing the SQL as a Run action, setting to false will wrap SQL in a task with specified Scheduling Options
Multi Source	True / False toggle that is Coalesce implementation of SQL UNIONs True - Multiple sources can be combined using: - UNION - Combines with duplicate elimination - UNION ALL - Combines without duplicate elimination False - Single source node or multiple sources combined using a join
Business key	Required column for both Type 1 and Type 2 Dimensions
Change tracking	Required column for Type 2 Dimension
Cluster key	True/False toggle that determines if clustering is enabled True - Specify clustering column and optionally allow expressions False - No clustering

Dimension With Task General Options

Work with Task-GO

Option	Description
Distinct	True/False toggle that determines whether to add DISTINCT to SQL Query True - Group by All is invisible. DISTINCT data is chosen False - Group by All is visible
Group by All	True/False toggle that determines whether to add GROUP BY ALL to SQL Query True - DISTINCT is invisible. Data grouped by all columns False - DISTINCT is visible
Order By	True/False toggle that determines whether to add ORDER BY to SQL Query True - Sort column and order options visible False - Sort options invisible

Dimension With Task Scheduling Options

If Development Mode is set to false, use Scheduling Options to configure how and when the task will run.

Task Scheduling Options

Option	Description
Scheduling Mode	Choose compute type: - Warehouse Task: User managed warehouse executes tasks - Serverless Task: Uses serverless compute
When Source Stream has Data Flag	True/False toggle to check for stream data True - Only run task if source stream has capture change data False - Run task on schedule regardless of whether the source stream has data. If the source is not a stream should set this to false.
Multiple Stream has Data Logic	AND/OR logic when multiple streams (visible if Stream has Data Flag is true) AND - If there are multiple streams task will run if all streams have data OR - If there are multiple streams task will run if one or more streams has data
Select Warehouse	Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on without quotes.
Select initial serverless size	Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task. Snowflake will adjust size from there based on target schedule and task run times.
Task Schedule	Choose schedule type: - Minutes - Specify interval in minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs. - Cron - Uses Cron expressions. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax. - Predecessor - Specify dependent tasks
Enter predecessor tasks separated by a comma	Visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.
Root task name	Visible when Task Schedule is set to Predecessor. Name of the root task that controls scheduling for the DAG of tasks. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.

Dimension With Task Deployment

Dimenson With Task Deployment Parameters

The Dimenson with Task includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.

The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.

When set to DEV ENVIRONMENT the value entered in the Scheduling Options config Select Warehouse on which to run the task will be used when creating the task.

{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}

When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.

For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.

{
    "targetTaskWarehouse": "compute_wh"
}

Dimension With Task Initial Deployment

When deployed for the first time into an environment the Dimension with Task node will execute three stages dependent on whether or not the task schedule relies on a predecessor task.

For tasks without predecessor:

Stage	Description
Create Target Table	Creates table that will be loaded by the task
Create Task	Creates task that will load the target table on schedule
Resume Task	Resumes the task so it runs on schedule

For tasks with predecessor:

Stage	Description
Create Target Table	Creates table that will be loaded by the task
Suspend Root Task	Suspends root task for DAG modification
Create Task	Creates task that will load the target table on schedule

If a task is part of a DAG of tasks, the DAG needs to include a node type called Task DAG Resume Root. This node will resume the root node once all the dependent tasks have been created as part of a deployment.

The task node has no ALTER capabilities. All task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.

Dimension With Task Redeployment

After initial deployment, changes in task schedule, warehouse, or scheduling options will result in a CREATE or RESUME TASK.

For tasks without predecessor:

Stage	Description
Create Task	Recreates task with a new schedule
Resume Task	Resumes task with a new schedule

For tasks with predecessor:

Stage	Description
Suspend Root Task	Suspends root task for DAG modification
Create Task	Recreates task with new schedule

Dimension With Task Altering Tables

Subsequent deployments with changes in table such as add or drop column and change in data type will result in an ALTER table statement followed by CREATE TASK AND RESUME TASK statements being issued.

Stage	Description
Change Column Attributes/Delete Column/Add Column/Change table description	Alter table statement is executed to perform the alter operation.
Create Task	Recreates task with new schedule
Resume Task	Resumes task with new schedule

Dimension With Task Undeployment

For tasks without predecessor:

Stage	Description
Drop Table	Drop the table originally created to be loaded by the task.
Drop Current Task	Drops the task

For tasks with predecessor:

Stage	Description
Drop Table	Drop the table originally created to be loaded by the task.
Suspend Root Task	Drop a task from a DAG of task the root task needs to be put into a suspended state.
Drop Task	Drop the task

Fact With Task

The Coalesce Fact with Task UDN is a node that wraps the standard Coalesce Fact node with a Task.

Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or performing other periodic work.

More information about Tasks can be found in Snowflake Introduction to tasks.

Fact With Task Node Configuration

The Fact with Task node has two or three configuration groups depending on config options selected:

Node Properties
Options
Scheduling Options - Visible when Development Mode is false

Fact With Task Node Properties

Property	Description
Storage Location	Storage Location where the Stream will be created
Node Type	Name of template used to create node objects
Description	A description of the node's purpose
Deploy Enabled	If TRUE the node will be deployed / redeployed when changes are detected If FALSE the node will not be deployed or will be dropped during redeployment

Fact With Task Options

Dimension_task_opt

Option	Description
Development Mode	True / False toggle that determines whether a task will be created or if the SQL to be used in the task will execute as DML as a Run action True - A table will be created and SQL will execute as a Run action False - After testing the SQL as a Run action, setting to false will wrap SQL in a task with specified Scheduling Options
Multi Source	True / False toggle that is Coalesce implementation of SQL UNIONs True - Multiple sources can be combined using: - UNION - Combines with duplicate elimination - UNION ALL - Combines without duplicate elimination False - Single source node or multiple sources combined using a join
Cluster key	True/False toggle that determines if clustering is enabled True - Specify clustering column and optionally allow expressions False - No clustering
Truncate Before	Specifies that the target table should be truncated before inserting the values into the table.

Fact With Task General Options

Work with Task-GO

Option	Description
Distinct	True/False toggle that determines whether to add DISTINCT to SQL Query True - Group by All is invisible. DISTINCT data is chosen False - Group by All is visible
Group by All	True/False toggle that determines whether to add GROUP BY ALL to SQL Query True - DISTINCT is invisible. Data grouped by all columns False - DISTINCT is visible
Order By	True/False toggle that determines whether to add ORDER BY to SQL Query True - Sort column and order options visible False - Sort options invisible

Fact With Task Scheduling Options

If Development Mode is set to false, use Scheduling Options to configure how and when the task will run.

Task Scheduling Options

Option	Description
Scheduling Mode	Choose compute type: - Warehouse Task: User managed warehouse executes tasks - Serverless Task: Uses serverless compute
Multiple Stream has Data Flag	True / False toggle that checks whether source streams have data before executing a task. True: Only run task if source stream has capture change data False: Run task on schedule regardless of whether the source stream has data. If the source is not a stream should set this to false.
Multiple Stream has Data Logic	AND/OR logic when multiple streams (visible if Stream has Data Flag is true) AND - If there are multiple streams task will run if all streams have data OR - If there are multiple streams task will run if one or more streams has data
Select Warehouse	Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on without quotes.
Select initial serverless size	Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task. Snowflake will adjust size from there based on target schedule and task run times.
Task Schedule	Choose schedule type: - Minutes - Specify interval in minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs. - Cron - Uses Cron expressions. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax. - Predecessor - Specify dependent tasks
Enter predecessor tasks separated by a comma	Visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.
Root task name	Visible when Task Schedule is set to Predecessor. Name of the root task that controls scheduling for the DAG of tasks. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.

Fact With Task Deployment

Fact With Task Deployment Parameters

The Fact with Task includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.

The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.

When set to DEV ENVIRONMENT the value entered in the Scheduling Options config Select Warehouse on which to run task will be used when creating the task.

{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}

When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.

For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.

{
    "targetTaskWarehouse": "compute_wh"
}

Fact With Task Initial Deployment

When deployed for the first time into an environment the Fact with Task node will execute three stages dependent on whether or not the task schedule relies on a predecessor task.

For tasks without predecessors:

Stage	Description
Create Target Table	Creates table that will be loaded by the task
Create Task	Creates task that will load the target table on schedule
Resume Task	Resumes the task so it runs on schedule

For tasks with predecessors:

Stage	Description
Create Target Table	Creates table that will be loaded by the task
Suspend Root Task	Suspends root task for DAG modification
Create Task	Creates task that will load the target table on schedule

The task node has no ALTER capabilities. All task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.

Fact With Task Redeployment

After initial deployment, changes to task schedule, warehouse, or scheduling options will result in a CREATE and RESUME TASK.

For tasks without predecessors:

Stage	Description
Create Task	Recreates task with new schedule
Resume Task	Resumes task with new schedule

For tasks with predecessors:

Stage	Description
Suspend Root Task	Suspends root task for DAG modification
Create Task	Creates task that will load the target table on schedule

Fact With Task Altering the Tables

Changes to add or drop column, or change in data type will result in a ALTER, CREATE, AND RESUME TASK.

Stage	Description
Alter Table	Modifies table structure
Create Task	Recreates task
Resume Task	Resumes updated task

Fact With Task Undeployment

If a Fact with Task node is deleted from a Workspace, and that Workspace is committed to Git and that commit is deployed to a higher-level environment, then all objects created by the node in the target environment will be dropped.

For tasks without predecessors:

Stage	Description
Drop Table	Drop the table created to be loaded by the task
Drop Current Task	Removes the task

For tasks with predecessors:

Stage	Description
Drop Table	Drop the table created to be loaded by the task
Suspend Root Task	Suspends root task
Drop Task	Removes the task

If a task is part of a DAG of tasks, the DAG needs to include a node type called Task Dag Resume Root. This node will resume the root node once all the dependent tasks have been created as part of a deployment.

Task DAG Create Root

The Coalesce Task DAG Create Root UDN is a node that helps to create a standalone root task.

The root task should have a defined schedule that initiates a run of the DAG. Each of the other tasks has at least one defined predecessor to link the tasks in the DAG. A child task runs only after all of its predecessor tasks run successfully to completion.

Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or perform other periodic work.

More information about Tasks can be found in Snowflake's Introduction to tasks.

Task DAG Create Root Node Configuration

The Task DAG Create Root node has two configuration groups:

Node Properties
Scheduling Options

Task DAG Create Root Node Properties

Property	Description
Storage Location	Storage Location where the Stream will be created
Node Type	Name of template used to create node objects
Description	A description of the node's purpose
Deploy Enabled	If TRUE the node will be deployed / redeployed when changes are detected If FALSE the node will not be deployed or will be dropped during redeployment

Task DAG Create Root Scheduling Options

Task_dag_create_root_scheduling_options

Option	Description
Scheduling Mode	Choose compute type: - Warehouse Task - User managed warehouse executes tasks - Serverless Task - Uses serverless compute
Select Warehouse	Visible if Scheduling Mode is set to Warehouse Task. Name of warehouse to run task on without quotes
Select initial serverless size	Visible when Scheduling Mode is set to Serverless Task Initial compute size for serverless tasks. Snowflake will adjust size from there based on target schedule and task run times.
Task Schedule	Choose schedule type: - Minutes - Specify interval in minutes - Cron - Use cron expression
Enter root task SQL	The SQL statement to be run when a standalone root task executes

Task DAG Create Root Deployment

Task DAG Create Root Deployment Parameters

The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.

When set to DEV ENVIRONMENT the value entered in the Scheduling Options config Select Warehouse on which to run task will be used when creating the task.

{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}

When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.

For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.

{
    "targetTaskWarehouse": "compute_wh"
}

Task DAG Create Root Initial Deployment

When deployed for the first time into an environment, the following stages execute:

Stage	Description
Suspend Root Task	Suspends root task for creation
Create Root Task	Creates task that will execute Root SQL on schedule

Task DAG Create Root Redeployment

After the Task has deployed for the first time into a target environment, subsequent deployments will execute two stages:

Stage	Description
Suspend Root Task	Suspends root task
Create Root Task	Recreates root task

Task DAG Create Root Undeployment

When a Task DAG Create Root node is deleted, two stages are executed:

Stage	Description
Suspend Root task	Suspends root task
Drop current task	Removes the task

Task DAG Resume Root

The Coalesce Task DAG Resume Root UDN is a node type that helps to resume the root task and its dependents or child tasks. Recursively resumes all dependent tasks tied to a specified root task in a DAG using the root task name specified.

Tasks can also be used independently to generate periodic reports by inserting or merging rows into a report table or perform other periodic work.

More information about Tasks can be found in Snowflake's Introduction to tasks.

Task DAG Resume Root Node Configuration

The Task DAG Resume Root node has two configuration groups:

Node Properties
Scheduling Options

Task DAG Resume Root Node Properties

Property	Description
Storage Location	Storage Location where the Stream will be created
Node Type	Name of template used to create node objects
Description	A description of the node's purpose
Deploy Enabled	If TRUE the node will be deployed / redeployed when changes are detected If FALSE the node will not be deployed or will be dropped during redeployment

Task DAG Resume Root Scheduling Options

Task_dag_resume_root_scheduling_options

Option	Description
Enter root task name	Name of the root task to be resumed - recursively resumes all dependent tasks tied to this specified root task

Task DAG Resume Root Deployment

Task DAG Resume Root Initial Deployment

When deployed for the first time into an environment the following stage executes:

Stage	Description
Try Enable Root Task	Resumes root task and all its dependents

Task DAG Resume Root Undeployment

If a Dimension with Task node is deleted from a Workspace, that Workspace is committed to Git and that commit deployed to a higher-level environment then two stages are executed.

Stage	Description
Drop Table	Removes the table
Drop Current Task	Removes the task

Stream

The Coalesce Stream UDN is a node that allows you to develop and deploy a stream on top of a table, view or external table.

A stream logically takes an initial snapshot of every row in the source object (e.g. table, external table, or the underlying tables for a view) by initializing a point in time (called an offset) as the current transactional version of the object. The change tracking system utilized by the stream then records information about the DML changes after this snapshot was taken. Change records provide the state of a row before and after the change. Change information mirrors the column structure of the tracked source object and includes additional metadata columns that describe each change event.

More information about Streams can be found in the official Snowflake Introduction to Streams.

Stream Node Configuration

The Stream has two configuration groups:

Node Properties
Stream Options

Stream Node Properties

Property	Description
Storage Location	Storage Location where the Stream will be created
Node Type	Name of template used to create node objects
Description	A description of the node's purpose
Deploy Enabled	If TRUE the node will be deployed / redeployed when changes are detected If FALSE the node will not be deployed or will be dropped during redeployment

Stream Options

Option	Description
Source Object	Type of object for stream creation: Table: Append Only Stream: True: Append-only stream False: Standard stream Show Initial Rows: Specify the records to return the first time the stream is consumed. True: The stream returns only the rows that existed in the source object at the moment when the stream was created. Subsequently, the stream returns any DML changes to the source object since the most recent offset - the normal stream behavior. False: The stream returns any DML changes to the source object since the most recent offset. Redeployment Behavior: Options for redeployment View: Append Only Stream: True: Append-only stream False: Standard stream Show Initial Rows: Specify the records to return the first time the stream is consumed. True: The stream returns only the rows that existed in the source object at the moment when the stream was created. Subsequently, the stream returns any DML changes to the source object since the most recent offset - the normal stream behavior. False: The stream returns any DML changes to the source object since the most recent offset. Redeployment Behavior: Options for redeployment External table: Redeployment Behavior: Options for redeployment External iceberg table: Redeployment Behavior: Options for redeployment

Option

Description

Source Object

Type of object for stream creation:
Table:
Append Only Stream:
True: Append-only stream
False: Standard stream

Show Initial Rows: Specify the records to return the first time the stream is consumed.
True: The stream returns only the rows that existed in the source object at the moment when the stream was created. Subsequently, the stream returns any DML changes to the source object since the most recent offset - the normal stream behavior.
False: The stream returns any DML changes to the source object since the most recent offset.

Redeployment Behavior: Options for redeployment

View:
Append Only Stream:
True: Append-only stream
False: Standard stream

Show Initial Rows: Specify the records to return the first time the stream is consumed.
True: The stream returns only the rows that existed in the source object at the moment when the stream was created. Subsequently, the stream returns any DML changes to the source object since the most recent offset - the normal stream behavior.
False: The stream returns any DML changes to the source object since the most recent offset.

Redeployment Behavior: Options for redeployment

External table:
Redeployment Behavior: Options for redeployment

External iceberg table:
Redeployment Behavior: Options for redeployment

Stream System Columns

A Stream UDN adds three system columns to the output of the node. These columns can be used together to track INSERT, UPDATE and DELETE operations against a source object.

Column	Description
METADATA$ACTION	Indicates the DML operation (INSERT, DELETE) recorded
METADATA$ISUPDATE	Indicates whether the operation was part of an UPDATE statement. Updates to rows in the source object are represented as a pair of DELETE and INSERT records in the stream with a metadata column METADATA$ISUPDATE values set to TRUE.
METADATA$ROW_ID	Specifies the unique and immutable ID for the row, used to track changes over time

Stream Deployment

Stream Deployment Parameters

No deployment parameters are required.

Stream Initial Deployment

When deployed for the first time into an environment the Stream node executes:

Stage	Description
Create Stream	Executes a CREATE OR REPLACE statement to create a Stream in the target environment

Stream Redeployment

After initial deployment, subsequent deployments will create a new stream based on the selected redeployment behavior:

Redeployment Behavior	Stage Executed
Create Stream if not exists	Re-Create Stream at existing offset
Create or Replace	Create Stream
Create at existing stream	Re-Create Stream at existing offset

Stream Undeployment

When a Stream Node is deleted from a Workspace and that commit is deployed, the following stage executes:

Stage	Description
Drop Stream	Removes the stream from the target environment

Stream and Insert or Merge

The Coalesce Streams and Insert or Merge UDN is a node that allows you to develop and deploy a stream on top of a table, view or external table. Also, provides option to create a target table to insert or merge data from source with a task on top of it.

More information about Streams can be found in the official Snowflake's Introduction to Streams.

Stream and Insert or Merge Node Configuration

The Stream and Insert or Merge node has the following configuration groups:

Node Properties
General Options
Stream Options
Target Loading Options
Scheduling Options

Stream and Insert or Merge Node Properties

Property	Description
Storage Location	Storage Location where the Stream will be created
Node Type	Name of template used to create node objects
Description	A description of the node's purpose
Deploy Enabled	If TRUE the node will be deployed / redeployed when changes are detected If FALSE the node will not be deployed or will be dropped during redeployment

Stream and Insert or Merge General Options

Option	Description
Development Mode	True / False toggle that determines whether a task will be created or if SQL executes as DML True - Table created and SQL executes as Run action False - SQL wrapped in task with specified Scheduling Options
CREATE AS	Choose target object type: - Table - Permanent table with data retention and fail-safe - Transient Table - Temporary table without data retention
DISTINCT	True/False toggle for DISTINCT in SQL Query True - Group by All invisible, DISTINCT used False - Group by All visible
GROUP BY ALL	True/False toggle for GROUP BY ALL in SQL Query True - DISTINCT invisible, group by all columns False - DISTINCT visible

Stream and Insert or Merge Stream Options

Option	Description
Source Object	Type of object for stream creation: Table: - Append Only Stream: True/False toggle for stream type - Show Initial Rows: True/False toggle for initial records - Redeployment Behavior: Options for redeployment View: - Append Only Stream: True/False toggle for stream type - Show Initial Rows: True/False toggle for initial records - Redeployment Behavior: Options for redeployment

Stream and Insert or Merge Target Loading Options

Option	Description
Load Type	Choose data loading method: Insert - Data inserted from source Merge - Latest record changes merged into target
Table keys	Business key columns for merging data (enabled for Merge load type)
Record Date/Timestamp	Date/Timestamp column for latest record merging (enabled for Merge load type)
Cluster key	True/False toggle for clustering True - Specify clustering column and expressions. - Allow Expressions Cluster Key: Add an expression to the specified cluster key. False - No clustering

Stream and Insert or Merge Scheduling Options

Option	Description
Scheduling Mode	Choose compute type: - Warehouse Task: User managed warehouse executes tasks - Serverless Task: Uses serverless compute
When Source Stream has Data Flag	True/False toggle to check for stream data True - Only run task if source stream has capture change data False - Run task on schedule regardless of whether the source stream has data. If the source is not a stream should set this to false.
Select Warehouse	Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on without quotes.
Select initial serverless size	Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task. Snowflake will adjust size from there based on target schedule and task run times.
Task Schedule	Choose schedule type: - Minutes - Specify interval in minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs. - Cron - Uses Cron expressions. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax. - Predecessor - Specify dependent tasks
Enter predecessor tasks separated by a comma	Visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.
Root task name	Visible when Task Schedule is set to Predecessor. Name of the root task that controls scheduling for the DAG of tasks. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.

Stream and Insert or Merge System Columns

Column	Description
METADATA$ACTION	Indicates the DML operation (INSERT, DELETE) recorded
METADATA$ISUPDATE	Indicates whether operation was UPDATE (shown as DELETE/INSERT pair with TRUE value)
METADATA$ROW_ID	Unique and immutable row ID for change tracking

Stream and Insert or Merge Deployment

Stream and Insert or Merge Deployment Parameters

The Dimension with Task includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.

The parameter name is targetTaskWarehouse with default value DEV ENVIRONMENT.

{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}

When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.

For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.

{
    "targetTaskWarehouse": "compute_wh"
}

Stream and Insert or Merge Initial Deployment

For tasks without predecessors:

Stage	Description
Create Stream	Creates Stream in target environment
Create Work Table/Transient Table	Creates table loaded by task
Target Table Initial Load	Loads initial data
Create Task	Creates scheduled task
Resume Task	Enables task execution

For tasks with predecessors:

Stage	Description
Create Stream	Creates Stream in target environment
Create Work Table/Transient Table	Creates target table
Target Table Initial Load	Loads initial data
Suspend Root Task	Suspends root task
Create Task	Creates scheduled task

The task node has no ALTER capabilities. All task-enabled nodes are CREATE OR REPLACE only, though this is subject to change

Stream and Insert or Merge Redeployment

Stream redeployment behavior:

Redeployment Behavior	Stage Executed
Create Stream if not exists	Re-Create Stream at existing offset
Create or Replace	Create Stream
Create at existing stream	Re-Create Stream at existing offset

Table changes execute:

Stage	Description
Rename Table/Alter Column/Delete Column/Add Column/Edit description	Alters table as needed
Target Initial Load	If the initial load toggle is enabled and the redeployment behavior of the stream is "Create or Replace," it loads the table with "INSERT OVERWRITE INTO." For all other scenarios, it uses "INSERT INTO."

If the materialization type is changed from one type to another(transient table/table) the following stages execute:

Stage	Description
Drop Table/Transient Table	Drop the target table
Create Work/Transient table	Create the target table
Target Initial Load	If the initial load toggle is enabled and the redeployment behavior of the stream is "Create or Replace," it loads the table with "INSERT OVERWRITE INTO." For all other scenarios, it uses "INSERT INTO."

Task changes:

Stage	Description
Create Task	Creates scheduled task
Resume Task	Resumes the task

Stream and Insert or Merge Undeployment

When node is deleted, the following stages execute:

Stage	Description
Drop Stream	Removes the stream
Drop Table	Drop the table
Drop Current Task	Drop the task

Iceberg Tables With Task

The Iceberg Tables with Task node is an external Iceberg table node wrapped with task functionality.

An Iceberg table uses the Apache Iceberg open table format specification, which provides an abstraction layer on data files stored in open formats. Iceberg tables for Snowflake combine the performance and query semantics of regular Snowflake tables with external cloud storage that you manage. They are ideal for existing data lakes that you cannot, or choose not to, store in Snowflake.

An Iceberg table that uses an external catalog provides limited Snowflake platform support with read-only access. With this table type, Snowflake uses a catalog integration to retrieve information about your Iceberg metadata and schema.

Iceberg Tables With Task Prerequisites

The Role in the Workspace and Environment properties of Coalesce should be 'ACCOUNTADMIN' in order to successfully create an Iceberg table. You can also grant SYSADMIN roles to EXTERNAL VOLUME, CATALOG INTEGRATION created.
An EXTERNAL VOLUME, CATALOG INTEGRATION is expected to be created in Snowflake at the Storage Location chosen in the Node properties.

Iceberg Tables With Task Node Configuration

The Iceberg Tables with Task node has three configuration groups:

Node Properties
Iceberg Options
Scheduling Options

Iceberg Tables With Task Node Properties

Property	Description
Storage Location	Storage Location where the Dynamic Table will be created
Node Type	Name of template used to create node objects
Description	A description of the node's purpose
Deploy Enabled	If TRUE the node will be deployed / redeployed when changes are detected If FALSE the node will not be deployed or will be dropped during redeployment

Iceberg Tables With Task Options

Option	Description
Type of Catalog	Specify catalog type: - AWS Glue - Object Storage
Snowflake EXTERNAL VOLUME name	Identifier for external volume storing metadata files and data. .External volume needs to be created in Snowflake as a prerequisite.
Catalog integration	Identifier for catalog integration
Catalog namespace	Namespace for AWS Glue Data Catalog source (for AWS Glue)
Catalog table name	Name of catalog table (for AWS Glue)
Metadata filepath	Relative path of Iceberg metadata file for column definitions (for Object Storage)
Schedule refresh	True/False toggle for task creation True - Setting Schedule refresh Mode to true will wrap the SQL statement in a task with options specified in Scheduling Options. False - A table will be created and SQL will execute as a Run action.

Iceberg Tables With Task Scheduling Options

Option	Description
Scheduling Mode	Choose compute type: - Warehouse Task: User managed warehouse executes tasks - Serverless Task: Uses serverless compute
Select Warehouse	Visible if Scheduling Mode is set to Warehouse Task. Enter the name of the warehouse you want the task to run on without quotes.
Select initial serverless size	Visible when Scheduling Mode is set to Serverless Task. Select the initial compute size on which to run the task. Snowflake will adjust size from there based on target schedule and task run times.
Task Schedule	Choose schedule type: - Minutes - Specify interval in minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs. - Cron - Uses Cron expressions. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax. - Predecessor - Specify dependent tasks
Enter predecessor tasks separated by a comma	Visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.
Root task name	Visible when Task Schedule is set to Predecessor. Name of the root task that controls scheduling for the DAG of tasks. Task names are case sensitive, should not be quoted and must exist in the same schema in which the current task is being created. If there are multiple predecessor task separate the task names using a comma and no spaces.

Iceberg Tables With Task System Columns

Column	Description
DATA	Column added for deployment but not added to Iceberg table as columns specifications are not required

Iceberg Tables With Task Deployment

Iceberg Tables With Task Deployment Parameters

The Iceberg tables With Task includes an environment parameter that allows you to specify a different warehouse used to run a task in different environments.

The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.

When set to DEV ENVIRONMENT the value entered in the Scheduling Options config Select Warehouse on which to run the task will be used when creating the task.

{
    "targetTaskWarehouse": "DEV ENVIRONMENT"
}

When set to any value other than DEV ENVIRONMENT the node will attempt to create the task using a Snowflake warehouse with the specified value.

For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.

{
    "targetTaskWarehouse": "compute_wh"
}

Iceberg Tables With Task Initial Deployment

For tasks without predecessors:

Stage	Description
Create/Replace Iceberg Table	Create Iceberg Table in target environment
Create Task	Creates scheduled task
Resume Task	Resume task

For tasks with predecessors:

Stage	Description
Create/Replace Iceberg Table	Create Iceberg Table in target environment
Suspend Root Task	Suspends root task
Create Task	Creates scheduled task

The task node has no ALTER capabilities. All task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.

Iceberg Tables With Task Redeployment

Changes in configuration options will execute a CREATE or REPLACE. Changes such as:

Volume
Base location
Node properties
Columns

Stage	Description
Create Iceberg Table	Recreates table with new configuration

Iceberg tables With Task Recreating the Task Redeployment

Changes such as task schedule, warehouse, or scheduling options will result in a CREATE TASK AND RESUME TASK statements being issued.

For tasks without predecessors:

Stage	Description
Create Task	Recreates task with new schedule
Resume Task	Resumes updated task

For tasks with predecessors:

Stage	Description
Suspend Root Task	Suspends root task
Create Task	Creates scheduled task

Iceberg Tables With Task Undeployment

If a Snowflake Iceberg Table with a task is dropped from the workspace and committed to Git, it results in the table and task being dropped from the target environment.

For tasks without predecessors:

Stage	Description
Drop Iceberg Table	Drop the table
Drop Current Task	Drop the task

For tasks with predecessors:

Stage	Description
Drop Iceberg Table	Drop the table
Suspend Root Task	Suspends root task
Drop Task	Removes the task

Code

Work with Task Code

Dimension with Task

Fact with Task

Task DAG Create Root Code

Task DAG Resume Root Code

Stream Code

Stream and Insert or Merge Code

Macros

Versions

Available versions of the package.

Version #	Release Date	Notes
1.1.5	November 08, 2024	`Fix for typo error and single namespace variable initialization`
1.1.4	July 31, 2024	`New node type Iceberg table with Task added`
1.1.3	July 29, 2024	`Fix for empty current storage location`

Support

If you need help, please see our support section or contact us.

Overview​

Installation​

Description​

Streams and Tasks Package

Work with Task​

Work with Task Node Configuration​

Work with Task Properties​

Work with Task Options​

Work with Task General Options​

Work with Task Scheduling Options​

Example of Serverless Task with Multiple Predecessors and Root Task​

Example of Warehouse Task With 60 Minute Task Schedule​

Example of Warehouse Task With Cron Schedule Not Using a Stream​

Work With Task Deployment​

Work With Task Deployment Parameters​

Work With Task Initial Deployment​

Work With Task Redeployment​

Work With Task Altering Tables​

Work With Task Undeployment​

Dimension With Task​

Dimension With Task Node Configuration​

Dimension With Task Node Properties​

Dimension With Task Options​

Dimension With Task General Options​

Dimension With Task Scheduling Options​

Dimension With Task Deployment​

Dimenson With Task Deployment Parameters​

Dimension With Task Initial Deployment​

Dimension With Task Redeployment​

Dimension With Task Altering Tables​

Dimension With Task Undeployment​

Fact With Task​

Fact With Task Node Configuration​

Fact With Task Node Properties​

Fact With Task Options​

Fact With Task General Options​

Fact With Task Scheduling Options​

Fact With Task Deployment​

Fact With Task Deployment Parameters​

Fact With Task Initial Deployment​

Fact With Task Redeployment​

Fact With Task Altering the Tables​

Fact With Task Undeployment​

Task DAG Create Root​

Task DAG Create Root Node Configuration​

Task DAG Create Root Node Properties​

Task DAG Create Root Scheduling Options​

Task DAG Create Root Deployment​

Task DAG Create Root Deployment Parameters​

Task DAG Create Root Initial Deployment​

Task DAG Create Root Redeployment​

Task DAG Create Root Undeployment​

Task DAG Resume Root​

Task DAG Resume Root Node Configuration​

Task DAG Resume Root Node Properties​

Task DAG Resume Root Scheduling Options​

Task DAG Resume Root Deployment​

Task DAG Resume Root Initial Deployment​

Task DAG Resume Root Undeployment​

Stream​

Stream Node Configuration​

Stream Node Properties​

Stream Options​

Stream System Columns​

Stream Deployment​

Stream Deployment Parameters​

Stream Initial Deployment​

Stream Redeployment​

Stream Undeployment​

Stream and Insert or Merge​

Stream and Insert or Merge Node Configuration​

Stream and Insert or Merge Node Properties​

Stream and Insert or Merge General Options​

Stream and Insert or Merge Stream Options​

Stream and Insert or Merge Target Loading Options​

Stream and Insert or Merge Scheduling Options​

Stream and Insert or Merge System Columns​

Stream and Insert or Merge Deployment​

Stream and Insert or Merge Deployment Parameters​

Stream and Insert or Merge Initial Deployment​

Overview

Installation

Description

Work with Task

Work with Task Node Configuration

Work with Task Properties

Work with Task Options

Work with Task General Options

Work with Task Scheduling Options

Example of Serverless Task with Multiple Predecessors and Root Task

Example of Warehouse Task With 60 Minute Task Schedule

Example of Warehouse Task With Cron Schedule Not Using a Stream

Work With Task Deployment

Work With Task Deployment Parameters

Work With Task Initial Deployment

Work With Task Redeployment

Work With Task Altering Tables

Work With Task Undeployment

Dimension With Task

Dimension With Task Node Configuration

Dimension With Task Node Properties

Dimension With Task Options

Dimension With Task General Options

Dimension With Task Scheduling Options

Dimension With Task Deployment

Dimenson With Task Deployment Parameters

Dimension With Task Initial Deployment

Dimension With Task Redeployment

Dimension With Task Altering Tables

Dimension With Task Undeployment

Fact With Task

Fact With Task Node Configuration

Fact With Task Node Properties

Fact With Task Options

Fact With Task General Options

Fact With Task Scheduling Options

Fact With Task Deployment

Fact With Task Deployment Parameters

Fact With Task Initial Deployment

Fact With Task Redeployment

Fact With Task Altering the Tables

Fact With Task Undeployment

Task DAG Create Root

Task DAG Create Root Node Configuration

Task DAG Create Root Node Properties

Task DAG Create Root Scheduling Options

Task DAG Create Root Deployment

Task DAG Create Root Deployment Parameters

Task DAG Create Root Initial Deployment

Task DAG Create Root Redeployment

Task DAG Create Root Undeployment

Task DAG Resume Root

Task DAG Resume Root Node Configuration

Task DAG Resume Root Node Properties

Task DAG Resume Root Scheduling Options

Task DAG Resume Root Deployment

Task DAG Resume Root Initial Deployment

Task DAG Resume Root Undeployment

Stream

Stream Node Configuration

Stream Node Properties

Stream Options

Stream System Columns

Stream Deployment

Stream Deployment Parameters

Stream Initial Deployment

Stream Redeployment

Stream Undeployment

Stream and Insert or Merge

Stream and Insert or Merge Node Configuration

Stream and Insert or Merge Node Properties

Stream and Insert or Merge General Options

Stream and Insert or Merge Stream Options

Stream and Insert or Merge Target Loading Options

Stream and Insert or Merge Scheduling Options

Stream and Insert or Merge System Columns

Stream and Insert or Merge Deployment

Stream and Insert or Merge Deployment Parameters

Stream and Insert or Merge Initial Deployment

Stream and Insert or Merge Redeployment