Deferred Merge
Overview
Deferred Merge package nodes are a simplified implementation of a Lambda architecture, focused on integrating ongoing real-time or near-real-time updates with previously loaded data.
Installation
- In Coalesce, open the Workspace where you wish to install the package.
- Go to the Build Settings of the Workspace, select the Packages tab, and click the Install button at the top right of the page.
- Paste the Package ID, and proceed with the installation process.
Description
Deferred Merge Package
The Coalesce Deferred Merge Package includes:
Deferred Merge Append Stream
The Deferred Merge - Append Stream node includes several key steps to ensure efficient and up-to-date data processing. First, a stream is created to capture row inserts. Then a target table is created and initially loaded with data. A hybrid view is established to provide access to the most current data by combining the initial load with subsequent updates.
Finally, a scheduled task manages ongoing updates by merging changes into the target table. This process ensures that the target table remains synchronized with the source data, maintaining accuracy and timeliness.
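To make the pattern concrete, here is a minimal hand-written sketch of the objects this node manages. All names (src_events, tgt_events, events_current) are hypothetical; the actual DDL is generated by the node.

```sql
-- Hypothetical source table with a business key and a versioning column.
CREATE TABLE IF NOT EXISTS src_events (
  event_id   NUMBER,
  payload    VARCHAR,
  updated_at TIMESTAMP_NTZ
);

-- 1. Append-only stream that captures row inserts on the source.
CREATE OR REPLACE STREAM src_events_stream
  ON TABLE src_events
  APPEND_ONLY = TRUE;

-- 2. Target table, initially loaded with the existing source data.
CREATE OR REPLACE TABLE tgt_events AS
SELECT event_id, payload, updated_at FROM src_events;

-- 3. Hybrid view combining rows already merged into the target with rows
--    still waiting in the stream. (The generated view also deduplicates by
--    key; this sketch omits that.)
CREATE OR REPLACE VIEW events_current AS
SELECT event_id, payload, updated_at FROM tgt_events
UNION ALL
SELECT event_id, payload, updated_at FROM src_events_stream;
```

The scheduled task that performs the merge is covered under Scheduling Options below.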
Deferred Merge Append Node Configuration
- General Options
- Stream Options
- Target Loading Options
- Target Row DML Operations
- Target Delete Options
- Target Clustering Options
- Scheduling Options
Append General Options
- Development Mode: True / False toggle that determines whether a task will be created or whether the SQL the task would run executes as DML through a Run action. Prior to creating a task, it is helpful to test the SQL the task will execute to make sure it runs without errors and returns the expected data.
  - True - A table will be created and SQL will execute as a Run action.
  - False - After sufficiently testing the SQL as a Run action, setting Development Mode to false will wrap the SQL statement in a task with the options specified in Scheduling Options.
- Create Target As:
  - Table - Creates the target object as a table.
  - Transient Table - Creates the target object as a transient table.
Append Stream Options
- Source Object: The type of object the stream will be created on. One of two options must be selected, and this selection drives which other configs are available:
  - Table
  - View
- Show Initial Rows: True / False toggle that specifies the records to return the first time the stream is consumed.
  - True - The stream returns only the rows that existed in the source object at the moment the stream was created.
  - False - The stream returns any DML changes to the source object since the most recent offset.
- Redeployment Behavior: After the stream has been deployed for the first time into a target environment, subsequent deployments result in a new stream creation based on the redeployment behavior chosen.

Redeployment Behavior | Stage Executed |
---|---|
Create Stream if not exists | Re-Create Stream at existing offset |
Create or Replace | Create Stream |
Create at existing stream | Re-Create Stream at existing offset |
Append Target Loading Options
- Table Keys: The business key columns on which data is merged into the target table.
- Record Versioning: Specifies a Date Time or Date and Timestamp column used to determine the latest record to merge into the target table (see the sketch below).
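For example, when several change rows arrive for the same business key, the Record Versioning column decides which row wins. A minimal sketch, reusing the hypothetical event_id key and updated_at versioning column from the overview:

```sql
-- Keep only the latest change row per business key before merging.
SELECT *
FROM src_events_stream
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY event_id        -- Table Key
  ORDER BY updated_at DESC     -- Record Versioning column
) = 1;
```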
Append Target Row DML Operations
- Column Identifier: The column that identifies the DML operation for each record.
- Include Value for Update: For records flagged for update, the existing records in the target table are updated with the corresponding values from the source table.
- Insert Value: Indicates that the corresponding record is meant to be inserted into the target table. This condition ensures that only records flagged for insertion are inserted into the target table during the merge operation.
- Delete Value: Indicates that the corresponding record should either be soft-deleted (when the Soft Delete toggle is enabled) or hard-deleted from the target table. Together, these configs drive the merge sketched below.
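The merge these configs drive looks roughly like the following. This is a sketch only: the dml_op column and its values 'I', 'U', and 'D' are hypothetical stand-ins for whatever column identifier and values you configure.

```sql
MERGE INTO tgt_events t
USING latest_changes s           -- e.g. the deduplicated stream rows from the sketch above
  ON t.event_id = s.event_id     -- Table Key
WHEN MATCHED AND s.dml_op = 'U' THEN        -- Include Value for Update
  UPDATE SET t.payload    = s.payload,
             t.updated_at = s.updated_at
WHEN MATCHED AND s.dml_op = 'D' THEN        -- Delete Value (hard delete; soft
  DELETE                                    -- delete would update a flag instead)
WHEN NOT MATCHED AND s.dml_op = 'I' THEN    -- Insert Value
  INSERT (event_id, payload, updated_at)
  VALUES (s.event_id, s.payload, s.updated_at);
```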
Append Target Delete Options
- Soft Delete: Enabling this toggle maintains a record of deleted data for auditing purposes.
- Retain Last Non-Deleted Values: Preserves the most recent non-deleted record in a dataset, even as other records are marked as deleted or become inactive.
Append Target Clustering Options
- Cluster Key: True / False toggle that determines whether the target table is clustered.
  - True - Allows you to specify the column on which the table is clustered (see the example below).
    - Allow Expressions Cluster Key: True - Allows you to add an expression to the specified cluster key.
  - False - No clustering is done.
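In Snowflake terms, these options correspond to statements like the following, sketched with the hypothetical names from earlier:

```sql
-- Cluster Key = True: cluster the target table on a column.
ALTER TABLE tgt_events CLUSTER BY (event_id);

-- Allow Expressions Cluster Key = True: cluster on an expression instead.
ALTER TABLE tgt_events CLUSTER BY (TO_DATE(updated_at));
```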
Append Scheduling Options
If Development Mode is set to false, Scheduling Options can be used to configure how and when the task will run. Several options are available depending on the combination of configs selected (a sketch of the resulting task follows this list):
- Scheduling Mode: Specifies whether a warehouse or serverless compute is used to run the task.
- Warehouse Task - User managed warehouse will execute tasks.
- Serverless Task - Utilize serverless compute to execute tasks.
- When Source Stream has Data Flag: True / False toggle that checks whether source streams have data before executing a task.
  - True - Only run the task if the source stream has captured change data.
  - False - Run the task on schedule regardless of whether the source stream has data. If the source is not a stream, set this to false.
- Select Warehouse on which to run task: Visible if Scheduling Mode is set to Warehouse Task.
  - Enter the name of the warehouse you want the task to run on, without quotes.
- Select initial serverless Warehouse size: Visible when Scheduling Mode is set to Serverless Task.
  - Select the initial compute size on which to run the task. Snowflake will adjust the size from there based on the target schedule and task run times.
- Task Schedule: Select how you want to schedule the task to run.
  - Minutes - Allows you to specify a minute interval for running the task.
  - Cron - Allows you to specify a CRON schedule for running the task.
  - Predecessor - Allows you to specify a predecessor task that determines when the task should execute.
- Enter task schedule using minutes: Only visible when Task Schedule is set to Minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs.
- Enter task schedule using Cron: Only visible when Task Schedule is set to Cron. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax.
- Enter predecessor tasks separated by a comma: Only visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case-sensitive, should not be quoted, and must exist in the same schema in which the current task is being created. If there are multiple predecessor tasks, separate the task names with a comma and no spaces.
- Enter root task name: Name of the root task that controls scheduling for the DAG of tasks. Task names are case-sensitive, should not be quoted, and must exist in the same schema in which the current task is being created.
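A sketch of the task these options produce, with hypothetical names and an abbreviated body (the real task wraps the merge described above):

```sql
-- Warehouse Task mode with a cron schedule, gated on stream data.
CREATE OR REPLACE TASK tgt_events_merge_task
  WAREHOUSE = compute_wh                             -- Select Warehouse config
  SCHEDULE  = 'USING CRON 0 * * * * UTC'             -- Task Schedule = Cron (hourly)
  WHEN SYSTEM$STREAM_HAS_DATA('SRC_EVENTS_STREAM')   -- When Source Stream has Data Flag
AS
  INSERT INTO tgt_events
  SELECT event_id, payload, updated_at FROM src_events_stream;

-- Serverless Task mode would replace WAREHOUSE with an initial size hint:
--   USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
```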
Limitations
🚧 Applying Transformations
Transformations cannot be applied to columns for this node type.
Deployment
Parameters
The package includes an environment parameter that allows you to specify a different warehouse to run a task in each environment.
The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.
When set to DEV ENVIRONMENT, the value entered in the Scheduling Options config Select Warehouse on which to run task will be used when creating the task.
{"targetTaskWarehouse": "DEV ENVIRONMENT"}
When set to any value other than DEV ENVIRONMENT, the node will attempt to create the task using a Snowflake warehouse with the specified value.
For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.
{"targetTaskWarehouse": "compute_wh"}
Initial Deployment
When deployed for the first time into an environment, the following stages will be executed.
- Create Stream: This stage will execute a CREATE OR REPLACE statement and create a Stream in the target environment.
- Create Target Table: This table is the destination where the processed data will be stored.
- Target Table Initial Load: This involves populating the target table with the existing data from the source object. This step ensures that the target table starts with a complete set of data before any further changes are applied.
- Create Hybrid View: This view provides access to the most up-to-date data by combining the initial data load with any subsequent changes captured by the stream. The hybrid view ensures that users always have access to the latest version of the data without waiting for batch updates.
- Create Task: This task wraps a merge operation that incorporates the changes captured by the stream into the target table.
- Resume Task: After the task has been created, it needs to be resumed so that it runs on the schedule.
- Apply Table Clustering: This step will add a cluster key to the table via an ALTER statement if the clustering toggle is enabled.
- Resume Recluster Clustering: As DML operations are performed on a clustered table, the data in the table might become less clustered. Periodic reclustering is required to maintain optimal clustering; this step resumes automatic reclustering on the table. Both resume stages are sketched below.
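The two resume stages amount to statements like these (names hypothetical):

```sql
ALTER TASK tgt_events_merge_task RESUME;   -- Resume Task (tasks are created suspended)
ALTER TABLE tgt_events RESUME RECLUSTER;   -- Resume Recluster Clustering
```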
Predecessor Task
- Suspend Root Task: To add a task to a DAG of tasks, the root task needs to be put into a suspended state.
- Create Task: This stage will create a task that will load the target table on the schedule specified.
- If a task is part of a DAG of tasks, the DAG needs to include a node type called Task Dag Resume Root.
- This node will resume the root task once all the dependent tasks have been created as part of a deployment. The task node has no ALTER capabilities; all task-enabled nodes are CREATE OR REPLACE only, though this is subject to change. These stages are sketched below.
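A sketch of the DAG-related stages, with hypothetical task names; the packaged Task Dag Resume Root node performs the final step for you:

```sql
-- Suspend Root Task: required before adding a child task to the DAG.
ALTER TASK root_task SUSPEND;

-- Create Task: runs after its predecessor instead of on a schedule.
CREATE OR REPLACE TASK tgt_events_merge_task
  WAREHOUSE = compute_wh
  AFTER upstream_task
AS
  INSERT INTO tgt_events
  SELECT event_id, payload, updated_at FROM src_events_stream;

-- Resume the root task and all of its dependents.
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('root_task');
```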
Redeployment
Stream
Redeployment Behavior | Stage Executed |
---|---|
Create Stream if not exists | Re-Create Stream at existing offset |
Create or Replace | Create Stream |
Create at existing stream | Re-Create Stream at existing offset |
Table
Column or table changes such as renaming the table, dropping an existing column, altering a column's data type, or adding a new column, whether made in isolation or together, will result in an ALTER statement to modify the table in the target environment. The following stages are executed:
- Rename Table / Alter Column / Delete Column / Add Column / Edit table description: An ALTER TABLE statement is executed to perform the operation.
- Target Initial Load: If the initial load toggle is enabled and the redeployment behavior of the stream is Create or Replace, the table is loaded with INSERT OVERWRITE INTO; in all other scenarios, INSERT INTO is used (both sketched below).
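The difference between the two load modes, sketched with the hypothetical names from earlier:

```sql
-- Stream redeployment behavior is Create or Replace: rebuild the contents.
INSERT OVERWRITE INTO tgt_events
SELECT event_id, payload, updated_at FROM src_events;

-- All other scenarios: append.
INSERT INTO tgt_events
SELECT event_id, payload, updated_at FROM src_events;
```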
View
Redeployment with changes in the stream or table will result in re-creation of the hybrid view.
Task
Redeployment with changes in the stream or table will result in the creation and resumption of the task.
NOTE
🚧 Redeployment Behavior
Redeployment with changes in Stream/Table/Task properties will result in execution of all steps mentioned in Initial Deployment.
Undeployment
If an object is deleted from a Workspace, the Workspace is committed to Git, and that commit is deployed to a higher-level environment, then the corresponding objects in the target environment will be dropped.
This is executed in the following stages:
- Drop Stream
- Drop Table/Transient Table
- Drop View
- Drop Current Task
Deferred Merge Delta Stream
The Deferred Merge - Delta Stream node includes several key steps to ensure efficient and up-to-date data processing. First, a stream is created on the source object to track all DML changes, including inserts, updates, and deletes. Then a target table is created and initially loaded with data. A hybrid view is established to provide access to the most current data by combining the initial load with subsequent updates.
Finally, a scheduled task manages ongoing updates by merging changes into the target table. This process ensures that the target table remains synchronized with the source data, maintaining accuracy and timeliness.
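Unlike the append-only stream used by the Append Stream node, the delta node relies on a standard stream. A minimal sketch, reusing the hypothetical src_events table from the append example:

```sql
-- Standard (delta) stream: tracks inserts, updates, and deletes.
CREATE OR REPLACE STREAM src_events_delta_stream
  ON TABLE src_events;

-- Updates surface as a DELETE/INSERT pair; the metadata columns tell
-- downstream logic how each row changed.
SELECT event_id, payload, updated_at,
       METADATA$ACTION,     -- 'INSERT' or 'DELETE'
       METADATA$ISUPDATE    -- TRUE when the row is half of an update pair
FROM src_events_delta_stream;
```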
Deferred Merge Delta Node Configuration
- General Options
- Stream Options
- Target Loading Options
- Target Row DML Operations
- Target Delete Options
- Target Clustering Options
- Scheduling Options
Delta General Options
- Development Mode: True / False toggle that determines whether a task will be created or whether the SQL the task would run executes as DML through a Run action. Prior to creating a task, it is helpful to test the SQL the task will execute to make sure it runs without errors and returns the expected data.
  - True - A table will be created and SQL will execute as a Run action.
  - False - After sufficiently testing the SQL as a Run action, setting Development Mode to false will wrap the SQL statement in a task with the options specified in Scheduling Options.
- Create Target As:
  - Table - Creates the target object as a table.
  - Transient Table - Creates the target object as a transient table.
Delta Stream Options
- Source Object: The type of object the stream will be created on. One of two options must be selected, and this selection drives which other configs are available:
  - Table
  - View
- Show Initial Rows: True / False toggle that specifies the records to return the first time the stream is consumed.
  - True - The stream returns only the rows that existed in the source object at the moment the stream was created. Subsequently, the stream returns any DML changes to the source object since the most recent offset; that is, the normal stream behavior.
  - False - The stream returns any DML changes to the source object since the most recent offset.
- Redeployment Behavior: After the stream has been deployed for the first time into a target environment, subsequent deployments result in a new stream creation based on the redeployment behavior chosen.

Redeployment Behavior | Stage Executed |
---|---|
Create Stream if not exists | Re-Create Stream at existing offset |
Create or Replace | Create Stream |
Create at existing stream | Re-Create Stream at existing offset |
Delta Target Loading Options
- Table Keys: The business key columns on which data is merged into the target table.
- Record Versioning: Specifies a Date Time or Date and Timestamp column used to determine the latest record to merge into the target table.
Delta Target Row DML Operations
- Column Identifier: The column that identifies the DML operation for each record.
- Include Value for Update: For records flagged for update, the existing records in the target table are updated with the corresponding values from the source table.
- Insert Value: Indicates that the corresponding record is meant to be inserted into the target table. This condition ensures that only records flagged for insertion are inserted into the target table during the merge operation.
- Delete Value: Indicates that the corresponding record should either be soft-deleted (when the Soft Delete toggle is enabled) or hard-deleted from the target table. For a delta stream these values can be derived from the stream's metadata, as sketched below.
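With a delta stream, the insert/update/delete classification can be derived from the stream's metadata columns. A sketch, where the values 'I', 'U', and 'D' and all names are hypothetical:

```sql
-- Map stream metadata onto the configured DML operation values.
SELECT event_id, payload, updated_at,
       CASE
         WHEN METADATA$ACTION = 'INSERT' AND METADATA$ISUPDATE THEN 'U'
         WHEN METADATA$ACTION = 'INSERT'                       THEN 'I'
         ELSE 'D'
       END AS dml_op
FROM src_events_delta_stream
-- Drop the pre-update image; the 'INSERT' half of the pair carries the new values.
WHERE NOT (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE);
```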
Delta Target Delete Options
- Soft Delete: Enabling this toggle maintains a record of deleted data for auditing purposes.
- Retain Last Non-Deleted Values: Preserves the most recent non-deleted record in a dataset, even as other records are marked as deleted or become inactive.
Delta Target Clustering Options
- Cluster Key: True / False toggle that determines whether the target table is clustered.
  - True - Allows you to specify the column on which the table is clustered.
    - Allow Expressions Cluster Key: True - Allows you to add an expression to the specified cluster key.
  - False - No clustering is done.
Delta Scheduling Options
If Development Mode is set to false, Scheduling Options can be used to configure how and when the task will run. Several options are available depending on the combination of configs selected:
- Scheduling Mode: Specifies whether a warehouse or serverless compute is used to run the task.
- Warehouse Task - User managed warehouse will execute tasks.
- Serverless Task - Utilize serverless compute to execute tasks.
- When Source Stream has Data Flag: True / False toggle that checks whether source streams have data before executing a task.
  - True - Only run the task if the source stream has captured change data.
  - False - Run the task on schedule regardless of whether the source stream has data. If the source is not a stream, set this to false.
- Select Warehouse on which to run task: Visible if Scheduling Mode is set to Warehouse Task.
  - Enter the name of the warehouse you want the task to run on, without quotes.
- Select initial serverless Warehouse size: Visible when Scheduling Mode is set to Serverless Task.
  - Select the initial compute size on which to run the task. Snowflake will adjust the size from there based on the target schedule and task run times.
- Task Schedule: Select how you want to schedule the task to run.
  - Minutes - Allows you to specify a minute interval for running the task.
  - Cron - Allows you to specify a CRON schedule for running the task.
  - Predecessor - Allows you to specify a predecessor task that determines when the task should execute.
- Enter task schedule using minutes: Only visible when Task Schedule is set to Minutes. Enter a whole number from 1 to 11520 which represents the number of minutes between task runs.
- Enter task schedule using Cron: Only visible when Task Schedule is set to Cron. Specifies a cron expression and time zone for periodically running the task. Supports a subset of standard cron utility syntax.
- Enter predecessor tasks separated by a comma: Only visible when Task Schedule is set to Predecessor. One or more task names that precede the task being created in the current node. Task names are case-sensitive, should not be quoted, and must exist in the same schema in which the current task is being created. If there are multiple predecessor tasks, separate the task names with a comma and no spaces.
- Enter root task name: Name of the root task that controls scheduling for the DAG of tasks. Task names are case-sensitive, should not be quoted, and must exist in the same schema in which the current task is being created.
Limitations
🚧 Applying Transformations
Transformations cannot be applied to columns for this node type.
Deployment
Parameters
The package includes an environment parameter that allows you to specify a different warehouse to run a task in each environment.
The parameter name is targetTaskWarehouse and the default value is DEV ENVIRONMENT.
When set to DEV ENVIRONMENT, the value entered in the Scheduling Options config Select Warehouse on which to run task will be used when creating the task.
{"targetTaskWarehouse": "DEV ENVIRONMENT"}
When set to any value other than DEV ENVIRONMENT, the node will attempt to create the task using a Snowflake warehouse with the specified value.
For example, with the below setting for the parameter in a QA environment, the task will execute using a warehouse named compute_wh.
{"targetTaskWarehouse": "compute_wh"}
Initial Deployment
When deployed for the first time into an environment, the following stages will be executed.
- Create Stream: This stage will execute a CREATE OR REPLACE statement and create a Stream in the target environment.
- Create Target Table: This table is the destination where the processed data will be stored.
- Target Table Initial Load: This involves populating the target table with the existing data from the source object. This step ensures that the target table starts with a complete set of data before any further changes are applied.
- Create Hybrid View: This view provides access to the most up-to-date data by combining the initial data load with any subsequent changes captured by the stream. The hybrid view ensures that users always have access to the latest version of the data without waiting for batch updates.
- Create Task: This task wraps a merge operation that incorporates the changes captured by the stream into the target table.
- Resume Task: After the task has been created, it needs to be resumed so that it runs on the schedule.
- Apply Table Clustering: This step will add a cluster key to the table via an ALTER statement if the clustering toggle is enabled.
- Resume Recluster Clustering: As DML operations are performed on a clustered table, the data in the table might become less clustered. Periodic reclustering is required to maintain optimal clustering; this step resumes automatic reclustering on the table.
Predecessor Task
- Suspend Root Task: To add a task to a DAG of tasks, the root task needs to be put into a suspended state.
- Create Task: This stage will create a task that will load the target table on the schedule specified.
- If a task is part of a DAG of tasks, the DAG needs to include a node type called Task Dag Resume Root.
- This node will resume the root task once all the dependent tasks have been created as part of a deployment. The task node has no ALTER capabilities; all task-enabled nodes are CREATE OR REPLACE only, though this is subject to change.
Redeployment
Stream
Redeployment Behavior | Stage Executed |
---|---|
Create Stream if not exists | Re-Create Stream at existing offset |
Create or Replace | Create Stream |
Create at existing stream | Re-Create Stream at existing offset |
Table
Column or table changes such as renaming the table, dropping an existing column, altering a column's data type, or adding a new column, whether made in isolation or together, will result in an ALTER statement to modify the table in the target environment. The following stages are executed:
- Rename Table / Alter Column / Delete Column / Add Column / Edit table description: An ALTER TABLE statement is executed to perform the operation.
- Target Initial Load: If the initial load toggle is enabled and the redeployment behavior of the stream is Create or Replace, the table is loaded with INSERT OVERWRITE INTO; in all other scenarios, INSERT INTO is used.
View
Redeployment with changes in the stream or table will result in re-creation of the hybrid view.
Task
Redeployment with changes in the stream or table will result in the creation and resumption of the task.
Note
🚧 Redeployment Behavior
Redeployment with changes in Stream/Table/Task properties will result in execution of all steps mentioned in Initial Deployment.
Undeployment
If an object is deleted from a Workspace, the Workspace is committed to Git, and that commit is deployed to a higher-level environment, then the corresponding objects in the target environment will be dropped.
This is executed in the following stages:
- Drop Stream
- Drop Table/Transient Table
- Drop View
- Drop Current Task
Code
Deferred Merge - Append Stream
Deferred Merge - Delta Stream
Versions
Available versions of the package.
Version # | Release Date | Notes |
---|---|---|
1.1.1 | August 06, 2024 | |
1.1.0 | June 14, 2024 | |
1.0.0 | May 17, 2024 | |
Support
If you need help, please see our support section or contact us.