external-data-package

Overview

Provides tools for loading and managing external data in Snowflake, including schema inference, data copying, and unloading, API integration, and parsing of Excel files.

Installation

  • Copy the Package ID @coalesce/external-data-package.
  • In Coalesce, open the Workspace where you wish to install the package.
  • Go to the Build Setting of the Workspace, tab Packages, and click the Install button on the top right of the page.
  • Paste the Package ID, and proceed with the installation process.

Description

External Data Package

The Coalesce External Data Package includes:

InferSchema

The Coalesce InferSchema UDN is a versatile node infers schema of the file in internal or external stage and dynamically creates the target table of the same name as inferschema node.

InferSchema in Snowflake automatically detects the file metadata schema in a set of staged data files that contain semi-structured data and retrieves the column definitions.

InferSchema Prerequisites

  • A sample file in the internal or external stage.
  • An existing fileformat to parse the file
  • The file format is also expected to be in the same location as stage

InferSchema Node Configuration

The InferSchema node type has two configuration groups:

Go to the node and select the Config tab to see the Node Properties, Dynamic Table Options and General Options.

InferSchema Node Properties

  • Storage Location: Storage Location where the WORK will be created.
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

InferSchema Source Data

  • Stage Storage Location (Required): A storage location in Coalesce where the stage is located.
  • Stage Name (Required)*: Internal or External stage where the files containing data to be loaded are staged.
  • File Names (Required): Use commas to seperate multiple files. File whose metatdata is to be inferred.
  • File Format Name (Required)*: Name of the file format object that describes the data contained in the staged files.It is expected in the same location as Stage.
  • Redeployment Behavior:
  • CREATE OR REPLACE: Dynamically creates target table based on the inferred schema from file in staging area.
  • ALTER EXISTING TABLE: Dynamically alters inferred table by comparing the inferred schema of the same file (with changes if any)and created table.
  • DROP EXISTING TABLE: Drops the table inferred.

InferSchema Example workflow

  • Add a InferSchema node, for example INFER_JSON.
  • Deploy the node. On deployment a table of the same name as the InferSchema node is created with columns inferred on parsing the file. For example, InferSchema node:INFER_JSON,Inferred table:INFER_JSON.
  • Add the inferred table to the browser using Add Sources tab.
  • Now we can add a Copy-Into Snowpipe or External table node on top of the inferred table to load staged files.
  • If the structure of the file is changed, you can redeploy the InferSchema node with "Alter existing table" redeployment behaviour. The already inferred table is altered.
  • Next, you can re-sync the columns of the inferred table and redeploy the CopyInto Copy-Into Snowpipe or External table node added on top of it.

🚧 Inferred table and Copy-into nodes/External table cannot be deployed together

Infer Schema node, inferred table and Copy-into nodes/External table cannot be deployed together. First deploy the Infer Schema node. Then add the inferred table in browser, add Copy-into node on top of the table and then deploy the same.

InferSchema Deployment

InferSchema Initial Deployment

When deployed for the first time into an environment the InferSchema node will execute the stage:

  • Stage executed: Infer and Create target table

InferSchema Redeployment

Redeployment Behavior: Create or Replace

| Redeployment Behavior | Stage Executed | |---|---| | Create or Replace | Infer and Create target table

If any of the Source Data options like Stage storage location, Stage name or filename are modified. Then you can redeploy the Infer Schema node with redeployment behaviour “Create or Replace”.

📘 Info

You can go back to the browser and Re-sync columns of Inferred table, re-execute Copy-Into and redeploy

Redeployment Behavior: Alter Existing Table

| Redeployment Behavior | Stage Executed | |---|---| |Alter existing table| Infer and Alter target table

If all Source Data options remain same and only there are changes in the existing file structure, you can redeploy the Infer Schema node with redeployment behaviour “Alter existing table”.

Redeployment Behavior: Drop Existing Table

| Redeployment Behavior | Stage Executed | |---|---| |Drop existing table |Drop inferred table

If you want to drop the inferred table you can redeploy the Infer Schema node with redeployment behaviour “Drop existing table”.

InferSchema Undeployment

If the InferSchema node is deleted from a Workspace, that Workspace is committed to Git and that commit deployed to a higher-level environment then no action takes place.

CopyInto

CopyInto Node Configuration

The Copy-Into node type the following configurations available:

CopyInto - Node Properties

  • Storage Location: Storage Location where the WORK will be created.
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

CopyInto - File Location

Filelocation

  • Stage Storage Location (Required): A storage location in Coalesce where the stage is located.
  • Stage Name (Required): Internal or External stage where the files containing data to be loaded are staged.
  • File Names: Specifies a list of one or more files names (separated by commas) to be loaded. For example, 'a.csv','b.csv'.
  • File Pattern:A regular expression pattern string, enclosed in single quotes, specifying the file names or paths to match. For example, *hea.*[.]csv'.

CopyInto - File Format

fileformat

  • File Format Definition - File Format Name:
  • File Format Name: Specifies an existing named file format to use for loading data into the table.
  • File Type:
    • CSV
    • JSON
    • ORC
    • AVRO
    • PARQUET
    • XML
  • File Format Definition - File Format Values:
  • File Format Values -Provides file format options for the File Type chosen.
  • File Type: Each file type has different configurations available.
    • CSV
    • Compression: String (constant) that specifies the current compression algorithm for the data files to be loaded.
    • Record delimiter:Characters that separate records in an input file
    • Field delimiter:One or more singlebyte or multibyte characters that separate fields in an input file
    • Field optionally enclosed by:Character used to enclose strings
    • Number of header lines to skip:Number of lines at the start of the file to skip.
    • Skip blank lines:Boolean that specifies to skip any blank lines encountered in the data files.
    • Trim Space: Boolean that specifies whether to remove white space from fields.
    • Replace invalid characters: Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • Date format:String that defines the format of date values in the data files to be loaded.
    • Time format: String that defines the format of time values in the data files to be loaded
    • Timestamp format:String that defines the format of timestamp values in the data files to be loaded.
    • JSON
    • Compression: String (constant) that specifies the current compression algorithm for the data files to be loaded.
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • Trim Space - Boolean that specifies whether to remove white space from fields.
    • Strip Outer Array:Boolean that instructs the JSON parser to remove outer brackets [ ].
    • Date format:String that defines the format of date values in the data files to be loaded.
    • Time format:String that defines the format of time values in the data files to be loaded
    • Timestamp format: String that defines the format of timestamp values in the data files to be loaded.
    • ORC
    • Trim Space - Specifies whether to remove white space from fields
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • AVRO
    • Trim Space - Boolean that specifies whether to remove white space from fields.
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • PARQUET
    • Trim Space - Boolean that specifies whether to remove white space from fields.
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • XML
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.

CopyInto - Copy Options

  • On Error Behavior:String (constant) that specifies the error handling for the load operation.
  • CONTINUE
  • SKIP_FILE
  • SKIP_FILE_num
  • SKIP_FILE_num%
  • ABORT_STATEMENT
  • Specify the number of errors that can be skipped: Required when On Error Behavior is either SKIP_FILE_num or SKIP_FILE_num%. Specify the number of errors that can be skipped.
  • Size Limit: Number (> 0) that specifies the maximum size (in bytes) of data to be loaded for a given COPY statement.
  • Purge Behavior: Boolean that specifies whether to remove the data files from the stage automatically after the data is loaded successfully.
  • Return Failed Only: Boolean that specifies whether to return only files that have failed to load in the statement result.
  • Force: Boolean that specifies to load all files, regardless of whether they’ve been loaded previously and have not changed since they were loaded.
  • Load Uncertain Files: Boolean that specifies to load files for which the load status is unknown. The COPY command skips these files by default.
  • Enforce Length: Boolean that specifies whether to truncate text strings that exceed the target column length
  • Truncate Columns: Boolean that specifies whether to truncate text strings that exceed the target column length

CopyInto - System Columns

The set of columns which has source data and file metadata information.

  • SRC - The data from the file is loaded into this variant column.
  • LOAD_TIMESTAMP - Current timestamp when the file gets loaded.
  • FILENAME - Name of the staged data file the current row belongs to. Includes the full path to the data file.
  • FILE_ROW_NUMBER - Row number for each record in the staged data file.
  • FILE_LAST_MODIFIED - Last modified timestamp of the staged data file the current row belongs to
  • SCAN_TIME - Start timestamp of operation for each record in the staged data file. Returned as TIMESTAMP_LTZ.

CopyInto Deployment

CopyInto Deployment Parameters

The CopyInto-Snowpipe includes an environment parameter that allows you to specify if you want to perform a full load or a reload based on the load type when you are performing a Copy-Into operation.

The parameter name is loadType and the default value is ``.

json { "loadType": "" }

When the parameter value is set to Reload, the data is reloaded into the table regardless of whether they’ve been loaded previously and have not changed since they were loaded.

CopyInto Initial Deployment

When deployed for the first time into an environment the CopyInto-Snowpipe node will execute the below stage depending on if Snowpipe is enabled and the loadType.

| Deployment Behavior | Enable Snowpipe |Stages Executed| |--|--|---|--| | Initial Deployment | false | empty|Create Table.
Historical full load using CopyInto. | | Initial Deployment | false | Reload | Truncate Target Table
Reload data-Copy Into Force|

CopyInto Redeployment

Altering the CopyInto Tables

There are few column or table changes like Change in table name, Dropping existing column, Alter Column data type, Adding a new column if made in isolation or all-together will result in an ALTER statement to modify the target Table in the target environment.Any table level changes or node config changes results in recreation of pipe

The following stages are executed:

The following stages are executed:

  • Clone Table: Creates an internal table.
  • Rename Table| Alter Column | Delete Column | Add Column| Edit table description |: Alter table statement is executed to perform the alter operation.
  • Swap cloned Table: Upon successful completion of all updates, the clone replaces the main table ensuring that no data is lost.
  • Delete Table: Drops the internal table.

CopyInto Undeployment

If the CopyInto-Snowpipe node is deleted from a Workspace, that Workspace is committed to Git and that commit deployed to a higher-level environment then the target table in the target environment will be dropped.

This is executed in two stages:

  • Delete Table: Coalesce Internal table is dropped.
  • Delete Table: Target table in Snowflake is dropped.

CopyInto - Snowpipe

The Coalesce CopyInto - Snowpipe node is a node that performs two operations. It can be used to load historical data using
CopyInto. CopyInto-Snowpipe node can be used to create a pipe to auto ingest files from AWS, GCP, or Azure.

Snowpipe enables loading data from files as soon as they’re available in a stage.

This means you can load data from files in micro-batches, making it available to users within minutes, rather than manually executing COPY statements on a schedule to load larger batches.

CopyInto - Snowpipe Node Configuration

The Copy-Into Snowpipe node type the following configurations available:

Snowpipe-config

CopyInto - Snowpipe Node Properties

  • Storage Location: Storage Location where the WORK will be created.
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

CopyInto - Snowpipe Options

  • Enable Snowpipe: Toggle that helps us to load create a pipe to auto ingest files from external stage.
  • Toggle Off - Provides option to load data from files in internal or external stage. This is under File Location configuration.
  • Toggle On - Load data auto ingesting files from AWS, Azure, or GCP. Choose your Cloud Provider.
    • AWS - AWS SNS Topic. Specifies the Amazon Resource Name (ARN) for the SNS topic for your S3 bucket.
    • Azure - Integration. Specifies the existing notification integration used to access the storage queue.
    • GCP - Integration. Specifies the existing notification integration used to access the storage queue.
  • Load historical data:When the toggle is enabled auto ingest does not take place and the files already in internal or external stages are loaded

CopyInto - Snowpipe File Location

Filelocation

  • Stage Storage Location (Required): A storage location in Coalesce where the stage is located.
  • Stage Name (Required): Internal or External stage where the files containing data to be loaded are staged.
  • File Names: Enabled when 'Enable Snowpipe' under Snowpipe Options is toggled off. Specifies a list of one or more files names (separated by commas) to be loaded. For example, 'a.csv','b.csv'.
  • File Pattern:A regular expression pattern string, enclosed in single quotes, specifying the file names or paths to match. For example, *hea.*[.]csv'.

CopyInto - Snowpipe File Format

fileformat

  • File Format Definition - File Format Name:
  • File Format Name: Specifies an existing named file format to use for loading data into the table.
  • File Type:
    • CSV
    • JSON
    • ORC
    • AVRO
    • PARQUET
    • XML
  • File Format Definition - File Format Values:
  • File Format Values -Provides file format options for the File Type chosen.
  • File Type: Each file type has different configurations available.
    • CSV
    • Compression: String (constant) that specifies the current compression algorithm for the data files to be loaded.
    • Record delimiter:Characters that separate records in an input file
    • Field delimiter:One or more singlebyte or multibyte characters that separate fields in an input file
    • Field optionally enclosed by:Character used to enclose strings
    • Number of header lines to skip:Number of lines at the start of the file to skip.
    • Skip blank lines:Boolean that specifies to skip any blank lines encountered in the data files.
    • Trim Space: Boolean that specifies whether to remove white space from fields.
    • Replace invalid characters: Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • Date format:String that defines the format of date values in the data files to be loaded.
    • Time format: String that defines the format of time values in the data files to be loaded
    • Timestamp format:String that defines the format of timestamp values in the data files to be loaded.
    • JSON
    • Compression: String (constant) that specifies the current compression algorithm for the data files to be loaded.
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • Trim Space - Boolean that specifies whether to remove white space from fields.
    • Strip Outer Array:Boolean that instructs the JSON parser to remove outer brackets [ ].
    • Date format:String that defines the format of date values in the data files to be loaded.
    • Time format:String that defines the format of time values in the data files to be loaded
    • Timestamp format: String that defines the format of timestamp values in the data files to be loaded.
    • ORC
    • Trim Space - Specifies whether to remove white space from fields
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • AVRO
    • Trim Space - Boolean that specifies whether to remove white space from fields.
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • PARQUET
    • Trim Space - Boolean that specifies whether to remove white space from fields.
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • XML
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.

CopyInto - Snowpipe Copy Options

If you toggle Enable Snowpipe under Snowpipe Options to ON, these configuration options are available.

  • On Error Behavior: String (constant) that specifies the error handling for the load operation.
  • CONTINUE
  • SKIP_FILE
  • SKIP_FILE_num
    • Specify the number of errors that can be skipped.
  • SKIP_FILE_num%
    • Specify the number of errors that can be skipped.
  • Enforce Length: Boolean that specifies whether to truncate text strings that exceed the target column length.
  • Truncate Columns: Boolean that specifies whether to truncate text strings that exceed the target column length.

If you toggle Enable Snowpipe under Snowpipe Options to OFF, these configuration options are available.

  • On Error Behavior:String (constant) that specifies the error handling for the load operation.
  • CONTINUE
  • SKIP_FILE
  • SKIP_FILE_num
  • SKIP_FILE_num%
  • ABORT_STATEMENT
  • Specify the number of errors that can be skipped: Required when On Error Behavior is either SKIP_FILE_num or SKIP_FILE_num%. Specify the number of errors that can be skipped.
  • Size Limit: Number (> 0) that specifies the maximum size (in bytes) of data to be loaded for a given COPY statement.
  • Purge Behavior: Boolean that specifies whether to remove the data files from the stage automatically after the data is loaded successfully.
  • Return Failed Only: Boolean that specifies whether to return only files that have failed to load in the statement result.
  • Force: Boolean that specifies to load all files, regardless of whether they’ve been loaded previously and have not changed since they were loaded.
  • Load Uncertain Files: Boolean that specifies to load files for which the load status is unknown. The COPY command skips these files by default.
  • Enforce Length: Boolean that specifies whether to truncate text strings that exceed the target column length
  • Truncate Columns: Boolean that specifies whether to truncate text strings that exceed the target column length

CopyInto - Snowpipe System Columns

The set of columns which has source data and file metadata information.

  • SRC - The data from the file is loaded into this variant column.
  • LOAD_TIMESTAMP - Current timestamp when the file gets loaded.
  • FILENAME - Name of the staged data file the current row belongs to. Includes the full path to the data file.
  • FILE_ROW_NUMBER - Row number for each record in the staged data file.
  • FILE_LAST_MODIFIED - Last modified timestamp of the staged data file the current row belongs to
  • SCAN_TIME - Start timestamp of operation for each record in the staged data file. Returned as TIMESTAMP_LTZ.

CopyInto - Snowpipe Deployment

CopyInto - Snowpipe Initial Deployment

When deployed for the first time into an environment the CopyInto-Snowpipe node will execute the below stage depending on if Snowpipe is enabled and the loadType.

| Deployment Behavior | Enable Snowpipe |Stages Executed| |--|--|---|--| | Initial Deployment | false | Create Table.
Historical full load using CopyInto. | | Initial Deployment | true | Create table
Create Pipe |

CopyInto - Snowpipe Redeployment

Altering the CopyInto-Snowpipe Tables

There are few column or table changes like Change in table name, Dropping existing column, Alter Column data type, Adding a new column if made in isolation or all-together will result in an ALTER statement to modify the target Table in the target environment.Any table level changes or node config changes results in recreation of pipe

The following stages are executed:

  • Rename Table| Alter Column | Delete Column | Add Column| Edit table description |: Alter table statement is executed to perform the alter operation.
  • Create Pipe: Pipe is recreated if enable snowpipe option is true
  • Historical full load using CopyInto:Historical data are loaded

CopyInto - Snowpipe Undeployment

If the CopyInto-Snowpipe node is deleted from a Workspace, that Workspace is committed to Git and that commit deployed to a higher-level environment then the target table in the target environment will be dropped.

This is executed in two stages:

  • Drop Table: Target table is dropped
  • Drop Pipe: Pipe is dropped

External Tables

The Coalesce External Table nodes create a new external table in the current/specified schema or replaces an existing external table.

An external table reads data from a set of one or more files in a specified external stage which can point to AWS,GCP or Azure cloud providers.

External Tables Node Configuration

The External table node type has four configuration groups:

External Tables Node Properties

  • Storage Location: Storage Location where the WORK will be created.
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

External Tables Node File Location

Filelocation

  • Stage Storage Location (Required): A storage location in Coalesce where the stage is located.
  • Stage Name (Required): Internal or External stage where the files containing data to be loaded are staged.
  • File Pattern:A regular expression pattern string, enclosed in single quotes, specifying the file names or paths to match. For example, *hea.*[.]csv'.

External Tables File Format

fileformat

  • File Format Definition - File Format Name:
  • File Format Name: Specifies an existing named file format to use for loading data into the table.
  • File Type:
    • CSV
    • JSON
    • ORC
    • AVRO
    • PARQUET
  • File Format Definition - File Format Values:
  • File Format Values -Provides file format options for the File Type chosen.
  • File Type: Each file type has different configurations available.
    • CSV
    • Compression: String (constant) that specifies the current compression algorithm for the data files to be loaded.
    • Record delimiter: Characters that separate records in an input file
    • Field delimiter: One or more singlebyte or multibyte characters that separate fields in an input file
    • Field optionally enclosed by: Character used to enclose strings
    • Number of header lines to skip: Number of lines at the start of the file to skip.
    • Skip blank lines:Boolean that specifies to skip any blank lines encountered in the data files.
    • Trim Space: Boolean that specifies whether to remove white space from fields.
    • Replace invalid characters: Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • JSON
    • Compression: String (constant) that specifies the current compression algorithm for the data files to be loaded.
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • Strip Outer Array:Boolean that instructs the JSON parser to remove outer brackets [ ].
    • Date format:String that defines the format of date values in the data files to be loaded.
    • ORC
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • AVRO
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • PARQUET
    • Replace invalid characters - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.

External Tables Additional Options

  • Auto refresh:True/False toggle that specifies whether Snowflake should enable triggering automatic refreshes of the external table metadata when new or updated data files are available in the named external stage.
  • False - Auto refresh does not take place
  • True - Auto refresh takes place. Setting this to true, give you the option to choose a Cloud Provider from AWS, Azure, and GCP.\
    • AWS - AWS SNS Topic, Enabled only when Cloud Provider is AWS and auto refresh is true.Required only when configuring AUTO_REFRESH for Amazon S3 stages using Amazon Simple Notification Service (SNS).
    • GCP - Integration. Enabled when Cloud Provider is GCP/Azure. Specifies the existing notification integration used to access the storage queue.
    • Azure - Integration. Enabled when Cloud Provider is GCP/Azure. Specifies the existing notification integration used to access the storage queue.

System Columns

The set of columns which has source data and file metadata information.

  • VALUE - The data from the file is loaded into this variant column
  • FILENAME - Name of the staged data file the current row belongs to. Includes the full path to the data file.

External table Deployment

Initial Deployment

When deployed for the first time into an environment the External table node will execute the below stage:

  • Create External Table - This will execute a CREATE OR REPLACE statement and create a table in the target environment.

Redeployment

Recreating the External table

The subsequent deployment of External table with changes in config options will recreate the table.

The following stages are executed:

  • Create External Table

Undeployment

If the External table node is deleted from a Workspace, that Workspace is committed to Git and that commit deployed to a higher-level environment then the target table in the target environment will be dropped.

  • Drop External Table

CopyUnload

The Coalesce Copy Unload node unloads the table into internal stage or external stage location in various file formats supported by Snowflake

A Copy Unload node loads data from a table (or query) into one or more files in the internal or external location mentioned

Copy Unload Node Configuration

The Copy Unload node type has four configuration groups:

Copy Unload Node Properties

  • Storage Location: Storage Location is the location(i.e. Database and Schema) of your source table.
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

Copy Unload Node File Location

copy-unload-file

  • Stage (Required): Location in Snowflake(internal stage) or external stage or external location where the data files are unloaded.
  • Partition by (Optional): Unload operation splits the table rows based on the partition expression and determines the number of files to create based on the number of unique values in a particular column (only sinlge column name expected)
  • Allow Expressions in partition by clause:A regular expression pattern string, enclosed in single quotes, for example: ('date=' || to_varchar(dt, 'YYYY-MM-DD') || '/hour=' || to_varchar(date_part(hour, ts))- Concatenates labels and column values to output meaningful filenames

Copy Unload File Format

FileFormat

  • File Format Definition - File Format Name:
  • File Format Name: Specifies an existing named file format in Snowflake to use for unloading data into the table.
  • File Type:
    • CSV
    • JSON
    • PARQUET
  • File Format Definition - File Format Values:
  • File Format Values -Provides different file format options for the File Type chosen to unload.
  • File Type: Each file type has different configurations available.
    • CSV
      • Compression: String (constant) that specifies the current compression algorithm for the data files to be unloaded.
      • Record delimiter: Characters that separate records in an unloaded file
      • Field delimiter: One or more singlebyte or multibyte characters that separate fields in an unloaded file
      • Field optionally enclosed by: Character used to enclose strings. (Default is \042)
      • File Extension: String that specifies the extension for files unloaded to a stage. Accepts any extension.
      • Date Format: String that defines the format of date values in the unloaded data files.(Default is AUTO)
      • Time Format: String that defines the format of time values in the unloaded data files.(Default is AUTO)
      • Timestamp Format: String that defines the format of timestamp values in the unloaded data files.(Default is AUTO)
    • JSON
      • Compression: String (constant) that specifies the current compression algorithm for the data files to be unloaded.
      • File extension - Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • PARQUET
      • Compression: String (constant) that specifies the current compression algorithm for the data files to be unloaded.

Copy Unload Copy Options

CopyOptions

  • Overwrite Flag: Toogle overwrites existing files with matching names, if any, in the location where files are unloaded
  • Header:
  • Single File Flag: Toggle generates a single file when true, else generates multiple files if partition by enabled
  • Max File Size (MB): Number (> 0) that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread. Note that the actual file size and number of files unloaded are determined by the total amount of data and number of nodes available for parallel processing.
  • Include Query ID: When true provides unique identifier for unloaded files by including a universally unique identifier (UUID) in the filenames of unloaded data files

  • Detailed Output: When true includes a row for each file unloaded to the specified stage. Columns show the path and name for each file, its size, and the number of rows that were unloaded to the file.

### Copy Unload Deployment

Deployment not applicable for this node

API

The Coalesce API Node loads external data which is external to snowflake with the help of EXTERNAL ACCESS INTEGRATION Name which has a network rule which allows access to external network locations using procedure handler

API Node Configuration

The API Node Configuration type has two configuration groups:

API Node Properties

  • Storage Location: Storage Location is the location of the destination where retrieved data of API calls are stored
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

API Node Options

Options

  • Snowflake EXTERNAL ACCESS INTEGRATION Name (Required): EXTERNAL ACCESS INTEGRATION Name has a network rule which allows access to external network locations external to snowflake using procedure
  • Method(Required): HTTP Request methods Get,Put,Post
  • Get: API Call to retrieve data
  • Put: API Call to update existing data
  • Post: API Call to create new data

  • URI(Required): Uniform Resoure Identifier to be provided to locate and interact with resources within a specific API

  • Headers: headers are used to provide additional context and information about the request
  • Payload: when making HTTP requests to a URI, the request may include a payload (also known as a body)
  • Note:: Payload applicable only to Put & Post API method calls

JDBC LOAD

The Coalesce JDBC Load Node allows to connect, interact, and retrieve data from various database management systems into Snowflake

JDBC Load Node Configuration

The JDBC Load Node Configuration type has two configuration groups:

JDBC Node Properties

  • Storage Location: Storage Location is the location of the destination where retrieved from external database are stored
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

JDBC Load Options

JDBC_load_options

  • JDBC Driver(Required): Required to establish the connection to external databse
  • JAR file external stage location(Required): Location of the external stage where JAR file is located
  • JAR file name(Required): Name of JAR file located in external stage
  • EXTERNAL ACCESS INTEGRATION(Required): EXTERNAL ACCESS INTEGRATION Name has a network rule which allows access to external network locations external to snowflake using procedure
  • External database credentials: Username, Password of the external Database in the form snowflake secret object
  • JDBC URL(Required): URL of the database with port number (ex: "sqlserver-tests.database.windows.net:1433" )
  • SQL(Required): SQL query to retrieve the required data
  • Truncate Before:

Parse Excel

The Coalesce Parse Excel node parses an large excel file in stage location to json format on the target location

Parse Excel Node Configuration

The Parse Excel node type has three configuration groups:

Parse Excel Node Properties

  • Storage Location: Storage Location is the location of the destination where source table is parsed.
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

Parse Excel Stored Procedure Options

StoredProcedureOptions

  • Stored Procedure Storage Location(Required): Location in Snowflake to be provided where Stored Procedure will be loaded by script
  • Note: Stored Procedure location and Storage Location (i.e. destination) must be in same Schema

Excel File Processing Options

ExcelFileProcessingOptions

  • Source File Storage Location(REQUIRED): Storage location name in Coalesce which points to the database and schema where stage is located (source file has to be located under stages)
  • Source File Stage(REQUIRED): Stage name in Snowflake where source file is located
  • Source File Name(REQUIRED): Excel filename to be parsed

Initial Deployment

When deployed for the first time into an environment the Parse Excel node will execute the below stage:

  • Create OR Replace Table - This will execute a CREATE OR REPLACE statement and create a table in the target environment.
  • Create OR Replace Stored Procedure - This will create a Stored Procedure in the target environment

Redeployment

Recreating the Parse Excel

There are few column or table changes like Change in table name, Dropping existing column, Alter Column data type, Adding a new column if made in isolation or all-together will result in an ALTER statement to modify the target Table in the target environment.Any table level changes or node config changes results in recreation of Stored Procedure

The following stages are executed:

  • Rename Table| Alter Column | Delete Column | Add Column| Edit table description - Alter table statement is executed to perform the alter operation.
  • Create OR Replace Stored Procedure - This will create a Stored Procedure in the specified target environment

Undeployment

If the Parse Excel node is deleted from a Workspace, that Workspace is committed to Git and that commit deployed to a higher-level environment then the target table in the target environment will be dropped.

  • Drop Table
  • Drop Procedure

Parse Json

The Coalesce parses large json file containing json array to the provided target location

Parse Json Node Configuration

The Parse Excel node type has three configuration groups:

Parse json Node Properties

  • Storage Location: Storage Location is the location of the destination where source table is parsed to.
  • Node Type: Name of template used to create node objects.
  • Description: A description of the node's purpose.
  • Deploy Enabled:
  • If TRUE the node will be deployed / redeployed when changes are detected.
  • If FALSE the node will not be deployed or will be dropped during redeployment.

Parse JSON UDTF Procedure Options

UDTFProcedureOptions

  • User Defined Table Function Location (Required): Location in Snowflake to be provided where UDTF will be loaded by script
  • Note: UDTF location and Storage Location (i.e. destination) must be in same Schema

Parse JSON JSON File Processing Options

JSONFileProcessingOptions

  • Source File Storage Location(REQUIRED): Schema name in Snowflake where source file is located (source file has to be located under stages)
  • Source File Stage(REQUIRED): Stage name in Snowflake where source file is located
  • Source File Name(REQUIRED): Json filename to be parsed
  • JSON Array Name(REQUIRED): Array name of the json

Initial Deployment

When deployed for the first time into an environment the Parse Json node will execute the below stage:

  • Create OR Replace Table - This will execute a CREATE OR REPLACE statement and create a table in the target environment.
  • Create OR Replace Function - This will create a UDTF function in the target environment

Redeployment

Recreating the Parse Json

There are few column or table changes like Change in table name, Dropping existing column, Alter Column data type, Adding a new column if made in isolation or all-together will result in an ALTER statement to modify the target Table in the target environment.Any table level changes or node config changes results in recreation of Stored Procedure

The following stages are executed:

  • Rename Table| Alter Column | Delete Column | Add Column| Edit table description - Alter table statement is executed to perform the alter operation.
  • Create OR Replace Stored Procedure - This will create a Stored Procedure in the specified target environment

Undeployment

If the Parse Json node is deleted from a Workspace, that Workspace is committed to Git and that commit deployed to a higher-level environment then the target table in the target environment will be dropped.

  • Drop Table
  • Drop Procedure

Code

InferSchema

CopyInto

CopyInto-Snowpipe

External Tables

Copy Unload

API

JDBC

Parse Excel

Parse Json

Versions

Version # Release Date Notes
1.1.0 July 05, 2024
Six new node types added with minor changes to existing node types
1.0.0 May 31, 2024
Initial Version

Support

If you need help, please see our support section or contact us.

Listing updated: 2024-07-13 21:06:55