Cortex
Overview
Leverage the power of Snowflake Cortex functions from within Coalesce.
Installation
- In Coalesce, open the Workspace where you wish to install the package.
- Go to the Build Settings of the Workspace, open the Packages tab, and click the Install button at the top right of the page.
- Paste the Package ID, and proceed with the installation process.
Description
Cortex Package
- ML Forecast
- ML Anomaly Detection
- LLM Cortex Functions
- Top Insights
- Classification
- Document AI
- Cortex Search Service
- Code
ML Forecast
The Coalesce ML Forecast UDN is a versatile node that allows you to create a forecast table and insert forecasts of time series data using the Snowflake built-in class FORECAST.
Snowflake Cortex is Snowflake's intelligent, fully-managed service that enables organizations to quickly analyze data and build AI applications, all within Snowflake. This service makes Machine Learning (ML) functionality accessible to data engineers to enrich data pipelines while still using SQL. Forecasting employs a machine learning algorithm to predict future data by using historical time series data.
Node Configuration
The ML Forecast node has two configuration groups:
Node Properties
Property | Description |
---|---|
Storage Location | Storage Location where the Forecast table will be created |
Node Type | Name of template used to create node objects |
Description | A description of the node's purpose |
Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment. |
Forecast Model Input
Option | Description |
---|---|
Model Instance Name | (Required) Name of the model that needs to be created |
Create Model | True/False toggle to determine model creation: - True: Force creation of the Forecast model. For multi-series models, the Series Column (required) names the column that defines the individual time series in the input data. - False: Refer to an existing Forecast model |
Multi-Series Forecast | True/False toggle for forecast type: - True: Create multi-series forecast model with series column, timestamp column and target column - False: Specify the timestamp column and target column to create single-series forecast model |
Series Column | (Required for multi-series) Column defining multiple time series in input data |
Timestamp Column | (Required) Column containing timestamps in input data |
Target Column | (Required) Column containing target values in input data |
Config object | OBJECT containing key-value pairs to configure forecast job |
Series value | Required for multi-series forecasts. Single value or VARIANT |
Exogenous Variables | True/False toggle: - True: Add future-valued exogenous data using multi-source toggle - False: Create forecast model based on days to forecast only |
Multi Source | Toggle to add future-valued Exogenous data |
Days to Forecast | (Required for forecasts without exogenous variables) Number of steps ahead to forecast |
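To make the mapping from these options to Snowflake's FORECAST class more concrete, here is a minimal sketch in Python that assembles the two SQL statements a single-series or multi-series forecast typically needs. This is not the SQL the node template actually generates (that is not shown in this document). The helper name `build_forecast_sql` and the table, column, and model names in the usage lines are hypothetical, and the helper does no identifier quoting or validation.

```python
def build_forecast_sql(model_name, source_table, timestamp_col, target_col,
                       days_to_forecast, series_col=None):
    """Return (create_sql, forecast_sql) for a SNOWFLAKE.ML.FORECAST model.

    Hypothetical helper for illustration only; inputs are trusted as-is.
    """
    # Model Instance Name and the source table feed the CREATE statement.
    args = [f"INPUT_DATA => SYSTEM$REFERENCE('TABLE', '{source_table}')"]
    # Series Column is only passed for multi-series forecasts.
    if series_col is not None:
        args.append(f"SERIES_COLNAME => '{series_col}'")
    args.append(f"TIMESTAMP_COLNAME => '{timestamp_col}'")
    args.append(f"TARGET_COLNAME => '{target_col}'")

    create_sql = (
        f"CREATE OR REPLACE SNOWFLAKE.ML.FORECAST {model_name}(\n  "
        + ",\n  ".join(args)
        + "\n);"
    )
    # Days to Forecast maps to the number of steps ahead (no exogenous data).
    forecast_sql = (
        f"CALL {model_name}!FORECAST("
        f"FORECASTING_PERIODS => {int(days_to_forecast)});"
    )
    return create_sql, forecast_sql


# Hypothetical names, single-series case.
create_sql, forecast_sql = build_forecast_sql(
    "SALES_FORECAST_MODEL", "SALES_DAILY", "SALE_DATE", "TOTAL_SOLD", 14
)
print(create_sql)
print(forecast_sql)
```

Passing `series_col="STORE_ID"` (also a hypothetical name) adds the `SERIES_COLNAME` argument, which corresponds to enabling the Multi-Series Forecast toggle.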
ML Forecast Preprocessing of Data
When the forecast model returns an error, the error message returned by Snowflake is captured and surfaced directly in the Coalesce application for troubleshooting.
Common scenarios you may encounter:
- NULLs in the source data. The model tolerates some NULLs, but not too many.
- Missing time periods. If the model cannot determine a consistent frequency in the time series, it returns an error.
- Missing exogenous variables. If the model was trained with exogenous variables, values for those variables must also be provided for the future timestamps being predicted.
A data preparation step in Coalesce can be used prior to the ML Forecast node to address these issues.
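As an illustration of what such a preparation step needs to detect, the sketch below counts NULL target values and lists missing dates in a series. It assumes a daily frequency and an in-memory list of rows. Both are assumptions for the example; in Coalesce the equivalent checks would normally be written as SQL transformations in an upstream node. The function name `check_daily_series` is hypothetical.

```python
from datetime import date, timedelta


def check_daily_series(rows):
    """Return (null_count, missing_dates) for a list of (date, value) rows.

    Assumes one row per calendar day is expected (daily frequency).
    """
    # Count rows whose target value is NULL (None in Python).
    null_count = sum(1 for _, value in rows if value is None)

    # Walk every day between the first and last timestamp and
    # record the days that have no row at all.
    present = {day for day, _ in rows}
    missing_dates = []
    if present:
        day, last = min(present), max(present)
        while day <= last:
            if day not in present:
                missing_dates.append(day)
            day += timedelta(days=1)
    return null_count, missing_dates


# Made-up sample data: one NULL value and one missing day (2024-01-03).
rows = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 2), None),
    (date(2024, 1, 4), 12.5),
]
null_count, missing_dates = check_daily_series(rows)
print(null_count, missing_dates)
```

How to repair the gaps (fill, interpolate, or drop) depends on the data and is left to the preparation step.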
ML Forecast Deployment
ML Forecast Initial Deployment
When deployed for the first time into an environment, the ML Forecast node executes the following stage:
Stage | Description |
---|---|
Create Forecast Table | This will execute a CREATE OR REPLACE statement and create a Forecast Table in the target environment |
ML Forecast Redeployment
After the ML Forecast node has been deployed for the first time into a target environment, subsequent deployments may result in altering the forecast table.
ML Forecast Altering the Forecast Tables
The following column or table changes, made in isolation or together, result in an ALTER statement that modifies the Forecast table in the target environment:
- Changing the table name
- Dropping an existing column
- Altering a column's data type
- Adding a new column
The following stages are executed:
Stage | Description |
---|---|
Clone Table | Creates an internal table |
Rename Table/Alter Column/Delete Column/Add Column/Edit table description | An ALTER TABLE statement is executed to perform the corresponding operation |
Swap cloned Table | Upon successful completion of all updates, the clone replaces the main table, ensuring that no data is lost |
Delete Table | Drops the internal table |
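The stage sequence above can be sketched as an ordered list of Snowflake statements. This is only an illustration: it assumes the alterations are applied to the clone before the swap, which the stage order suggests but this document does not state, and the statements Coalesce really issues may differ. The helper name `build_alter_stages`, the `_CLONE` suffix, and the table and column names are hypothetical.

```python
def build_alter_stages(table, alter_clauses, clone_suffix="_CLONE"):
    """Return a list of (stage, sql) pairs for a clone/alter/swap/drop cycle.

    Hypothetical helper; `alter_clauses` are ALTER TABLE clauses such as
    "ADD COLUMN NOTE VARCHAR". Identifiers are not quoted or validated.
    """
    clone = f"{table}{clone_suffix}"
    # 1. Clone Table: create the internal working copy.
    stages = [("Clone Table", f"CREATE OR REPLACE TABLE {clone} CLONE {table};")]
    # 2. Alter stages: apply each change to the clone (assumed).
    for clause in alter_clauses:
        stages.append(("Alter Table", f"ALTER TABLE {clone} {clause};"))
    # 3. Swap cloned Table: exchange the clone with the main table.
    stages.append(("Swap cloned Table", f"ALTER TABLE {table} SWAP WITH {clone};"))
    # 4. Delete Table: drop the internal table.
    stages.append(("Delete Table", f"DROP TABLE IF EXISTS {clone};"))
    return stages


for stage, sql in build_alter_stages("SALES_FORECAST", ["ADD COLUMN NOTE VARCHAR"]):
    print(f"{stage}: {sql}")
```

Because the swap happens only after every alteration has succeeded, a failed ALTER in this sketch would leave the main table untouched.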
ML Forecast Undeployment
If an ML Forecast table is deleted from a Workspace, that Workspace is committed to Git, and that commit is deployed to a higher-level environment, then the Forecast table in the target environment will be dropped.
This is executed in two stages:
Stage | Description |
---|---|
Delete Table | Coalesce Internal table is dropped |
Delete Table | Target table in Snowflake is dropped |
ML Anomaly Detection
The Coalesce ML Anomaly Detection UDN is a versatile node that allows you to create an Anomaly table and insert anomalies detected in time series data using the Snowflake built-in class ANOMALY_DETECTION.
Snowflake Cortex is Snowflake's intelligent, fully-managed service that enables organizations to quickly analyze data and build AI applications, all within Snowflake. This service makes Machine Learning (ML) functionality accessible to data engineers to enrich data pipelines while still using SQL. Anomalies in data are detected by analyzing the dataset using a machine learning algorithm.
ML Anomaly Detection Node Configuration
The ML Anomaly Detection node has two configuration groups:
ML Anomaly Node Properties
Property | Description |
---|---|
Storage Location | Storage Location where the Table will be created |
Node Type | Name of template used to create node objects |
Description | A description of the node's purpose |
Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment. |