External Data Package
Overview
Node types for data loading from files
Installation
- In Coalesce, open the Workspace where you wish to install the package.
- Go to the Build Settings of the Workspace, open the Packages tab, and click the Install button at the top right of the page.
- Paste the Package ID, and proceed with the installation process.
Description
External Data Package
The Coalesce Databricks External Data Package includes:
CopyInto
CopyInto Node Configuration
The CopyInto node type has the following configurations available:
CopyInto - Node Properties
There are four configs within the Node Properties group.
Property | Description |
---|---|
Storage Location | Storage Location where the table will be created |
Node Type | Name of template used to create node objects |
Description | A description of the node's purpose |
Deploy Enabled | If TRUE, the node will be deployed or redeployed when changes are detected. If FALSE, the node will not be deployed, or will be dropped during redeployment. |
Key points for using the CopyInto node
- A CopyInto node can be created directly with Create Node from the browser when you want the file data loaded into a single variant column in the target table.
- Data can be reloaded into the table by truncating it before the load, using either the TruncateBefore option in the node config or the reload parameter.
- The path or subfolder inside the stage where the file is located can be specified with the 'Path or subfolder' config. Do not prefix or suffix the path with '/'. For example: one level is 'SUBFOLDER', two levels is 'SUBFOLDER/INNERFOLDER'.
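As a hedged sketch, the TruncateBefore reload pattern corresponds to running a truncate ahead of the load. Table, volume, and subfolder names below are placeholders, not the exact SQL Coalesce generates:

```sql
-- TruncateBefore = true: clear the target table, then reload (illustrative names)
TRUNCATE TABLE my_catalog.my_schema.target_table;

COPY INTO my_catalog.my_schema.target_table
FROM '/Volumes/my_catalog/my_schema/landing/SUBFOLDER'  -- subfolder appended without leading/trailing '/'
FILEFORMAT = JSON;
```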
Using the CopyInto node with the InferSchema option
- Set the Infer Schema toggle to true.
- Click the Create button to infer the schema.
- To choose the file format configs, refer to the File Format Options section below.
- Click the Re-Sync Columns button.
- If the columns look correct, set the Infer Schema toggle back to false.
- Click the Create button to create the table based on the inferred schema.
- This is mainly a test to make sure the create will work.
- Click the Run button to execute the DML.
If the above works, the node should be deployable as is. Deploy will simply take the columns and execute a CREATE TABLE.
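Under the hood, the Create and Run steps above roughly correspond to the following SQL. This is an illustrative sketch: the column list comes from whatever schema was inferred, and all names here are placeholders.

```sql
-- Create: table DDL generated from the inferred columns (placeholders shown)
CREATE TABLE IF NOT EXISTS my_catalog.my_schema.target_table (
  id        BIGINT,    -- placeholder for an inferred column
  name      STRING,    -- placeholder for an inferred column
  loaded_at TIMESTAMP  -- placeholder for an inferred column
);

-- Run: load the files into the created table
COPY INTO my_catalog.my_schema.target_table
FROM '/Volumes/my_catalog/my_schema/landing'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true');
```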
CopyInto - General Options
Option | Description |
---|---|
Create As | Select how the node is materialized. Currently the only option is Table. |
TruncateBefore | True / False toggle that determines whether the table is truncated before reloading. True: the table is truncated and a Copy-Into statement is executed to reload the data into the target table. False: data is loaded directly into the target table and no truncate takes place. |
InferSchema | True / False toggle that determines whether to infer the columns of the file before loading. True: the node is created with the inferred columns. False: no infer-table step is executed. |
Source data
Setting | Description |
---|---|
Path (Ex: gs://bucket/base/path) | The path can be an internal volume, an external volume in Databricks, or an external location pointing to an S3 bucket or GCS container. |
File Name (Ex: a.csv,b.csv) | Specifies the file names to be loaded. Note: it is advised to add either the file name or the file pattern for file loads. |
Path or subfolder | Optional. Specifies the path or subfolders inside the stage where the file is located. Ensure that '/' is not prefixed before or after the subfolder name. |
File Pattern (Ex: '.*hea.*[.]csv') | A regular expression pattern string, enclosed in single quotes, specifying the file names or paths to match. Note: it is advised to add either the file name or the file pattern for file loads. |
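The source data settings map onto the FROM clause of the generated COPY INTO. A hedged illustration (bucket, table, and file names are placeholders):

```sql
COPY INTO my_catalog.my_schema.target_table
FROM 'gs://bucket/base/path/SUBFOLDER'  -- Path plus 'Path or subfolder', no leading/trailing '/'
FILEFORMAT = CSV
FILES = ('a.csv', 'b.csv');             -- corresponds to the File Name setting
-- or, instead of listing files, match by pattern (the File Pattern setting):
-- PATTERN = '.*hea.*[.]csv'
```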
CopyInto - File Format Options
Setting | Description |
---|---|
File Format Values | Provides file format options for the File Type chosen |
File Type | Each file type has different configurations available. Supported file types: CSV, JSON, XML, ORC, AVRO, PARQUET. |
GENERIC (ALL FILE TYPES) | ignoreCorruptFiles: whether to ignore corrupt files. ignoreMissingFiles: whether to ignore missing files. |
CSV | Record delimiter: characters that separate records in an input file. Field delimiter: one or more singlebyte or multibyte characters that separate fields in an input file. Parse Header (InferSchema = true): boolean that specifies whether to use the first-row headers in the data files to determine column names. Encoding: specifies the character set of the source data. Date format: string that defines the format of date values in the data files to be loaded. Timestamp format: string that defines the format of timestamp values in the data files to be loaded. Multi-line: whether the CSV records span multiple lines. |
JSON | Date format: string that defines the format of date values in the data files to be loaded. Timestamp format: string that defines the format of timestamp values in the data files to be loaded. Multi-line: whether the JSON records span multiple lines. Encoding: specifies the character set of the source data. |
XML | Date format: string that defines the format of date values in the data files to be loaded. Timestamp format: string that defines the format of timestamp values in the data files to be loaded. Multi-line: whether the XML records span multiple lines. rowTag: the row tag of the XML files to treat as a row; in the example XML `<books> <book></book>...</books>`, the appropriate value is `book` (required for XML). rootTag: root tag of the XML files; for example, in `<books> <book></book>...</books>`, the appropriate value is `books`. Encoding: specifies the character set of the source data. |
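As a hedged illustration, the file format values are passed through the FORMAT_OPTIONS clause of COPY INTO. Option names below follow the Databricks file-reader conventions, and all table and path names are placeholders:

```sql
COPY INTO my_catalog.my_schema.target_table
FROM '/Volumes/my_catalog/my_schema/landing'
FILEFORMAT = CSV
FORMAT_OPTIONS (
  'delimiter'       = ',',                    -- field delimiter
  'lineSep'         = '\n',                   -- record delimiter
  'header'          = 'true',                 -- parse header
  'encoding'        = 'UTF-8',                -- character set of the source data
  'dateFormat'      = 'yyyy-MM-dd',           -- date format
  'timestampFormat' = 'yyyy-MM-dd HH:mm:ss',  -- timestamp format
  'multiLine'       = 'false'                 -- records do not span multiple lines
);
```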
CopyInto - Copy Options
Setting | Description |
---|---|
Force | Boolean, default false. If set to true, idempotency is disabled and files are loaded regardless of whether they've been loaded before. |
MergeSchema | Boolean, default false. If set to true, the schema can be evolved according to the incoming data. |
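These two settings map onto the COPY_OPTIONS clause. A hedged sketch with placeholder names:

```sql
COPY INTO my_catalog.my_schema.target_table
FROM '/Volumes/my_catalog/my_schema/landing'
FILEFORMAT = PARQUET
COPY_OPTIONS (
  'force'       = 'true',  -- disable idempotency; reload files even if already loaded
  'mergeSchema' = 'true'   -- allow the table schema to evolve with incoming data
);
```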
CopyInto - System Columns
A set of system columns that carry source data and file metadata information.
- LOAD_TIMESTAMP - Current timestamp when the file gets loaded.
- FILENAME - Name of the file the current row belongs to.
- FILEPATH - Full path of the file in storage
- FILEBLOCKSTART - Start offset of the file split being read
- FILEBLOCKEND - Length of the file split being read
- FILESIZE - Size of the file in bytes
- FILE_LAST_MODIFIED - Last modified timestamp of the file
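These system columns appear to correspond to the Databricks `_metadata` file-metadata fields. A hedged sketch of how such a load could be expressed (placeholder names; the exact SQL Coalesce generates may differ):

```sql
COPY INTO my_catalog.my_schema.target_table
FROM (
  SELECT
    *,
    current_timestamp()              AS LOAD_TIMESTAMP,      -- load time
    _metadata.file_name              AS FILENAME,            -- name of the source file
    _metadata.file_path              AS FILEPATH,            -- full path in storage
    _metadata.file_block_start       AS FILEBLOCKSTART,      -- start offset of the file split
    _metadata.file_block_length      AS FILEBLOCKEND,        -- length of the file split
    _metadata.file_size              AS FILESIZE,            -- size in bytes
    _metadata.file_modification_time AS FILE_LAST_MODIFIED   -- last modified timestamp
  FROM '/Volumes/my_catalog/my_schema/landing'
)
FILEFORMAT = CSV;
```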
CopyInto Deployment
CopyInto Initial Deployment
When deployed for the first time into an environment, a CopyInto node of materialization type Table executes the following stage:
Deployment Behavior | Stages Executed |
---|---|
Initial Deployment | Create Table |
CopyInto Redeployment
Altering the CopyInto Tables
Column or table changes such as renaming the table, dropping an existing column, altering a column's data type, or adding a new column, whether made in isolation or all together, will result in an ALTER statement to modify the target table in the target environment.
The following stages are executed:
- Rename Table | Alter Column | Delete Column | Add Column | Edit table description: an ALTER TABLE statement is executed to perform the operation.
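As a hedged illustration, the redeployment stages correspond to ALTER statements along these lines. Names are placeholders, and some operations (such as dropping or renaming columns) may require Delta features like column mapping on the target table:

```sql
ALTER TABLE my_schema.target_table RENAME TO my_schema.target_table_v2;       -- rename table
ALTER TABLE my_schema.target_table ADD COLUMN new_col STRING;                 -- add column
ALTER TABLE my_schema.target_table DROP COLUMN old_col;                       -- drop column
ALTER TABLE my_schema.target_table ALTER COLUMN amount COMMENT 'net amount';  -- edit description
```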
Redeployment with no changes
If the nodes are redeployed with no changes compared to the previous deployment, no stages are executed.
CopyInto Undeployment
If the CopyInto node is deleted from a Workspace, that Workspace is committed to Git, and that commit is deployed to a higher-level environment, then the target table in the target environment will be dropped.
- Drop table: the target table in Databricks is dropped.
Code
CopyInto
Versions
Available versions of the package.
Version # | Release Date | Notes |
---|---|---|
1.0.0 | September 11, 2025 | |