Warehouse Importer
You can generate and manage your API token directly from the application in the API Settings menu.
Warehouses share common structures. We have defined a format so you can load your metadata into Catalog. Fill in the 7 files below and push them to our endpoint using the Catalog Uploader.
All 7 files are mandatory, and the data must be consistent across them. For example, if the column file contains a column whose table is missing from the table file, the load into Catalog fails.
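A consistency check of this kind can be run locally before uploading. The sketch below is only an illustration, not Catalog's own validation: it assumes each CSV has a header row using the field names documented in this page, the helper name `missing_parents` is hypothetical, and the sample rows are made up and abbreviated to the columns the check needs.

```python
import csv
import io


def missing_parents(child_rows, parent_rows, fk_field, pk_field="id"):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_ids = {row[pk_field] for row in parent_rows}
    return [row for row in child_rows if row[fk_field] not in parent_ids]


# Tiny in-memory stand-ins for the table and column files (made-up values,
# optional columns left out for brevity).
TABLE_CSV = "id,schema_id,table_name,type\nt1,s1,orders,TABLE\n"
COLUMN_CSV = (
    "id,table_id,column_name,data_type\n"
    "c1,t1,order_id,INTEGER\n"
    "c2,t9,amount,FLOAT\n"  # t9 does not exist in the table file
)

tables = list(csv.DictReader(io.StringIO(TABLE_CSV)))
columns = list(csv.DictReader(io.StringIO(COLUMN_CSV)))

orphans = missing_parents(columns, tables, fk_field="table_id")
for row in orphans:
    print(f"column {row['id']} references unknown table {row['table_id']}")
```

The same helper can be reused for the other foreign keys, for example `schema_id` against the schema file or `database_id` against the database file.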
Always prefix file names with a Unix timestamp.
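As a rough sketch, the file names for one upload could be generated as follows. The underscore separator and the idea of sharing a single timestamp across all 7 files are assumptions made for this example; the text above only states that a Unix timestamp prefix is required. `timestamped_names` is a hypothetical helper.

```python
import time

# The 7 required files described in this page.
FILE_KINDS = ["database", "schema", "table", "column", "query", "view_ddl", "user"]


def timestamped_names(timestamp=None, separator="_"):
    """Build one file name per required file, all sharing one Unix timestamp.

    The separator between timestamp and name is an assumption, not a
    documented requirement.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return [f"{ts}{separator}{kind}.csv" for kind in FILE_KINDS]


print(timestamped_names(1700000000))
```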
CSV Formatting
If you build these files in Excel or Google Sheets, save as CSV (Comma delimited), not CSV UTF-8. UTF-8 exports can include a byte-order mark that complicates ingestion.
Here's an example of a very simple CSV file:
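One way to produce such a file from a script, rather than from a spreadsheet, is sketched below. It writes a two-row database file with Python's `csv` module and checks that no byte-order mark was emitted. The header row and the sample values are assumptions for illustration.

```python
import csv
import os
import tempfile

# Made-up sample rows using the Database fields (id, database_name).
rows = [
    {"id": "db1", "database_name": "analytics"},
    {"id": "db2", "database_name": "raw"},
]

path = os.path.join(tempfile.mkdtemp(), "database.csv")

# encoding="utf-8" writes no byte-order mark ("utf-8-sig" would add one);
# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-8", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["id", "database_name"])
    writer.writeheader()
    writer.writerows(rows)

with open(path, "rb") as handle:
    raw = handle.read()

has_bom = raw.startswith(b"\xef\xbb\xbf")
print(raw.decode("utf-8"))
print("BOM present:", has_bom)
```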
List Field Values
Some fields such as tags are typed as list[string]. In that case, several formats are accepted:
- lists "['a', 'b']"
- tuples "('a', 'b')"
- sets "{'a', 'b', 'c'}"
Empty list allowed: []
Singleton allowed: 'a'
Multiple types allowed: "['foo', 100, 19.8]"
Fields containing commas must be quoted. See Quoting below.
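The accepted formats above happen to be valid Python literals, so a local script can read or produce them with the standard library. The sketch below is not Catalog's parser; `parse_list_field` and `format_list_field` are hypothetical helpers that show how each accepted form could be interpreted and how a list could be serialized in the first form.

```python
import ast


def parse_list_field(raw):
    """Interpret a list[string]-style cell as a Python list."""
    text = raw.strip()
    if not text:
        return []
    value = ast.literal_eval(text)  # safe evaluation of literals only
    if isinstance(value, (set, frozenset)):
        return sorted(value, key=str)  # sets are unordered; sort for stable output
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]  # singleton such as 'a'


def format_list_field(values):
    """Serialize values in the list form, e.g. ['a', 'b']."""
    return repr(list(values))


samples = ["['a', 'b']", "('a', 'b')", "{'a', 'b', 'c'}", "[]", "'a'", "['foo', 100, 19.8]"]
for cell in samples:
    print(cell, "->", parse_list_field(cell))
```

Because the serialized value contains commas, it still has to be quoted when written into the CSV; Python's `csv` writer does this automatically for fields that contain the delimiter.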
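Forbidden Characters
These characters are reserved as separators, so they must not appear inside unquoted field values:
- Column separator is the comma (,)
- Row separator is the carriage return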
Quoting
Most string fields such as table names and column names should not contain commas or carriage returns. Generally the problem comes with large text fields, such as SQL queries or descriptions.
If you have any doubts, you can quote all your text fields:
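A minimal sketch of quoting every field with Python's `csv` module is shown below, using the View DDL fields and a made-up view definition that contains both commas and line breaks. It assumes Catalog accepts standard double-quote CSV quoting; the text above does not spell out the quote character.

```python
import csv
import io

# Made-up row: the view definition contains commas and line breaks.
rows = [
    ["analytics", "public", "active_users", "SELECT id, email\nFROM users\nWHERE active"],
]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)  # wrap every field in double quotes
writer.writerow(["database_name", "schema_name", "view_name", "view_definition"])
writer.writerows(rows)
quoted = buffer.getvalue()
print(quoted)

# Reading the text back shows the commas and line breaks survived intact.
parsed = list(csv.reader(io.StringIO(quoted, newline="")))
```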
Files
Primary key: must be unique
Foreign key: must reference an existing entry
Optional: empty string in the CSV
1. Database
database.csv
Database Fields
id string (primary key)
database_name string
2. Schema
schema.csv
Schema Fields
id string (primary key)
database_id string → database.id (foreign key)
schema_name string
description string (optional)
tags list[string] (optional)
3. Table
table.csv
Table Fields
id string (primary key)
schema_id string → schema.id (foreign key)
table_name string
description string (optional)
tags list[string] (optional)
type enum { TABLE | VIEW | EXTERNAL | TOPIC }
owner_external_id string → user.id (optional)
4. Column
column.csv
Column Fields
id string (primary key)
table_id string → table.id (foreign key)
column_name string
description string (optional)
data_type enum { BOOLEAN | INTEGER | FLOAT | STRING | ... | CUSTOM }
ordinal_position positive integer (optional)
5. Query
query.csv
Upload the Query file even when it has no rows. The file itself is required.
We only ingest queries that ran the day before metadata ingestion. Include only those queries in the file; others are ignored.
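Filtering the query log before export could look like the sketch below. It assumes "the day before" means the previous calendar day, judged by `start_time`, and it ignores time zones; none of these details are specified above. `ran_on_previous_day` is a hypothetical helper and the sample queries are made up.

```python
from datetime import date, datetime, timedelta


def ran_on_previous_day(start_time, ingestion_day):
    """True when the query started on the calendar day before ingestion_day."""
    return start_time.date() == ingestion_day - timedelta(days=1)


queries = [
    {"query_id": "q1", "start_time": datetime(2024, 3, 9, 23, 59)},
    {"query_id": "q2", "start_time": datetime(2024, 3, 10, 8, 30)},  # same day as ingestion
    {"query_id": "q3", "start_time": datetime(2024, 3, 8, 12, 0)},   # two days before
]

ingestion_day = date(2024, 3, 10)
kept = [q for q in queries if ran_on_previous_day(q["start_time"], ingestion_day)]
print([q["query_id"] for q in kept])
```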
Query Fields
query_id string → query.id
database_id string → database.id (foreign key)
database_name string → database.name
schema_name string → schema.name
query_text string
user_id string → user.id (foreign key)
user_name string → user name
start_time timestamp
end_time timestamp (optional)
6. View DDL
view_ddl.csv
Upload the View DDL file even when it has no rows. The file itself is required.
View DDL Fields
database_name string
schema_name string
view_name string
view_definition string
7. User
user.csv
Upload the User file even when it has no rows. The file itself is required.
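For the Query, View DDL, and User files, a file with no rows can be produced as sketched below. Keeping the header row in an otherwise empty file is an assumption of this example; the text only says the file itself must be present. `render_csv` is a hypothetical helper, and the field names match the User Fields section.

```python
import csv
import io

USER_FIELDS = ["id", "email", "first_name", "last_name"]


def render_csv(fieldnames, rows=()):
    """Return CSV text with a header row followed by zero or more data rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


empty_user_csv = render_csv(USER_FIELDS)
print(repr(empty_user_csv))
```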
User Fields
id string (primary key)
email string (optional)
first_name string (optional)
last_name string (optional)
Lineage
We compute lineage for your integration by analyzing and parsing the Queries and View DDL when possible.
Alternatively, you can provide the following lineage mappings for Tables and/or Columns, and we will ingest them during each update.
1. Table Lineage
external_table_lineage.csv
Table Lineage Fields
parent_path string (foreign key): path of the parent table
child_path string (foreign key): path of the child table
2. Column Lineage
external_column_lineage.csv
Column Lineage Fields
parent_path string (foreign key): path of the parent column
child_path string (foreign key): path of the child column