Castor Extractor Reference
Use this page for per-command flags and environment variables. For installation, workflow, and troubleshooting, see Castor Extractor.
Global Variables
These variables apply across all commands:
| Variable | Purpose |
|---|---|
CASTOR_OUTPUT_DIRECTORY | Default output directory for all extractors. |
GOOGLE_APPLICATION_CREDENTIALS | Default GCP credentials file for BigQuery and Looker Studio. |
Zone Selection
- Use
USif your instance is onapp.us.castordoc.com. - Use
EUif your instance is onapp.castordoc.com.
Upload and Validate
castor-file-check
Validate generic warehouse CSV files before upload.
-d, --directory: directory containing generic warehouse CSV files--verbose: show detailed validation logs
castor-upload
Push extracted files to Catalog-managed GCS.
-k, --token: API token from Catalog-s, --source_id: source ID from Catalog-t, --file_type: file type (WAREHOUSE,VIZ,DBT,QUALITY)WAREHOUSEextractorsVIZVisualization extractors- Knowledge bases (Confluence and Notion) use
VIZ QUALITY- Used for external data quality tools along with generic CSV files.
-z, --zone: upload zone (USorEU, defaultEU)-f, --file_path: upload one file-d, --directory_path: upload all files in a directory
You can only use --file_path or --directory_path, not both.
Use --help to get the most up-to-date flags. For example, castor-extract-sqlserver --help.
Warehouse Extractors
These use upload file type WAREHOUSE
castor-extract-bigquery
| Flag | Description |
|---|---|
-c, --credentials | Path to Google credentials file. |
-o, --output | Output directory. |
--skip-existing | Keep previously extracted files. |
--db-allowed <list> | Allowed GCP projects. |
--db-blocked <list> | Blocked GCP projects. |
-s, --safe-mode | Safe mode. |
These environment variables are supported:
GOOGLE_APPLICATION_CREDENTIALS
castor-extract-databricks
| Flag | Description |
|---|---|
-H, --host | Databricks host. |
-t, --token | Access token. |
-p, --http-path | HTTP path. |
-o, --output | Output directory. |
--catalog-allowed <list> | Allowed catalogs. |
--catalog-blocked <list> | Blocked catalogs. |
--skip-existing | Keep previously extracted files. |
These environment variables are supported:
CASTOR_DATABRICKS_HOSTCASTOR_DATABRICKS_HTTP_PATHCASTOR_DATABRICKS_TOKEN
castor-extract-glue-athena
Requires pip install castor-extractor[glue-athena].
| Flag | Description |
|---|---|
--access-key-id | AWS access key ID. |
--access-key-secret | AWS access key secret. |
--aws-region | AWS region. |
--aws-account-id | AWS account ID. |
--schema-allowed <list> | Glue schemas to include. |
--schema-blocked <list> | Glue schemas to exclude. |
-s, --skip-queries | Skip SQL query and view DDL extraction. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_GLUE_ACCESS_KEY_IDCASTOR_GLUE_ACCESS_KEY_SECRETCASTOR_GLUE_AWS_REGIONCASTOR_GLUE_AWS_ACCOUNT_ID
castor-extract-mysql
| Flag | Description |
|---|---|
-H, --host | MySQL host. |
-P, --port | MySQL port. |
-u, --user | MySQL user. |
-p, --password | MySQL password. |
-o, --output | Output directory. |
--skip-existing | Keep previously extracted files. |
These environment variables are supported:
CASTOR_MYSQL_USERCASTOR_MYSQL_PASSWORDCASTOR_MYSQL_HOSTCASTOR_MYSQL_PORT(optional)
castor-extract-postgres
| Flag | Description |
|---|---|
-H, --host | Postgres host. |
-P, --port | Postgres port. |
-d, --database | Postgres database. |
-u, --user | Postgres user. |
-p, --password | Postgres password. |
-o, --output | Output directory. |
--skip-existing | Keep previously extracted files. |
These environment variables are supported:
CASTOR_POSTGRES_USERCASTOR_POSTGRES_PASSWORDCASTOR_POSTGRES_HOSTCASTOR_POSTGRES_PORTCASTOR_POSTGRES_DATABASE
castor-extract-redshift
| Flag | Description |
|---|---|
-H, --host | Redshift host. |
-P, --port | Redshift port. |
-d, --database | Redshift database. |
-u, --user | Redshift user. |
-p, --password | Redshift password. |
-o, --output | Output directory. |
--skip-existing | Keep previously extracted files. |
--serverless | Extract from Redshift Serverless. |
These environment variables are supported:
CASTOR_REDSHIFT_USERCASTOR_REDSHIFT_PASSWORDCASTOR_REDSHIFT_HOSTCASTOR_REDSHIFT_PORTCASTOR_REDSHIFT_DATABASECASTOR_REDSHIFT_SERVERLESS(optional;true/false)
castor-extract-snowflake
| Flag | Description |
|---|---|
-a, --account | Snowflake account. |
-u, --user | Snowflake user. |
-p, --password | Password. Mutually exclusive with --private-key. |
-pk, --private-key | Private key. Mutually exclusive with --password. |
--warehouse | Warehouse override. |
--role | Role override. |
--db-allowed <list> | Allowed databases. |
--db-blocked <list> | Blocked databases. |
--query-blocked <list> | Blocked query patterns. Supports % and _ wildcards. |
--fetch-transient | Include transient tables. |
--insecure-mode | Disable OCSP checking. |
-o, --output | Output directory. |
--skip-existing | Keep previously extracted files. |
These environment variables are supported:
CASTOR_SNOWFLAKE_ACCOUNTCASTOR_SNOWFLAKE_USERCASTOR_SNOWFLAKE_PASSWORD(optional if using private key)CASTOR_SNOWFLAKE_PRIVATE_KEY(optional if using password)CASTOR_SNOWFLAKE_INSECURE_MODE(optional)
castor-extract-sqlserver
| Flag | Description |
|---|---|
-H, --host | MSSQL host. |
-P, --port | MSSQL port. |
-u, --user | MSSQL user. |
-p, --password | MSSQL password. |
-s, --skip-queries | Skip SQL query extraction. |
--db-allowed <list> | Allowed databases. |
--db-blocked <list> | Blocked databases. |
--default-db | Fallback database for login issues. |
-o, --output | Output directory. |
--skip-existing | Keep previously extracted files. |
These environment variables are supported:
CASTOR_MSSQL_USERCASTOR_MSSQL_PASSWORDCASTOR_MSSQL_HOSTCASTOR_MSSQL_PORTCASTOR_MSSQL_DEFAULT_DB(optional)
Visualization Extractors
These use file type VIZ.
castor-extract-count
| Flag | Description |
|---|---|
-c, --credentials | GCP credentials as string. |
-d, --dataset_id | Data set ID storing Count data. |
-o, --output | Output directory. |
castor-extract-domo
| Flag | Description |
|---|---|
-b, --base-url | Domo host. |
-a, --api-token | API token. |
-d, --developer-token | Developer token. |
-c, --client-id | Client ID. |
-C, --cloud-id | External warehouse ID. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_DOMO_API_TOKENCASTOR_DOMO_BASE_URLCASTOR_DOMO_CLIENT_IDCASTOR_DOMO_DEVELOPER_TOKENCASTOR_DOMO_CLOUD_IDCLOUD_ID
castor-extract-looker
| Flag | Description |
|---|---|
-b, --base-url | Looker base URL. |
-c, --client-id | Client ID. |
-s, --client-secret | Client secret. |
-t, --timeout | Timeout in seconds. |
--thread-pool-size | Thread pool size. |
-S, --safe-mode | Safe mode. |
--log-to-stdout | Log to stdout. |
--search-per-folder | Fetch looks and dashboards per folder. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_LOOKER_BASE_URLCASTOR_LOOKER_CLIENT_IDCASTOR_LOOKER_CLIENT_SECRETCASTOR_LOOKER_TIMEOUT_SECOND(optional override)CASTOR_LOOKER_PAGE_SIZE(optional override)CASTOR_LOOKER_THREAD_POOL_SIZE(optional override)CASTOR_LOOKER_IS_SAFE_MODE(optional;true/false)CASTOR_LOOKER_LOG_TO_STDOUT(optional;true/false)CASTOR_LOOKER_SEARCH_PER_FOLDER(optional;true/false)
castor-extract-looker-studio
| Flag | Description |
|---|---|
-o, --output | Output directory. |
--source-queries-only | Only extract BigQuery source queries. |
--skip-view-activity-logs | Skip activity log extraction. |
-c, --credentials | Service account credentials file. |
-a, --admin-email | Google Workspace admin email. |
--users-file-path | Path to JSON array of user emails. |
-b, --bigquery-credentials | BigQuery service account credentials file. |
--db-allowed <list> | Allowed GCP projects for source queries. |
--db-blocked <list> | Blocked GCP projects for source queries. |
These environment variables are supported:
GOOGLE_APPLICATION_CREDENTIALS: path to the Looker Studio service account JSON file (when using-c/--credentials)CASTOR_LOOKER_STUDIO_ADMIN_EMAIL: Google Workspace admin email (when using-a/--admin-email)CASTOR_OUTPUT_DIRECTORY: default output directory
castor-extract-metabase-api
| Flag | Description |
|---|---|
-b, --base-url | Metabase base URL. |
-u, --user | Username. |
-p, --password | Password. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_METABASE_API_BASE_URLCASTOR_METABASE_API_USERNAMECASTOR_METABASE_API_USERCASTOR_METABASE_API_PASSWORD
castor-extract-metabase-db
| Flag | Description |
|---|---|
-H, --host | Host. |
-P, --port | Port. |
-d, --database | Database. |
-s, --schema | Schema. |
-u, --user | Username. |
-p, --password | Password. |
-k, --encryption_secret_key | Encryption key. |
--require_ssl | Require SSL. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_METABASE_DB_HOSTCASTOR_METABASE_DB_PORTCASTOR_METABASE_DB_DATABASECASTOR_METABASE_DB_SCHEMACASTOR_METABASE_DB_USERNAMECASTOR_METABASE_DB_PASSWORDCASTOR_METABASE_DB_ENCRYPTION_SECRET_KEY(optional)CASTOR_METABASE_DB_REQUIRE_SSL_KEY(optional)
castor-extract-mode
| Flag | Description |
|---|---|
-H, --host | Mode host. |
-w, --workspace | Workspace. |
-t, --token | API token. |
-s, --secret | API token password. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_MODE_ANALYTICS_HOSTCASTOR_MODE_ANALYTICS_SECRETCASTOR_MODE_ANALYTICS_TOKENCASTOR_MODE_ANALYTICS_WORKSPACE
castor-extract-omni
Requires pip install castor-extractor[omni].
| Flag | Description |
|---|---|
-u, --base-url | Omni instance root URL (the hostname you use to sign in). |
-t, --token | Omni organization API key (Bearer token). |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_OMNI_BASE_URLCASTOR_OMNI_TOKEN
castor-extract-powerbi
| Flag | Description |
|---|---|
-t, --tenant_id | Tenant ID. |
-c, --client_id | Client ID. |
-s, --secret | Client secret. Mutually exclusive with --certificate. |
-cert, --certificate | Certificate file. Mutually exclusive with --secret. |
-sc, --scopes <list> | API scopes. Optional. |
-l, --login_url | Login URL. Optional. |
-a, --api_base | Power BI REST API base. Optional. |
-g, --graph_api_base | Microsoft Graph API base. Optional. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_POWERBI_CLIENT_IDCASTOR_POWERBI_TENANT_IDCASTOR_POWERBI_SECRET(optional if using certificate)CASTOR_POWERBI_CERTIFICATE(optional if using secret)CASTOR_POWERBI_API_BASE(optional)CASTOR_POWERBI_GRAPH_API_BASE(optional)CASTOR_POWERBI_LOGIN_URL(optional)CASTOR_POWERBI_SCOPES(optional)
castor-extract-qlik
| Flag | Description |
|---|---|
-b, --base-url | Qlik base URL. |
-a, --api-key | API key. |
-e, --except-http-error-statuses <list> | HTTP status codes to ignore as warnings. |
-s, --include-sheets | Include sheets extraction. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_QLIK_API_KEYCASTOR_QLIK_BASE_URL
castor-extract-salesforce
| Flag | Description |
|---|---|
-u, --username | Salesforce username. |
-p, --password | Password. |
-c, --client-id | Client ID. |
-s, --client-secret | Client secret. |
-t, --security-token | Security token. |
-b, --base-url | Instance URL. |
-o, --output | Output directory. |
--skip-existing | Keep previously extracted files. |
These environment variables are supported:
CASTOR_SALESFORCE_BASE_URLCASTOR_SALESFORCE_CLIENT_IDCASTOR_SALESFORCE_CLIENT_SECRETCASTOR_SALESFORCE_PASSWORDCASTOR_SALESFORCE_SECURITY_TOKENCASTOR_SALESFORCE_USERNAME
castor-extract-salesforce-viz
| Flag | Description |
|---|---|
-u, --username | Salesforce username. |
-p, --password | Password. |
-c, --client-id | Client ID. |
-s, --client-secret | Client secret. |
-t, --security-token | Security token. |
-b, --base-url | Instance URL. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_SALESFORCE_BASE_URLCASTOR_SALESFORCE_CLIENT_IDCASTOR_SALESFORCE_CLIENT_SECRETCASTOR_SALESFORCE_PASSWORDCASTOR_SALESFORCE_SECURITY_TOKENCASTOR_SALESFORCE_USERNAME
castor-extract-sigma
| Flag | Description |
|---|---|
-H, --host | Sigma host. |
-c, --client-id | Client ID. |
-a, --api-token | API key. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_SIGMA_API_TOKENCASTOR_SIGMA_CLIENT_IDCASTOR_SIGMA_HOSTCASTOR_SIGMA_GRANT_TYPE(optional)
castor-extract-strategy
| Flag | Description |
|---|---|
-u, --username | Username. |
-p, --password | Password. |
-b, --base-url | Strategy URL. |
-i, --project-ids <list> | Project IDs. Optional. |
-o, --output | Output directory. |
These environment variables are supported:
CATALOG_STRATEGY_BASE_URLCATALOG_STRATEGY_PASSWORDCATALOG_STRATEGY_USERNAMECATALOG_STRATEGY_PROJECT_IDS(optional; comma-separated supported)
castor-extract-tableau
| Flag | Description |
|---|---|
-u, --user | Tableau user. |
-n, --token-name | Token name. |
-p, --password | Password. |
-t, --token | Token. |
-b, --server-url | Server URL. |
-i, --site-id | Site ID. |
--skip-columns | Skip column extraction. |
--skip-fields | Skip field extraction. |
--with-pulse | Extract Pulse assets. |
--page-size | Custom pagination size. |
--ignore-ssl | Disable SSL verification. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_TABLEAU_SERVER_URLCASTOR_TABLEAU_SITE_IDCASTOR_TABLEAU_USER(required for username/password auth)CASTOR_TABLEAU_PASSWORD(required for username/password auth)CASTOR_TABLEAU_TOKEN_NAME(required for PAT auth)CASTOR_TABLEAU_TOKEN(required for PAT auth)
castor-extract-thoughtspot
| Flag | Description |
|---|---|
-b, --base_url | Base URL. |
-u, --username | Username. |
-p, --password | Password. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_THOUGHTSPOT_BASE_URLCASTOR_THOUGHTSPOT_USERNAMECASTOR_THOUGHTSPOT_PASSWORD
castor-extract-confluence
| Flag | Description |
|---|---|
-a, --account_id | Confluence account ID. |
-b, --base_url | Confluence base URL. |
-t, --token | API token. |
-u, --username | Username. |
--include-archived-spaces | Include archived spaces. |
--include-personal-spaces | Include personal spaces. |
--space-ids-allowed <list> | Only include these space IDs. |
--space-ids-blocked <list> | Exclude these space IDs. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_CONFLUENCE_ACCOUNT_IDCASTOR_CONFLUENCE_BASE_URLCASTOR_CONFLUENCE_TOKENCASTOR_CONFLUENCE_USERNAME
castor-extract-notion
| Flag | Description |
|---|---|
-t, --token | Notion token. |
-o, --output | Output directory. |
These environment variables are supported:
CASTOR_NOTION_TOKEN