Castor Extractor Reference

Use this page for per-command flags and environment variables. For installation, workflow, and troubleshooting, see Castor Extractor.

Global Variables

These variables apply across all commands:

| Variable | Purpose |
| --- | --- |
| CASTOR_OUTPUT_DIRECTORY | Default output directory for all extractors. |
| GOOGLE_APPLICATION_CREDENTIALS | Default GCP credentials file for BigQuery and Looker Studio. |
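
When an extractor is launched from a script, the global variables can be set on the child process environment rather than exported globally. The following is a minimal sketch, not part of the extractor itself: the helper name `extractor_env` and the example paths are hypothetical, and only the two variable names come from the table above.

```python
import os
from typing import Dict, Mapping, Optional


def extractor_env(
    output_directory: str,
    gcp_credentials: Optional[str] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the global Castor variables set.

    `base` defaults to the current process environment; it is copied,
    never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["CASTOR_OUTPUT_DIRECTORY"] = output_directory
    # Only relevant for BigQuery and Looker Studio, so it stays optional here.
    if gcp_credentials is not None:
        env["GOOGLE_APPLICATION_CREDENTIALS"] = gcp_credentials
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when invoking an extractor command.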

Zone Selection

  • Use US if your instance is on app.us.castordoc.com.
  • Use EU if your instance is on app.castordoc.com.
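
The zone rule above can be encoded as a small lookup so that scripts do not hard-code the zone. This is an illustrative sketch: the function name `zone_for_host` is hypothetical, and it only knows the two hostnames listed above.

```python
def zone_for_host(host: str) -> str:
    """Map a Catalog app hostname to the upload zone ("US" or "EU")."""
    zones = {
        "app.us.castordoc.com": "US",
        "app.castordoc.com": "EU",
    }
    normalized = host.strip().lower()
    try:
        return zones[normalized]
    except KeyError:
        # Fail loudly instead of guessing a zone for an unknown host.
        raise ValueError(f"Unrecognized Catalog host: {host!r}") from None
```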

Upload and Validate

castor-file-check

Validate generic warehouse CSV files before upload.

  • -d, --directory: directory containing generic warehouse CSV files
  • --verbose: show detailed validation logs

castor-upload

Push extracted files to Catalog-managed GCS.

  • -k, --token: API token from Catalog
  • -s, --source_id: source ID from Catalog
  • -t, --file_type: file type (WAREHOUSE, VIZ, DBT, or QUALITY)
    • WAREHOUSE: warehouse extractors
    • VIZ: visualization extractors; knowledge bases (Confluence and Notion) also use VIZ
    • QUALITY: external data quality tools, along with generic CSV files
  • -z, --zone: upload zone (US or EU, default EU)
  • -f, --file_path: upload one file
  • -d, --directory_path: upload all files in a directory

Use either --file_path or --directory_path, not both.
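
The constraints on castor-upload flags can be checked before the command is run. The sketch below builds the argument list from the flags documented above. The helper name `build_upload_command` is hypothetical, and requiring at least one of the two path flags is an assumption; the text above only states that they cannot be combined.

```python
from typing import List, Optional

FILE_TYPES = {"WAREHOUSE", "VIZ", "DBT", "QUALITY"}
ZONES = {"US", "EU"}


def build_upload_command(
    token: str,
    source_id: str,
    file_type: str,
    zone: str = "EU",  # EU is the documented default
    file_path: Optional[str] = None,
    directory_path: Optional[str] = None,
) -> List[str]:
    """Build the castor-upload argument list without running it."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"Unknown file type: {file_type!r}")
    if zone not in ZONES:
        raise ValueError(f"Unknown zone: {zone!r}")
    # Assumption: exactly one path flag is needed. The reference only says
    # the two flags cannot be used together.
    if (file_path is None) == (directory_path is None):
        raise ValueError("Pass exactly one of file_path or directory_path")

    command = [
        "castor-upload",
        "--token", token,
        "--source_id", source_id,
        "--file_type", file_type,
        "--zone", zone,
    ]
    if file_path is not None:
        command += ["--file_path", file_path]
    else:
        command += ["--directory_path", directory_path]
    return command
```

On a machine where the extractor package is installed, the resulting list can be passed to `subprocess.run(command, check=True)`.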

CLI help

Use --help to get the most up-to-date flags. For example, castor-extract-sqlserver --help.

Warehouse Extractors

These use file type WAREHOUSE.

castor-extract-bigquery

| Flag | Description |
| --- | --- |
| -c, --credentials | Path to Google credentials file. |
| -o, --output | Output directory. |
| --skip-existing | Keep previously extracted files. |
| --db-allowed <list> | Allowed GCP projects. |
| --db-blocked <list> | Blocked GCP projects. |
| -s, --safe-mode | Safe mode. |

These environment variables are supported:

  • GOOGLE_APPLICATION_CREDENTIALS

castor-extract-databricks

| Flag | Description |
| --- | --- |
| -H, --host | Databricks host. |
| -t, --token | Access token. |
| -p, --http-path | HTTP path. |
| -o, --output | Output directory. |
| --catalog-allowed <list> | Allowed catalogs. |
| --catalog-blocked <list> | Blocked catalogs. |
| --skip-existing | Keep previously extracted files. |

These environment variables are supported:

  • CASTOR_DATABRICKS_HOST
  • CASTOR_DATABRICKS_HTTP_PATH
  • CASTOR_DATABRICKS_TOKEN

castor-extract-glue-athena

Requires pip install castor-extractor[glue-athena].

| Flag | Description |
| --- | --- |
| --access-key-id | AWS access key ID. |
| --access-key-secret | AWS access key secret. |
| --aws-region | AWS region. |
| --aws-account-id | AWS account ID. |
| --schema-allowed <list> | Glue schemas to include. |
| --schema-blocked <list> | Glue schemas to exclude. |
| -s, --skip-queries | Skip SQL query and view DDL extraction. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_GLUE_ACCESS_KEY_ID
  • CASTOR_GLUE_ACCESS_KEY_SECRET
  • CASTOR_GLUE_AWS_REGION
  • CASTOR_GLUE_AWS_ACCOUNT_ID

castor-extract-mysql

| Flag | Description |
| --- | --- |
| -H, --host | MySQL host. |
| -P, --port | MySQL port. |
| -u, --user | MySQL user. |
| -p, --password | MySQL password. |
| -o, --output | Output directory. |
| --skip-existing | Keep previously extracted files. |

These environment variables are supported:

  • CASTOR_MYSQL_USER
  • CASTOR_MYSQL_PASSWORD
  • CASTOR_MYSQL_HOST
  • CASTOR_MYSQL_PORT (optional)

castor-extract-postgres

| Flag | Description |
| --- | --- |
| -H, --host | Postgres host. |
| -P, --port | Postgres port. |
| -d, --database | Postgres database. |
| -u, --user | Postgres user. |
| -p, --password | Postgres password. |
| -o, --output | Output directory. |
| --skip-existing | Keep previously extracted files. |

These environment variables are supported:

  • CASTOR_POSTGRES_USER
  • CASTOR_POSTGRES_PASSWORD
  • CASTOR_POSTGRES_HOST
  • CASTOR_POSTGRES_PORT
  • CASTOR_POSTGRES_DATABASE
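
When connection settings come entirely from environment variables, a preflight check can report what is missing before the extractor starts. The sketch below is a hypothetical helper, not part of castor-extractor, and it assumes all five variables listed above are required; the reference does not mark any of them as optional.

```python
import os
from typing import Iterable, List, Mapping

# Variable names copied from the list above.
POSTGRES_ENV_VARS = (
    "CASTOR_POSTGRES_USER",
    "CASTOR_POSTGRES_PASSWORD",
    "CASTOR_POSTGRES_HOST",
    "CASTOR_POSTGRES_PORT",
    "CASTOR_POSTGRES_DATABASE",
)


def missing_env_vars(
    names: Iterable[str],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or empty in `environ`, in order."""
    return [name for name in names if not environ.get(name)]
```

The same helper works for any other extractor by passing that extractor's variable names.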

castor-extract-redshift

| Flag | Description |
| --- | --- |
| -H, --host | Redshift host. |
| -P, --port | Redshift port. |
| -d, --database | Redshift database. |
| -u, --user | Redshift user. |
| -p, --password | Redshift password. |
| -o, --output | Output directory. |
| --skip-existing | Keep previously extracted files. |
| --serverless | Extract from Redshift Serverless. |

These environment variables are supported:

  • CASTOR_REDSHIFT_USER
  • CASTOR_REDSHIFT_PASSWORD
  • CASTOR_REDSHIFT_HOST
  • CASTOR_REDSHIFT_PORT
  • CASTOR_REDSHIFT_DATABASE
  • CASTOR_REDSHIFT_SERVERLESS (optional; true/false)

castor-extract-snowflake

| Flag | Description |
| --- | --- |
| -a, --account | Snowflake account. |
| -u, --user | Snowflake user. |
| -p, --password | Password. Mutually exclusive with --private-key. |
| -pk, --private-key | Private key. Mutually exclusive with --password. |
| --warehouse | Warehouse override. |
| --role | Role override. |
| --db-allowed <list> | Allowed databases. |
| --db-blocked <list> | Blocked databases. |
| --query-blocked <list> | Blocked query patterns. Supports % and _ wildcards. |
| --fetch-transient | Include transient tables. |
| --insecure-mode | Disable OCSP checking. |
| -o, --output | Output directory. |
| --skip-existing | Keep previously extracted files. |

These environment variables are supported:

  • CASTOR_SNOWFLAKE_ACCOUNT
  • CASTOR_SNOWFLAKE_USER
  • CASTOR_SNOWFLAKE_PASSWORD (optional if using private key)
  • CASTOR_SNOWFLAKE_PRIVATE_KEY (optional if using password)
  • CASTOR_SNOWFLAKE_INSECURE_MODE (optional)
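
Because --password and --private-key are mutually exclusive, a wrapper script can enforce that choice before invoking the extractor. This is a minimal sketch using only flags from the table above. The helper name `build_snowflake_command` is hypothetical, and the private key value is passed through unchanged, since this page does not specify its expected form.

```python
from typing import List, Optional


def build_snowflake_command(
    account: str,
    user: str,
    password: Optional[str] = None,
    private_key: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build the castor-extract-snowflake argument list without running it."""
    # The two authentication flags are documented as mutually exclusive.
    if (password is None) == (private_key is None):
        raise ValueError("Pass exactly one of password or private_key")

    command = ["castor-extract-snowflake", "--account", account, "--user", user]
    if password is not None:
        command += ["--password", password]
    else:
        command += ["--private-key", private_key]
    if output is not None:
        command += ["--output", output]
    return command
```

Secrets passed as command-line arguments can be visible in process listings, so the environment variables above may be preferable on shared hosts.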

castor-extract-sqlserver

| Flag | Description |
| --- | --- |
| -H, --host | MSSQL host. |
| -P, --port | MSSQL port. |
| -u, --user | MSSQL user. |
| -p, --password | MSSQL password. |
| -s, --skip-queries | Skip SQL query extraction. |
| --db-allowed <list> | Allowed databases. |
| --db-blocked <list> | Blocked databases. |
| --default-db | Fallback database for login issues. |
| -o, --output | Output directory. |
| --skip-existing | Keep previously extracted files. |

These environment variables are supported:

  • CASTOR_MSSQL_USER
  • CASTOR_MSSQL_PASSWORD
  • CASTOR_MSSQL_HOST
  • CASTOR_MSSQL_PORT
  • CASTOR_MSSQL_DEFAULT_DB (optional)

Visualization Extractors

These use file type VIZ.

castor-extract-count

| Flag | Description |
| --- | --- |
| -c, --credentials | GCP credentials as string. |
| -d, --dataset_id | Data set ID storing Count data. |
| -o, --output | Output directory. |

castor-extract-domo

| Flag | Description |
| --- | --- |
| -b, --base-url | Domo host. |
| -a, --api-token | API token. |
| -d, --developer-token | Developer token. |
| -c, --client-id | Client ID. |
| -C, --cloud-id | External warehouse ID. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_DOMO_API_TOKEN
  • CASTOR_DOMO_BASE_URL
  • CASTOR_DOMO_CLIENT_ID
  • CASTOR_DOMO_DEVELOPER_TOKEN
  • CASTOR_DOMO_CLOUD_ID
  • CLOUD_ID

castor-extract-looker

| Flag | Description |
| --- | --- |
| -b, --base-url | Looker base URL. |
| -c, --client-id | Client ID. |
| -s, --client-secret | Client secret. |
| -t, --timeout | Timeout in seconds. |
| --thread-pool-size | Thread pool size. |
| -S, --safe-mode | Safe mode. |
| --log-to-stdout | Log to stdout. |
| --search-per-folder | Fetch looks and dashboards per folder. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_LOOKER_BASE_URL
  • CASTOR_LOOKER_CLIENT_ID
  • CASTOR_LOOKER_CLIENT_SECRET
  • CASTOR_LOOKER_TIMEOUT_SECOND (optional override)
  • CASTOR_LOOKER_PAGE_SIZE (optional override)
  • CASTOR_LOOKER_THREAD_POOL_SIZE (optional override)
  • CASTOR_LOOKER_IS_SAFE_MODE (optional; true/false)
  • CASTOR_LOOKER_LOG_TO_STDOUT (optional; true/false)
  • CASTOR_LOOKER_SEARCH_PER_FOLDER (optional; true/false)

castor-extract-looker-studio

| Flag | Description |
| --- | --- |
| -o, --output | Output directory. |
| --source-queries-only | Only extract BigQuery source queries. |
| --skip-view-activity-logs | Skip activity log extraction. |
| -c, --credentials | Service account credentials file. |
| -a, --admin-email | Google Workspace admin email. |
| --users-file-path | Path to JSON array of user emails. |
| -b, --bigquery-credentials | BigQuery service account credentials file. |
| --db-allowed <list> | Allowed GCP projects for source queries. |
| --db-blocked <list> | Blocked GCP projects for source queries. |

These environment variables are supported:

  • GOOGLE_APPLICATION_CREDENTIALS: path to the Looker Studio service account JSON file (when using -c / --credentials)
  • CASTOR_LOOKER_STUDIO_ADMIN_EMAIL: Google Workspace admin email (when using -a / --admin-email)
  • CASTOR_OUTPUT_DIRECTORY: default output directory

castor-extract-metabase-api

| Flag | Description |
| --- | --- |
| -b, --base-url | Metabase base URL. |
| -u, --user | Username. |
| -p, --password | Password. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_METABASE_API_BASE_URL
  • CASTOR_METABASE_API_USERNAME
  • CASTOR_METABASE_API_USER
  • CASTOR_METABASE_API_PASSWORD

castor-extract-metabase-db

| Flag | Description |
| --- | --- |
| -H, --host | Host. |
| -P, --port | Port. |
| -d, --database | Database. |
| -s, --schema | Schema. |
| -u, --user | Username. |
| -p, --password | Password. |
| -k, --encryption_secret_key | Encryption key. |
| --require_ssl | Require SSL. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_METABASE_DB_HOST
  • CASTOR_METABASE_DB_PORT
  • CASTOR_METABASE_DB_DATABASE
  • CASTOR_METABASE_DB_SCHEMA
  • CASTOR_METABASE_DB_USERNAME
  • CASTOR_METABASE_DB_PASSWORD
  • CASTOR_METABASE_DB_ENCRYPTION_SECRET_KEY (optional)
  • CASTOR_METABASE_DB_REQUIRE_SSL_KEY (optional)

castor-extract-mode

| Flag | Description |
| --- | --- |
| -H, --host | Mode host. |
| -w, --workspace | Workspace. |
| -t, --token | API token. |
| -s, --secret | API token password. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_MODE_ANALYTICS_HOST
  • CASTOR_MODE_ANALYTICS_SECRET
  • CASTOR_MODE_ANALYTICS_TOKEN
  • CASTOR_MODE_ANALYTICS_WORKSPACE

castor-extract-omni

Requires pip install castor-extractor[omni].

| Flag | Description |
| --- | --- |
| -u, --base-url | Omni instance root URL (the hostname you use to sign in). |
| -t, --token | Omni organization API key (Bearer token). |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_OMNI_BASE_URL
  • CASTOR_OMNI_TOKEN

castor-extract-powerbi

| Flag | Description |
| --- | --- |
| -t, --tenant_id | Tenant ID. |
| -c, --client_id | Client ID. |
| -s, --secret | Client secret. Mutually exclusive with --certificate. |
| -cert, --certificate | Certificate file. Mutually exclusive with --secret. |
| -sc, --scopes <list> | API scopes. Optional. |
| -l, --login_url | Login URL. Optional. |
| -a, --api_base | Power BI REST API base. Optional. |
| -g, --graph_api_base | Microsoft Graph API base. Optional. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_POWERBI_CLIENT_ID
  • CASTOR_POWERBI_TENANT_ID
  • CASTOR_POWERBI_SECRET (optional if using certificate)
  • CASTOR_POWERBI_CERTIFICATE (optional if using secret)
  • CASTOR_POWERBI_API_BASE (optional)
  • CASTOR_POWERBI_GRAPH_API_BASE (optional)
  • CASTOR_POWERBI_LOGIN_URL (optional)
  • CASTOR_POWERBI_SCOPES (optional)

castor-extract-qlik

| Flag | Description |
| --- | --- |
| -b, --base-url | Qlik base URL. |
| -a, --api-key | API key. |
| -e, --except-http-error-statuses <list> | HTTP status codes to ignore as warnings. |
| -s, --include-sheets | Include sheets extraction. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_QLIK_API_KEY
  • CASTOR_QLIK_BASE_URL

castor-extract-salesforce

| Flag | Description |
| --- | --- |
| -u, --username | Salesforce username. |
| -p, --password | Password. |
| -c, --client-id | Client ID. |
| -s, --client-secret | Client secret. |
| -t, --security-token | Security token. |
| -b, --base-url | Instance URL. |
| -o, --output | Output directory. |
| --skip-existing | Keep previously extracted files. |

These environment variables are supported:

  • CASTOR_SALESFORCE_BASE_URL
  • CASTOR_SALESFORCE_CLIENT_ID
  • CASTOR_SALESFORCE_CLIENT_SECRET
  • CASTOR_SALESFORCE_PASSWORD
  • CASTOR_SALESFORCE_SECURITY_TOKEN
  • CASTOR_SALESFORCE_USERNAME

castor-extract-salesforce-viz

| Flag | Description |
| --- | --- |
| -u, --username | Salesforce username. |
| -p, --password | Password. |
| -c, --client-id | Client ID. |
| -s, --client-secret | Client secret. |
| -t, --security-token | Security token. |
| -b, --base-url | Instance URL. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_SALESFORCE_BASE_URL
  • CASTOR_SALESFORCE_CLIENT_ID
  • CASTOR_SALESFORCE_CLIENT_SECRET
  • CASTOR_SALESFORCE_PASSWORD
  • CASTOR_SALESFORCE_SECURITY_TOKEN
  • CASTOR_SALESFORCE_USERNAME

castor-extract-sigma

| Flag | Description |
| --- | --- |
| -H, --host | Sigma host. |
| -c, --client-id | Client ID. |
| -a, --api-token | API key. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_SIGMA_API_TOKEN
  • CASTOR_SIGMA_CLIENT_ID
  • CASTOR_SIGMA_HOST
  • CASTOR_SIGMA_GRANT_TYPE (optional)

castor-extract-strategy

| Flag | Description |
| --- | --- |
| -u, --username | Username. |
| -p, --password | Password. |
| -b, --base-url | Strategy URL. |
| -i, --project-ids <list> | Project IDs. Optional. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CATALOG_STRATEGY_BASE_URL
  • CATALOG_STRATEGY_PASSWORD
  • CATALOG_STRATEGY_USERNAME
  • CATALOG_STRATEGY_PROJECT_IDS (optional; comma-separated supported)
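
Since CATALOG_STRATEGY_PROJECT_IDS accepts a comma-separated value, a list of project IDs held in a script can be joined before the extractor is launched. This is an illustrative sketch: the helper name and the example IDs in the usage note are hypothetical.

```python
from typing import Dict, Iterable


def strategy_project_ids_env(project_ids: Iterable[str]) -> Dict[str, str]:
    """Return the environment entry for the optional project ID filter.

    Blank entries are dropped. An empty result leaves the variable unset,
    because the variable is documented as optional.
    """
    cleaned = [pid.strip() for pid in project_ids if pid.strip()]
    if not cleaned:
        return {}
    return {"CATALOG_STRATEGY_PROJECT_IDS": ",".join(cleaned)}
```

For example, `strategy_project_ids_env(["A1", "B2"])` yields a single entry whose value is `A1,B2`.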

castor-extract-tableau

| Flag | Description |
| --- | --- |
| -u, --user | Tableau user. |
| -n, --token-name | Token name. |
| -p, --password | Password. |
| -t, --token | Token. |
| -b, --server-url | Server URL. |
| -i, --site-id | Site ID. |
| --skip-columns | Skip column extraction. |
| --skip-fields | Skip field extraction. |
| --with-pulse | Extract Pulse assets. |
| --page-size | Custom pagination size. |
| --ignore-ssl | Disable SSL verification. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_TABLEAU_SERVER_URL
  • CASTOR_TABLEAU_SITE_ID
  • CASTOR_TABLEAU_USER (required for username/password auth)
  • CASTOR_TABLEAU_PASSWORD (required for username/password auth)
  • CASTOR_TABLEAU_TOKEN_NAME (required for PAT auth)
  • CASTOR_TABLEAU_TOKEN (required for PAT auth)
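
The two Tableau authentication modes each need a pair of variables, so a script can check which pairs are complete before running the extractor. The sketch below is a hypothetical helper. It only reports which modes are fully configured and makes no claim about which one the extractor prefers when both are present.

```python
from typing import List, Mapping

# Variable pairs copied from the list above.
TABLEAU_AUTH_VARS = {
    "password": ("CASTOR_TABLEAU_USER", "CASTOR_TABLEAU_PASSWORD"),
    "pat": ("CASTOR_TABLEAU_TOKEN_NAME", "CASTOR_TABLEAU_TOKEN"),
}


def tableau_auth_modes(environ: Mapping[str, str]) -> List[str]:
    """Return the authentication modes whose variables are all set."""
    modes = [
        mode
        for mode, names in TABLEAU_AUTH_VARS.items()
        if all(environ.get(name) for name in names)
    ]
    if not modes:
        raise ValueError(
            "Set CASTOR_TABLEAU_USER and CASTOR_TABLEAU_PASSWORD, or "
            "CASTOR_TABLEAU_TOKEN_NAME and CASTOR_TABLEAU_TOKEN"
        )
    return modes
```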

castor-extract-thoughtspot

| Flag | Description |
| --- | --- |
| -b, --base_url | Base URL. |
| -u, --username | Username. |
| -p, --password | Password. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_THOUGHTSPOT_BASE_URL
  • CASTOR_THOUGHTSPOT_USERNAME
  • CASTOR_THOUGHTSPOT_PASSWORD

castor-extract-confluence

| Flag | Description |
| --- | --- |
| -a, --account_id | Confluence account ID. |
| -b, --base_url | Confluence base URL. |
| -t, --token | API token. |
| -u, --username | Username. |
| --include-archived-spaces | Include archived spaces. |
| --include-personal-spaces | Include personal spaces. |
| --space-ids-allowed <list> | Only include these space IDs. |
| --space-ids-blocked <list> | Exclude these space IDs. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_CONFLUENCE_ACCOUNT_ID
  • CASTOR_CONFLUENCE_BASE_URL
  • CASTOR_CONFLUENCE_TOKEN
  • CASTOR_CONFLUENCE_USERNAME

castor-extract-notion

| Flag | Description |
| --- | --- |
| -t, --token | Notion token. |
| -o, --output | Output directory. |

These environment variables are supported:

  • CASTOR_NOTION_TOKEN