Schema

Table Schema commands help you work with table schemas: metadata that describes the structure, types, and constraints of tabular data. With these commands you can infer a schema from data, convert it between formats, validate it against the specification, and explore or script its properties.

Infer a table schema from a table by analyzing its data and generating field definitions including types, constraints, and formats.

Terminal window
dp schema infer <table-path>

Options:

  • -p, --from-package: Path to package containing the resource
  • -r, --from-resource: Name of resource within package
  • -j, --json: Output as JSON
  • -d, --debug: Enable debug mode

Table Dialect Options:

  • --delimiter: Field delimiter character
  • --header: Whether files have headers
  • --header-rows: Number of header rows
  • --header-join: Join character for multi-row headers
  • --comment-rows: Number of comment rows to skip
  • --comment-char: Comment character
  • --quote-char: Quote character for fields
  • --double-quote: Whether quotes are doubled for escaping
  • --escape-char: Escape character
  • --null-sequence: Sequence representing null values
  • --skip-initial-space: Skip initial whitespace
  • --property: JSON property path for nested data
  • --item-type: Type of items in arrays
  • --item-keys: Keys for object items
  • --sheet-number: Excel sheet number
  • --sheet-name: Excel sheet name
  • --table: Database table name
  • --sample-bytes: Bytes to sample for inference

Table Schema Options:

  • --field-names: Override field names
  • --field-types: Override field types
  • --missing-values: Values to treat as missing
  • --string-format: String format specification
  • --decimal-char: Decimal separator character
  • --group-char: Thousands separator character
  • --bare-number: Allow bare numbers
  • --true-values: Values to treat as true
  • --false-values: Values to treat as false
  • --datetime-format: DateTime format string
  • --date-format: Date format string
  • --time-format: Time format string
  • --array-type: Type of array elements
  • --list-delimiter: List item delimiter
  • --list-item-type: Type of list items
  • --geopoint-format: Geopoint format specification
  • --geojson-format: GeoJSON format specification
  • --sample-rows: Rows to sample for inference
  • --confidence: Confidence threshold for type inference
  • --comma-decimal: Use comma as decimal separator
  • --month-first: Parse dates with month first
  • --keep-strings: Keep string types when possible

Examples:

Terminal window
# Infer schema from CSV file
dp schema infer data.csv
# Infer with custom delimiter and date format
dp schema infer data.csv --delimiter ";" --date-format "%d/%m/%Y"
# Infer from remote file
dp schema infer https://example.com/data.csv
# Infer from resource in package
dp schema infer --from-package datapackage.json --from-resource "users"
# Export schema as JSON
dp schema infer data.csv --json > schema.json

Convert table schemas between formats. Conversion works in both directions between Table Schema and JSON Schema.

Terminal window
dp schema convert <descriptor-path>

Options:

  • --format <format>: Source schema format (schema, jsonschema)
  • --to-format <format>: Target schema format (schema, jsonschema)
  • --to-path <path>: Output path for converted schema
  • -j, --json: Output as JSON (automatically enabled when no --to-path is given)
  • -s, --silent: Suppress all output except errors
  • -d, --debug: Enable debug mode

Supported Formats:

  • schema: Data Package Table Schema format
  • jsonschema: JSON Schema format

Examples:

Terminal window
# Convert Table Schema to JSONSchema
dp schema convert schema.json --to-format jsonschema
# Convert JSONSchema to Table Schema
dp schema convert schema.jsonschema.json --format jsonschema
# Save converted schema to file
dp schema convert schema.json --to-format jsonschema --to-path converted.jsonschema.json
# Convert from JSONSchema and save as Table Schema
dp schema convert input.jsonschema.json --format jsonschema --to-path output.schema.json
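
To make the idea of the conversion concrete, here is a minimal sketch of how Table Schema fields could map onto a JSON Schema object. It is not dpkit's implementation: the function name tableSchemaToJsonSchema, the sample field names, and the choice to fall back to "string" for types without a direct JSON Schema counterpart are all assumptions made for illustration.

```javascript
// Hypothetical helper, NOT dpkit's converter: illustrates one possible
// mapping from a Table Schema descriptor to a JSON Schema object.
function tableSchemaToJsonSchema(tableSchema) {
  // Table Schema types that share a name with a JSON Schema type.
  const sharedTypes = new Set([
    "string", "integer", "number", "boolean", "array", "object",
  ]);
  const properties = {};
  const required = [];
  for (const field of tableSchema.fields) {
    // Assumption: types such as date or geopoint fall back to "string".
    const type = sharedTypes.has(field.type) ? field.type : "string";
    properties[field.name] = { type };
    // Table Schema marks required fields via constraints.required.
    if (field.constraints && field.constraints.required) {
      required.push(field.name);
    }
  }
  return { type: "object", properties, required };
}

// Made-up sample descriptor.
const sampleTableSchema = {
  fields: [
    { name: "id", type: "integer", constraints: { required: true } },
    { name: "name", type: "string" },
    { name: "signup", type: "date" },
  ],
};

const convertedJsonSchema = tableSchemaToJsonSchema(sampleTableSchema);
console.log(JSON.stringify(convertedJsonSchema, null, 2));
```

A real converter also has to carry over formats and the remaining constraints, which this sketch leaves out.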

Explore a table schema from a local or remote path to view its field definitions and constraints in an interactive format.

Terminal window
dp schema explore <descriptor-path>

Options:

  • -p, --from-package: Path to package containing the resource
  • -r, --from-resource: Name of resource within package
  • -j, --json: Output as JSON
  • -d, --debug: Enable debug mode

Examples:

Terminal window
# Explore schema descriptor
dp schema explore schema.json
# Explore remote schema
dp schema explore https://example.com/schema.json
# Explore schema from package resource
dp schema explore --from-package datapackage.json --from-resource "users"
# Export schema structure as JSON
dp schema explore schema.json --json

Validate a table schema from a local or remote path against the Table Schema specification.

Terminal window
dp schema validate <descriptor-path>

Options:

  • --from-package: Path to package containing the resource
  • --from-resource: Name of resource within package
  • --json: Output validation results as JSON
  • --debug: Enable debug mode
  • -q, --quit: Exit immediately after validation (don’t prompt for error filtering)
  • -a, --all: Skip selection prompts when all can be selected

Examples:

Terminal window
# Validate schema descriptor
dp schema validate schema.json
# Validate remote schema
dp schema validate https://example.com/schema.json
# Validate schema from package resource
dp schema validate --from-package datapackage.json --from-resource "users"
# Get validation results as JSON
dp schema validate schema.json --json
# Interactive selection when no path provided
dp schema validate --from-package datapackage.json

Open an interactive scripting session with a loaded table schema. This provides a REPL environment where you can programmatically interact with the schema definition.

Terminal window
dp schema script <descriptor-path>

Options:

  • -p, --from-package: Path to package containing the resource
  • -r, --from-resource: Name of resource within package
  • -j, --json: Output as JSON
  • -d, --debug: Enable debug mode

Available Variables:

  • dpkit: The dpkit library object
  • schema: The loaded schema object

Examples:

Terminal window
# Start scripting session with schema
dp schema script schema.json
# Script schema from package resource
dp schema script --from-package datapackage.json --from-resource "users"
# In the REPL session:
dp> schema.fields.length
dp> schema.fields[0].name
dp> schema.fields.filter(f => f.type === 'integer')
dp> schema.primaryKey
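
The REPL expressions above are plain JavaScript, so they can be tried outside a session as well. The sketch below runs them against a made-up descriptor standing in for the schema variable. The field names and primary key are assumptions, and the object loaded by dp schema script may carry more properties than shown here.

```javascript
// Made-up descriptor standing in for the `schema` variable of the REPL.
const sampleSchema = {
  fields: [
    { name: "id", type: "integer" },
    { name: "email", type: "string" },
    { name: "age", type: "integer" },
  ],
  primaryKey: ["id"],
};

// The same expressions as in the REPL session above.
const fieldCount = sampleSchema.fields.length;
const firstFieldName = sampleSchema.fields[0].name;
const integerFieldNames = sampleSchema.fields
  .filter(f => f.type === 'integer')
  .map(f => f.name);

console.log(fieldCount, firstFieldName, integerFieldNames, sampleSchema.primaryKey);
```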
  1. Infer from data file:

    Terminal window
    dp schema infer data.csv --json > schema.json
  2. Validate the generated schema:

    Terminal window
    dp schema validate schema.json
  3. Explore the schema structure:

    Terminal window
    dp schema explore schema.json
Terminal window
# Convert Table Schema to JSONSchema for JSON Schema validation tools
dp schema infer data.csv --json > table.schema.json
dp schema convert table.schema.json --to-format jsonschema --to-path api.jsonschema.json
# Convert JSONSchema back to Table Schema for dpkit tools
dp schema convert api.jsonschema.json --format jsonschema --to-path converted.schema.json
# Validate the round-trip conversion
dp schema validate converted.schema.json
Terminal window
# Infer schema with high confidence threshold
dp schema infer data.csv --confidence 0.8 --sample-rows 10000 --json > schema.json
# Validate and explore for refinement
dp schema validate schema.json
dp schema explore schema.json
# Script for custom analysis
dp schema script schema.json
Terminal window
# Validate all schemas in a package interactively
dp schema validate --from-package datapackage.json
# Infer improved schema for specific resource
dp schema infer --from-package datapackage.json --from-resource "transactions"
# Compare schemas using scripting
dp schema script --from-package datapackage.json --from-resource "users"
Terminal window
# Configure specific data types and formats
dp schema infer data.csv \
--datetime-format "%Y-%m-%d %H:%M:%S" \
--true-values "Yes,True,1" \
--false-values "No,False,0" \
--decimal-char "," \
--missing-values "NULL,N/A,,"
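
As a rough illustration of what options like --missing-values and --decimal-char mean for a single cell, the sketch below treats listed values as null and reads the remaining text as a number using the configured decimal character. The function parseCell and its option names are hypothetical, and dpkit's own parsing may differ.

```javascript
// Hypothetical helper, NOT dpkit's parser: shows the intent of
// missing values and a custom decimal character on one cell.
function parseCell(raw, { missingValues, decimalChar }) {
  // Values listed as missing become null.
  if (missingValues.includes(raw)) return null;
  // Swap the configured decimal character for "." before parsing.
  const normalized = raw.split(decimalChar).join(".");
  if (normalized.trim() === "") return raw;
  const parsed = Number(normalized);
  // Anything that is not a number stays a string.
  return Number.isNaN(parsed) ? raw : parsed;
}

const cellOptions = { missingValues: ["NULL", "N/A", ""], decimalChar: "," };

console.log(parseCell("3,14", cellOptions)); // 3.14
console.log(parseCell("N/A", cellOptions));  // null
console.log(parseCell("abc", cellOptions));  // abc
```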
Terminal window
# Work with remote schemas
dp schema explore https://example.com/schema.json
dp schema validate https://example.com/schema.json
dp schema infer https://example.com/data.csv

Schema inference supports the following field types:

  • Basic Types: string, integer, number, boolean
  • Date/Time Types: date, datetime, time, year, yearmonth, duration
  • Structured Types: array, object, list
  • Geographic Types: geopoint, geojson
Terminal window
# High confidence for clean data
dp schema infer data.csv --confidence 0.9
# Lower confidence for messy data
dp schema infer data.csv --confidence 0.6
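
One way to picture a confidence threshold is as the share of sampled values that must match a type before that type is chosen. The sketch below works under that assumption. It is not dpkit's inference algorithm, and inferColumnType, its candidate list, and the sample column are made up for illustration.

```javascript
// Hypothetical illustration; dpkit's actual inference may work differently.
function inferColumnType(values, confidence) {
  // Candidate types, checked from most to least specific.
  const candidates = [
    ["integer", v => /^-?\d+$/.test(v)],
    ["number", v => /^-?\d+(\.\d+)?$/.test(v)],
  ];
  for (const [type, matches] of candidates) {
    // Share of sampled values that parse as this type.
    const share = values.filter(matches).length / values.length;
    if (share >= confidence) return type;
  }
  // No candidate reached the threshold.
  return "string";
}

// 7 of 10 values are integers, so the integer share is 0.7.
const messyColumn = ["1", "2", "3", "x", "5", "6", "?", "8", "n/a", "10"];

console.log(inferColumnType(messyColumn, 0.6)); // integer
console.log(inferColumnType(messyColumn, 0.9)); // string
```

Under this reading, a lower threshold tolerates stray values in an otherwise numeric column, while a higher one falls back to string as soon as the data is not clean.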
Terminal window
# Large sample for better inference
dp schema infer large_data.csv --sample-rows 50000
# Quick inference with small sample
dp schema infer data.csv --sample-rows 100
Terminal window
# European date format
dp schema infer data.csv --date-format "%d.%m.%Y"
# Custom boolean values
dp schema infer data.csv --true-values "Ja,Oui,Sí" --false-values "Nein,Non,No"
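
The effect of custom boolean values can be sketched as a simple lookup. The helper below is hypothetical and not part of dpkit. It assumes the comma-separated option strings are split into lists and compared exactly, which may not match dpkit's handling of case or whitespace.

```javascript
// Hypothetical helper, NOT dpkit's parser: maps custom true/false
// values to booleans and leaves everything else unresolved.
function parseBoolean(value, trueValues, falseValues) {
  if (trueValues.includes(value)) return true;
  if (falseValues.includes(value)) return false;
  return null; // not a recognized boolean value
}

// Assumption: the option strings are split on commas.
const customTrueValues = "Ja,Oui,Sí".split(",");
const customFalseValues = "Nein,Non,No".split(",");

console.log(parseBoolean("Oui", customTrueValues, customFalseValues));   // true
console.log(parseBoolean("Nein", customTrueValues, customFalseValues));  // false
console.log(parseBoolean("maybe", customTrueValues, customFalseValues)); // null
```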

All schema commands support the following output modes:

  • Interactive Display: Default rich terminal interface showing field definitions
  • JSON: Use --json flag for machine-readable output
  • Debug Mode: Use --debug for detailed operation logs

The convert command lets you exchange schemas with tools built around JSON Schema:

Terminal window
# Use with JSON Schema validation libraries
dp schema infer data.csv --json > table.schema.json
dp schema convert table.schema.json --to-format jsonschema --to-path validation.jsonschema.json
# Import existing JSONSchema into dpkit workflow
dp schema convert external.jsonschema.json --format jsonschema --to-path dpkit.schema.json
dp table validate data.csv --schema dpkit.schema.json
# Cross-platform schema sharing
dp schema convert schema.json --to-format jsonschema --to-path api-spec.jsonschema.json

Schema commands can be combined with other dpkit commands:

Terminal window
# Create schema, then use it for validation
dp schema infer data.csv --json > schema.json
dp table validate data.csv --schema schema.json
# Work within package context
dp package infer *.csv --json > datapackage.json
dp schema validate --from-package datapackage.json --from-resource "data"