Schema
Table Schema commands help you work with table schemas - metadata that describes the structure, types, and constraints of tabular data. These commands allow you to infer schema from data, validate schema definitions, and explore schema properties.
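For orientation, here is a minimal sketch of what such metadata looks like once loaded as a plain object. The "users" table, its field names, and the constraint values are hypothetical; the property names (fields, constraints, primaryKey, missingValues) follow the Table Schema specification.

```javascript
// A hand-written Table Schema descriptor for a hypothetical "users" table.
const schema = {
  fields: [
    { name: "id", type: "integer", constraints: { required: true, unique: true } },
    { name: "email", type: "string", format: "email" },
    { name: "signup_date", type: "date" },
    { name: "score", type: "number", constraints: { minimum: 0, maximum: 100 } },
  ],
  primaryKey: ["id"],
  missingValues: ["", "NA"],
};

// Structure and types: one "name: type" entry per column, in column order.
const summary = schema.fields.map((f) => `${f.name}: ${f.type}`);

// Constraints: names of the fields that declare any.
const constrained = schema.fields.filter((f) => f.constraints).map((f) => f.name);

console.log(summary.join("\n"));
console.log("constrained fields:", constrained.join(", "));
```

A file such as schema.json produced by dp schema infer would hold a descriptor of this general shape, though the exact properties emitted depend on the data and the options used.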
Available Commands
dp schema infer
Infer a table schema from a table by analyzing its data and generating field definitions including types, constraints, and formats.

dp schema infer <table-path>

Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
Table Dialect Options:
- --delimiter: Field delimiter character
- --header: Whether files have headers
- --header-rows: Number of header rows
- --header-join: Join character for multi-row headers
- --comment-rows: Number of comment rows to skip
- --comment-char: Comment character
- --quote-char: Quote character for fields
- --double-quote: Whether quotes are doubled for escaping
- --escape-char: Escape character
- --null-sequence: Sequence representing null values
- --skip-initial-space: Skip initial whitespace
- --property: JSON property path for nested data
- --item-type: Type of items in arrays
- --item-keys: Keys for object items
- --sheet-number: Excel sheet number
- --sheet-name: Excel sheet name
- --table: Database table name
- --sample-bytes: Bytes to sample for inference
Table Schema Options:
- --field-names: Override field names
- --field-types: Override field types
- --missing-values: Values to treat as missing
- --string-format: String format specification
- --decimal-char: Decimal separator character
- --group-char: Thousands separator character
- --bare-number: Allow bare numbers
- --true-values: Values to treat as true
- --false-values: Values to treat as false
- --datetime-format: DateTime format string
- --date-format: Date format string
- --time-format: Time format string
- --array-type: Type of array elements
- --list-delimiter: List item delimiter
- --list-item-type: Type of list items
- --geopoint-format: Geopoint format specification
- --geojson-format: GeoJSON format specification
- --sample-rows: Rows to sample for inference
- --confidence: Confidence threshold for type inference
- --comma-decimal: Use comma as decimal separator
- --month-first: Parse dates with month first
- --keep-strings: Keep string types when possible
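To make the effect of a few of these options concrete, the sketch below shows one way raw cell text could be interpreted once missing values, boolean vocabularies, and number separators are configured. It is an illustration only, not dpkit's implementation: the parseCell function and the camel-cased option names are hypothetical stand-ins for --missing-values, --true-values, --false-values, --decimal-char, and --group-char.

```javascript
// Hypothetical option object mirroring the CLI flags; not dpkit's API.
const options = {
  missingValues: ["NULL", "N/A", ""],
  trueValues: ["Yes", "True", "1"],
  falseValues: ["No", "False", "0"],
  decimalChar: ",",
  groupChar: ".",
};

// Interpret one raw cell for a given target type.
// Returns null for missing values and undefined when the text does not fit the type.
function parseCell(raw, type, opts) {
  if (opts.missingValues.includes(raw)) return null;

  if (type === "boolean") {
    if (opts.trueValues.includes(raw)) return true;
    if (opts.falseValues.includes(raw)) return false;
    return undefined;
  }

  if (type === "number") {
    // Drop thousands separators, then swap the decimal separator for ".".
    const normalised = raw.split(opts.groupChar).join("").replace(opts.decimalChar, ".");
    const value = Number(normalised);
    return Number.isNaN(value) ? undefined : value;
  }

  return raw; // strings pass through unchanged
}

console.log(parseCell("1.234,5", "number", options));
console.log(parseCell("Yes", "boolean", options));
console.log(parseCell("N/A", "number", options));
```

Checking for missing values first matters: with these settings an empty cell is reported as null rather than being coerced to the number 0.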
Examples:
# Infer schema from CSV file
dp schema infer data.csv

# Infer with custom delimiter and date format
dp schema infer data.csv --delimiter ";" --date-format "%d/%m/%Y"

# Infer from remote file
dp schema infer https://example.com/data.csv

# Infer from resource in package
dp schema infer --from-package datapackage.json --from-resource "users"

# Export schema as JSON
dp schema infer data.csv --json > schema.json

dp schema convert
Convert table schemas between formats. Conversion is supported in both directions between Table Schema and JSON Schema.

dp schema convert <descriptor-path>

Options:
- --format <format>: Source schema format (schema, jsonschema)
- --to-format <format>: Target schema format (schema, jsonschema)
- --to-path <path>: Output path for converted schema
- -j, --json: Output as JSON (automatically enabled when no --to-path is given)
- -s, --silent: Suppress all output except errors
- -d, --debug: Enable debug mode
Supported Formats:
- schema: Data Package Table Schema format
- jsonschema: JSON Schema format
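As a rough illustration of what moving from the schema format to the jsonschema format involves, the sketch below maps each Table Schema field to a JSON Schema property. The type mapping and the toJsonSchema function are assumptions made for this example; the mapping dp schema convert actually applies may differ, especially for constraints and less common field types.

```javascript
// Assumed mapping from Table Schema field types to JSON Schema fragments.
const TYPE_MAP = {
  string: { type: "string" },
  integer: { type: "integer" },
  number: { type: "number" },
  boolean: { type: "boolean" },
  date: { type: "string", format: "date" },
  datetime: { type: "string", format: "date-time" },
  time: { type: "string", format: "time" },
  array: { type: "array" },
  object: { type: "object" },
};

// Build a JSON Schema describing one row of the table as an object.
function toJsonSchema(tableSchema) {
  const properties = {};
  const required = [];

  for (const field of tableSchema.fields) {
    // Unknown or untyped fields fall back to a plain string in this sketch.
    const mapped = TYPE_MAP[field.type] || { type: "string" };
    properties[field.name] = { ...mapped };
    if (field.constraints && field.constraints.required) required.push(field.name);
  }

  const result = { type: "object", properties };
  if (required.length > 0) result.required = required;
  return result;
}

// Hypothetical input descriptor.
const tableSchema = {
  fields: [
    { name: "id", type: "integer", constraints: { required: true } },
    { name: "created", type: "datetime" },
    { name: "note" },
  ],
};

const converted = toJsonSchema(tableSchema);
console.log(JSON.stringify(converted, null, 2));
```

The reverse direction is lossier in general, because JSON Schema can express structures (nested objects, unions) that have no direct tabular equivalent.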
Examples:
# Convert Table Schema to JSON Schema
dp schema convert schema.json --to-format jsonschema

# Convert JSON Schema to Table Schema
dp schema convert schema.jsonschema.json --format jsonschema

# Save converted schema to file
dp schema convert schema.json --to-format jsonschema --to-path converted.jsonschema.json

# Convert from JSON Schema and save as Table Schema
dp schema convert input.jsonschema.json --format jsonschema --to-path output.schema.json

dp schema explore
Explore a table schema from a local or remote path to view its field definitions and constraints in an interactive format.

dp schema explore <descriptor-path>

Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
Examples:
# Explore schema descriptor
dp schema explore schema.json

# Explore remote schema
dp schema explore https://example.com/schema.json

# Explore schema from package resource
dp schema explore --from-package datapackage.json --from-resource "users"

# Export schema structure as JSON
dp schema explore schema.json --json

dp schema validate
Validate a table schema from a local or remote path against the Table Schema specification.

dp schema validate <descriptor-path>

Options:
- --from-package: Path to package containing the resource
- --from-resource: Name of resource within package
- --json: Output validation results as JSON
- --debug: Enable debug mode
- -q, --quit: Exit immediately after validation (don’t prompt for error filtering)
- -a, --all: Skip selection prompts when all can be selected
Examples:
# Validate schema descriptor
dp schema validate schema.json

# Validate remote schema
dp schema validate https://example.com/schema.json

# Validate schema from package resource
dp schema validate --from-package datapackage.json --from-resource "users"

# Get validation results as JSON
dp schema validate schema.json --json

# Interactive selection when no path provided
dp schema validate --from-package datapackage.json

dp schema script
Open an interactive scripting session with a loaded table schema. This provides a REPL environment where you can programmatically interact with the schema definition.

dp schema script <descriptor-path>

Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
Available Variables:
- dpkit: The dpkit library object
- schema: The loaded schema object
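The sketch below gives a feel for the kind of analysis that can be run against the schema variable. It uses a plain object as a stand-in, on the assumption that the loaded schema exposes the usual Table Schema properties (fields, primaryKey); the field names are hypothetical, and the dpkit variable is not used because its API is not described here.

```javascript
// Stand-in for the `schema` variable: a plain Table Schema descriptor.
const schema = {
  fields: [
    { name: "id", type: "integer" },
    { name: "name", type: "string" },
    { name: "age", type: "integer" },
    { name: "joined", type: "date" },
  ],
  primaryKey: ["id"],
};

// Group field names by declared type.
const byType = {};
for (const field of schema.fields) {
  byType[field.type] = byType[field.type] || [];
  byType[field.type].push(field.name);
}

// primaryKey may be a single name or a list; normalise it to a list.
const keyFields = [].concat(schema.primaryKey || []);

// Primary-key names that do not correspond to any declared field.
const fieldNames = schema.fields.map((f) => f.name);
const missingKeys = keyFields.filter((key) => !fieldNames.includes(key));

console.log(byType);
console.log("unknown primary key fields:", missingKeys);
```

In a real session the same expressions could be typed at the prompt one at a time, without redefining schema.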
Examples:
# Start scripting session with schema
dp schema script schema.json

# Script schema from package resource
dp schema script --from-package datapackage.json --from-resource "users"

# In the REPL session:
dp> schema.fields.length
dp> schema.fields[0].name
dp> schema.fields.filter(f => f.type === 'integer')
dp> schema.primaryKey

Common Workflows
Creating Schema Definitions
- Infer from data file:
  dp schema infer data.csv --json > schema.json
- Validate the generated schema:
  dp schema validate schema.json
- Explore the schema structure:
  dp schema explore schema.json

Schema Format Conversion
# Convert Table Schema to JSON Schema for use with JSON Schema validation tools
dp schema infer data.csv --json > table.schema.json
dp schema convert table.schema.json --to-format jsonschema --to-path api.jsonschema.json

# Convert JSON Schema back to Table Schema for dpkit tools
dp schema convert api.jsonschema.json --format jsonschema --to-path converted.schema.json

# Validate the round-trip conversion
dp schema validate converted.schema.json

Schema Analysis and Refinement
# Infer schema with high confidence threshold
dp schema infer data.csv --confidence 0.8 --sample-rows 10000

# Validate and explore for refinement
dp schema validate schema.json
dp schema explore schema.json

# Script for custom analysis
dp schema script schema.json

Working with Package Schemas
# Validate all schemas in a package interactively
dp schema validate --from-package datapackage.json

# Infer improved schema for specific resource
dp schema infer --from-package datapackage.json --from-resource "transactions"

# Compare schemas using scripting
dp schema script --from-package datapackage.json --from-resource "users"

Custom Type Inference
# Configure specific data types and formats
dp schema infer data.csv \
  --datetime-format "%Y-%m-%d %H:%M:%S" \
  --true-values "Yes,True,1" \
  --false-values "No,False,0" \
  --decimal-char "," \
  --missing-values "NULL,N/A,,"

Remote Schema Handling
# Work with remote schemas
dp schema explore https://example.com/schema.json
dp schema validate https://example.com/schema.json
dp schema infer https://example.com/data.csv

Schema Field Types
The schema inference supports various field types:
- Basic Types: string, integer, number, boolean
- Date/Time Types: date, datetime, time, year, yearmonth, duration
- Structured Types: array, object, list
- Geographic Types: geopoint, geojson

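To give an intuition for how a column might be assigned one of these types, and how a confidence threshold enters the decision, here is a simplified sketch. It covers only a handful of types, and the detector patterns, the inferType function, and the 0.9 default are all assumptions made for the example; dpkit's actual inference logic is not described in this document and is likely more involved.

```javascript
// Candidate types, ordered from most to least specific. "string" is the fallback.
const DETECTORS = [
  ["integer", (v) => /^-?\d+$/.test(v)],
  ["number", (v) => /^-?\d+(\.\d+)?$/.test(v)],
  ["boolean", (v) => /^(true|false)$/i.test(v)],
  ["date", (v) => /^\d{4}-\d{2}-\d{2}$/.test(v)],
];

// Pick the first type whose share of matching, non-missing values
// reaches the confidence threshold.
function inferType(values, { confidence = 0.9, missingValues = [""] } = {}) {
  const present = values.filter((v) => !missingValues.includes(v));
  if (present.length === 0) return "string";

  for (const [type, matches] of DETECTORS) {
    const share = present.filter(matches).length / present.length;
    if (share >= confidence) return type;
  }
  return "string";
}

// Three of four values are integers (share 0.75).
const column = ["1", "2", "3", "x"];
console.log(inferType(column, { confidence: 0.9 })); // "string"
console.log(inferType(column, { confidence: 0.6 })); // "integer"
```

In this sketch a lower threshold tolerates more stray values in a column before falling back to string, which is the trade-off the --confidence option name suggests.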
Advanced Inference Options
Confidence Tuning
# High confidence for clean data
dp schema infer data.csv --confidence 0.9

# Lower confidence for messy data
dp schema infer data.csv --confidence 0.6

Sample Size Control
# Large sample for better inference
dp schema infer large_data.csv --sample-rows 50000

# Quick inference with small sample
dp schema infer data.csv --sample-rows 100

Format Specifications
# European date format
dp schema infer data.csv --date-format "%d.%m.%Y"

# Custom boolean values
dp schema infer data.csv --true-values "Ja,Oui,Sí" --false-values "Nein,Non,No"

Output Formats
All schema commands support multiple output formats:
- Interactive Display: Default rich terminal interface showing field definitions
- JSON: Use the --json flag for machine-readable output
- Debug Mode: Use --debug for detailed operation logs

Schema Format Interoperability
The convert command lets dpkit schemas be used with other schema ecosystems:

# Use with JSON Schema validation libraries
dp schema infer data.csv --json > table.schema.json
dp schema convert table.schema.json --to-format jsonschema --to-path validation.jsonschema.json

# Import existing JSON Schema into dpkit workflow
dp schema convert external.jsonschema.json --format jsonschema --to-path dpkit.schema.json
dp table validate data.csv --schema dpkit.schema.json

# Cross-platform schema sharing
dp schema convert schema.json --to-format jsonschema --to-path api-spec.jsonschema.json

Integration with Other Commands
Schema commands work seamlessly with other dpkit commands:

# Create schema, then use it for validation
dp schema infer data.csv --json > schema.json
dp table validate data.csv --schema schema.json

# Work within package context
dp package infer *.csv --json > datapackage.json
dp schema validate --from-package datapackage.json --from-resource "data"