Schema
Table Schema commands help you work with table schemas: metadata that describes the structure, types, and constraints of tabular data. These commands let you infer a schema from data, convert between schema formats, explore and validate schema descriptors, and script against a loaded schema.
Available Commands
dpkit schema infer
Infer a table schema from a table by analyzing its data and generating field definitions including types, constraints, and formats.

    dpkit schema infer <table-path>

Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
Table Dialect Options:
- --delimiter: Field delimiter character
- --header: Whether files have headers
- --header-rows: Number of header rows
- --header-join: Join character for multi-row headers
- --comment-rows: Number of comment rows to skip
- --comment-char: Comment character
- --quote-char: Quote character for fields
- --double-quote: Whether quotes are doubled for escaping
- --escape-char: Escape character
- --null-sequence: Sequence representing null values
- --skip-initial-space: Skip initial whitespace
- --property: JSON property path for nested data
- --item-type: Type of items in arrays
- --item-keys: Keys for object items
- --sheet-number: Excel sheet number
- --sheet-name: Excel sheet name
- --table: Database table name
- --sample-bytes: Bytes to sample for inference
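To make the interplay of --delimiter, --quote-char, and --double-quote concrete, the following TypeScript sketch splits one line of delimited text into cells. It is an illustration only, not dpkit's parser: the names `DialectSketch` and `splitLine` are hypothetical, and it assumes single-character delimiter and quote settings and ignores escape characters and multi-line fields.

```typescript
// Hypothetical subset of dialect settings, named after the CLI options above.
interface DialectSketch {
  delimiter: string; // single character, e.g. "," or ";"
  quoteChar: string; // single character, e.g. '"'
  doubleQuote: boolean; // whether a doubled quote inside a quoted field is a literal quote
}

// Split a single line into cells according to the dialect settings.
function splitLine(line: string, dialect: DialectSketch): string[] {
  const cells: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === dialect.quoteChar) {
        if (dialect.doubleQuote && line.charAt(i + 1) === dialect.quoteChar) {
          current += dialect.quoteChar; // doubled quote becomes one literal quote
          i++; // skip the second quote of the pair
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        current += ch; // delimiters inside quotes are ordinary characters
      }
    } else if (ch === dialect.quoteChar) {
      inQuotes = true; // opening quote
    } else if (ch === dialect.delimiter) {
      cells.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  cells.push(current);
  return cells;
}

const semicolonDialect: DialectSketch = { delimiter: ";", quoteChar: '"', doubleQuote: true };
const exampleCells = splitLine('a;"b;c";"say ""hi"""', semicolonDialect);
console.log(exampleCells);
```

With the semicolon dialect, the quoted cell keeps its embedded delimiter and the doubled quotes collapse to single ones.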
Table Schema Options:
- --field-names: Override field names
- --field-types: Override field types
- --missing-values: Values to treat as missing
- --string-format: String format specification
- --decimal-char: Decimal separator character
- --group-char: Thousands separator character
- --bare-number: Allow bare numbers
- --true-values: Values to treat as true
- --false-values: Values to treat as false
- --datetime-format: DateTime format string
- --date-format: Date format string
- --time-format: Time format string
- --array-type: Type of array elements
- --list-delimiter: List item delimiter
- --list-item-type: Type of list items
- --geopoint-format: Geopoint format specification
- --geojson-format: GeoJSON format specification
- --sample-rows: Rows to sample for inference
- --confidence: Confidence threshold for type inference
- --comma-decimal: Use comma as decimal separator
- --month-first: Parse dates with month first
- --keep-strings: Keep string types when possible
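As a rough picture of what --missing-values, --true-values, --false-values, --decimal-char, and --group-char mean for a single cell, here is a small TypeScript sketch. It only mirrors the option descriptions above; the names `ParseOptionsSketch`, `parseBooleanCell`, and `parseNumberCell` are hypothetical, and dpkit's actual parsing rules may differ.

```typescript
// Hypothetical option bag named after the CLI options above.
interface ParseOptionsSketch {
  missingValues: string[];
  trueValues: string[];
  falseValues: string[];
  decimalChar: string;
  groupChar: string;
}

// Returns null for a missing value, otherwise the boolean the cell stands for.
function parseBooleanCell(raw: string, opts: ParseOptionsSketch): boolean | null {
  if (opts.missingValues.indexOf(raw) !== -1) return null;
  if (opts.trueValues.indexOf(raw) !== -1) return true;
  if (opts.falseValues.indexOf(raw) !== -1) return false;
  throw new Error("not a boolean: " + raw);
}

// Returns null for a missing value, otherwise the parsed number.
function parseNumberCell(raw: string, opts: ParseOptionsSketch): number | null {
  if (opts.missingValues.indexOf(raw) !== -1) return null;
  // Drop group separators, then normalize the decimal separator to ".".
  const normalized = raw.split(opts.groupChar).join("").replace(opts.decimalChar, ".");
  const value = Number(normalized);
  if (normalized.trim() === "" || isNaN(value)) {
    throw new Error("not a number: " + raw);
  }
  return value;
}

const europeanOptions: ParseOptionsSketch = {
  missingValues: ["", "N/A"],
  trueValues: ["Yes"],
  falseValues: ["No"],
  decimalChar: ",",
  groupChar: ".",
};

console.log(parseNumberCell("1.234,5", europeanOptions)); // 1234.5
console.log(parseBooleanCell("Yes", europeanOptions)); // true
console.log(parseBooleanCell("N/A", europeanOptions)); // null
```

Checking for missing values first means a sentinel such as "N/A" never reaches the type-specific parsing.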
Examples:

    # Infer schema from CSV file
    dpkit schema infer data.csv

    # Infer with custom delimiter and date format
    dpkit schema infer data.csv --delimiter ";" --date-format "%d/%m/%Y"

    # Infer from remote file
    dpkit schema infer https://example.com/data.csv

    # Infer from resource in package
    dpkit schema infer --from-package datapackage.json --from-resource "users"

    # Export schema as JSON
    dpkit schema infer data.csv --json > schema.json

dpkit schema convert
Convert table schemas between different formats, supporting bidirectional conversion between Table Schema and JSON Schema formats.

    dpkit schema convert <descriptor-path>

Options:
- --format <format>: Source schema format (schema, jsonschema)
- --to-format <format>: Target schema format (schema, jsonschema)
- --to-path <path>: Output path for converted schema
- -j, --json: Output as JSON (automatically enabled when no --to-path is given)
- -s, --silent: Suppress all output except errors
- -d, --debug: Enable debug mode
Supported Formats:
- schema: Data Package Table Schema format
- jsonschema: JSON Schema format
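To give a feel for what moving between the two formats involves, the TypeScript sketch below maps a minimal Table Schema descriptor to a JSON Schema object. It is a simplified illustration, not dpkit's converter: the interfaces, the `TYPE_MAP` table, and `toJsonSchemaSketch` are hypothetical, only the `required` constraint is carried over, and any field type without an obvious JSON Schema counterpart (dates, for example) falls back to "string".

```typescript
// Minimal, hypothetical shapes for the two formats.
interface TableFieldSketch {
  name: string;
  type: string;
  constraints?: { required?: boolean };
}
interface TableSchemaSketch {
  fields: TableFieldSketch[];
}
interface JsonSchemaSketch {
  type: "object";
  properties: { [name: string]: { type: string } };
  required: string[];
}

// Assumed one-to-one mappings; everything else falls back to "string".
const TYPE_MAP: { [tableType: string]: string } = {
  string: "string",
  integer: "integer",
  number: "number",
  boolean: "boolean",
  array: "array",
  object: "object",
};

function toJsonSchemaSketch(schema: TableSchemaSketch): JsonSchemaSketch {
  const result: JsonSchemaSketch = { type: "object", properties: {}, required: [] };
  for (const field of schema.fields) {
    const mapped = TYPE_MAP[field.type];
    // Unknown or unmapped types are represented as plain strings in this sketch.
    result.properties[field.name] = { type: typeof mapped === "string" ? mapped : "string" };
    if (field.constraints !== undefined && field.constraints.required === true) {
      result.required.push(field.name);
    }
  }
  return result;
}

const usersTableSchema: TableSchemaSketch = {
  fields: [
    { name: "id", type: "integer", constraints: { required: true } },
    { name: "name", type: "string" },
    { name: "born", type: "date" },
  ],
};
const usersJsonSchema = toJsonSchemaSketch(usersTableSchema);
console.log(JSON.stringify(usersJsonSchema, null, 2));
```

A real conversion has to decide how to represent formats, constraints, and keys in both directions, which is why a round trip is worth validating afterwards.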
Examples:

    # Convert Table Schema to JSONSchema
    dpkit schema convert schema.json --to-format jsonschema

    # Convert JSONSchema to Table Schema
    dpkit schema convert schema.jsonschema.json --format jsonschema

    # Save converted schema to file
    dpkit schema convert schema.json --to-format jsonschema --to-path converted.jsonschema.json

    # Convert from JSONSchema and save as Table Schema
    dpkit schema convert input.jsonschema.json --format jsonschema --to-path output.schema.json

dpkit schema explore
Explore a table schema from a local or remote path to view its field definitions and constraints in an interactive format.

    dpkit schema explore <descriptor-path>

Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
Examples:

    # Explore schema descriptor
    dpkit schema explore schema.json

    # Explore remote schema
    dpkit schema explore https://example.com/schema.json

    # Explore schema from package resource
    dpkit schema explore --from-package datapackage.json --from-resource "users"

    # Export schema structure as JSON
    dpkit schema explore schema.json --json

dpkit schema validate
Validate a table schema from a local or remote path against the Table Schema specification.

    dpkit schema validate <descriptor-path>

Options:
- --from-package: Path to package containing the resource
- --from-resource: Name of resource within package
- --json: Output validation results as JSON
- --debug: Enable debug mode
- -q, --quit: Exit immediately after validation (don’t prompt for error filtering)
- -a, --all: Skip selection prompts when all can be selected
Examples:

    # Validate schema descriptor
    dpkit schema validate schema.json

    # Validate remote schema
    dpkit schema validate https://example.com/schema.json

    # Validate schema from package resource
    dpkit schema validate --from-package datapackage.json --from-resource "users"

    # Get validation results as JSON
    dpkit schema validate schema.json --json

    # Interactive selection when no path provided
    dpkit schema validate --from-package datapackage.json

dpkit schema script
Open an interactive scripting session with a loaded table schema. This provides a REPL environment where you can programmatically interact with the schema definition.

    dpkit schema script <descriptor-path>

Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
Available Variables:
- dpkit: The dpkit library object
- schema: The loaded schema object
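The kind of expression you might evaluate against the schema variable can be tried outside the REPL with a plain object standing in for it. The TypeScript sketch below assumes only the properties used in this page's REPL examples (fields, name, type, primaryKey); the object `exampleSchema` and the interfaces are hypothetical stand-ins, and no dpkit API is involved.

```typescript
// Hypothetical stand-in for the loaded schema object.
interface FieldSketch {
  name: string;
  type: string;
}
interface SchemaSketch {
  fields: FieldSketch[];
  primaryKey?: string[];
}

const exampleSchema: SchemaSketch = {
  fields: [
    { name: "id", type: "integer" },
    { name: "email", type: "string" },
    { name: "age", type: "integer" },
  ],
  primaryKey: ["id"],
};

// How many fields are defined?
const fieldCount = exampleSchema.fields.length;

// Which fields are integers?
const integerFieldNames = exampleSchema.fields
  .filter(f => f.type === "integer")
  .map(f => f.name);

// How many fields are there of each type?
const countsByType: { [type: string]: number } = {};
for (const field of exampleSchema.fields) {
  countsByType[field.type] = (countsByType[field.type] || 0) + 1;
}

console.log(fieldCount); // 3
console.log(integerFieldNames); // [ 'id', 'age' ]
console.log(countsByType); // { integer: 2, string: 1 }
```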
Examples:

    # Start scripting session with schema
    dpkit schema script schema.json

    # Script schema from package resource
    dpkit schema script --from-package datapackage.json --from-resource "users"

    # In the REPL session:
    dpkit> schema.fields.length
    dpkit> schema.fields[0].name
    dpkit> schema.fields.filter(f => f.type === 'integer')
    dpkit> schema.primaryKey

Common Workflows
Creating Schema Definitions
- Infer from data file:

      dpkit schema infer data.csv --json > schema.json

- Validate the generated schema:

      dpkit schema validate schema.json

- Explore the schema structure:

      dpkit schema explore schema.json

Schema Format Conversion

    # Convert Table Schema to JSONSchema for JSON Schema validation tools
    dpkit schema infer data.csv --json > table.schema.json
    dpkit schema convert table.schema.json --to-format jsonschema --to-path api.jsonschema.json

    # Convert JSONSchema back to Table Schema for dpkit tools
    dpkit schema convert api.jsonschema.json --format jsonschema --to-path converted.schema.json

    # Validate the round-trip conversion
    dpkit schema validate converted.schema.json

Schema Analysis and Refinement

    # Infer schema with high confidence threshold
    dpkit schema infer data.csv --confidence 0.8 --sample-rows 10000

    # Validate and explore for refinement
    dpkit schema validate schema.json
    dpkit schema explore schema.json

    # Script for custom analysis
    dpkit schema script schema.json

Working with Package Schemas

    # Validate all schemas in a package interactively
    dpkit schema validate --from-package datapackage.json

    # Infer improved schema for specific resource
    dpkit schema infer --from-package datapackage.json --from-resource "transactions"

    # Compare schemas using scripting
    dpkit schema script --from-package datapackage.json --from-resource "users"

Custom Type Inference

    # Configure specific data types and formats
    dpkit schema infer data.csv \
      --datetime-format "%Y-%m-%d %H:%M:%S" \
      --true-values "Yes,True,1" \
      --false-values "No,False,0" \
      --decimal-char "," \
      --missing-values "NULL,N/A,,"

Remote Schema Handling

    # Work with remote schemas
    dpkit schema explore https://example.com/schema.json
    dpkit schema validate https://example.com/schema.json
    dpkit schema infer https://example.com/data.csv

Schema Field Types
The schema inference supports various field types:
- Basic Types: string, integer, number, boolean
- Date/Time Types: date, datetime, time, year, yearmonth, duration
- Structured Types: array, object, list
- Geographic Types: geopoint, geojson
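One way to picture how a column of raw strings could be assigned one of the basic types is a threshold rule over a sample of rows. The TypeScript sketch below is an assumption-laden illustration, not dpkit's inference algorithm: it treats the confidence value as the minimum fraction of sampled, non-empty values that must match a candidate type, and the names `CANDIDATES` and `inferBasicType` are hypothetical.

```typescript
type BasicType = "integer" | "number" | "boolean" | "string";

// Candidate types in order of specificity; "string" is the fallback.
const CANDIDATES: { type: BasicType; test: (v: string) => boolean }[] = [
  { type: "integer", test: v => /^-?\d+$/.test(v) },
  { type: "number", test: v => /^-?\d+(\.\d+)?$/.test(v) },
  { type: "boolean", test: v => v === "true" || v === "false" },
];

// Pick the first candidate whose share of matching sampled values reaches
// the confidence threshold. Empty strings are ignored in this sketch.
function inferBasicType(values: string[], confidence: number, sampleRows: number): BasicType {
  const sample = values.slice(0, sampleRows).filter(v => v !== "");
  if (sample.length === 0) return "string";
  for (const candidate of CANDIDATES) {
    const matches = sample.filter(candidate.test).length;
    if (matches / sample.length >= confidence) return candidate.type;
  }
  return "string";
}

console.log(inferBasicType(["1", "2", "3", "x"], 0.9, 100)); // string
console.log(inferBasicType(["1", "2", "3", "x"], 0.7, 100)); // integer
console.log(inferBasicType(["1.5", "2", "3"], 0.9, 100)); // number
```

Under this reading, a lower threshold tolerates a few stray values in an otherwise numeric column, while a higher one falls back to string as soon as the data is not clean enough.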
Advanced Inference Options
Confidence Tuning

    # High confidence for clean data
    dpkit schema infer data.csv --confidence 0.9

    # Lower confidence for messy data
    dpkit schema infer data.csv --confidence 0.6

Sample Size Control

    # Large sample for better inference
    dpkit schema infer large_data.csv --sample-rows 50000

    # Quick inference with small sample
    dpkit schema infer data.csv --sample-rows 100

Format Specifications

    # European date format
    dpkit schema infer data.csv --date-format "%d.%m.%Y"

    # Custom boolean values
    dpkit schema infer data.csv --true-values "Ja,Oui,Sí" --false-values "Nein,Non,No"

Output Formats
All schema commands support multiple output formats:
- Interactive Display: Default rich terminal interface showing field definitions
- JSON: Use the --json flag for machine-readable output
- Debug Mode: Use --debug for detailed operation logs
Schema Format Interoperability
The convert command enables seamless integration with other schema ecosystems:

    # Use with JSON Schema validation libraries
    dpkit schema infer data.csv --json > table.schema.json
    dpkit schema convert table.schema.json --to-format jsonschema --to-path validation.jsonschema.json

    # Import existing JSONSchema into dpkit workflow
    dpkit schema convert external.jsonschema.json --format jsonschema --to-path dpkit.schema.json
    dpkit table validate data.csv --schema dpkit.schema.json

    # Cross-platform schema sharing
    dpkit schema convert schema.json --to-format jsonschema --to-path api-spec.jsonschema.json

Integration with Other Commands
Schema commands work seamlessly with other dpkit commands:

    # Create schema, then use it for validation
    dpkit schema infer data.csv --json > schema.json
    dpkit table validate data.csv --schema schema.json

    # Work within package context
    dpkit package infer *.csv --json > datapackage.json
    dpkit schema validate --from-package datapackage.json --from-resource "data"