Dialect
Table Dialect commands help you work with CSV dialects - metadata that describes how to parse CSV and similar tabular text files. These commands allow you to infer parsing parameters from files, validate dialect definitions, and explore dialect properties.
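To make the idea of a dialect concrete, here is a minimal sketch that uses Python's standard `csv` module rather than dpkit itself. The sample text and the `parse` helper are hypothetical; the point is only that the same bytes yield different tables depending on the parsing parameters.

```python
import csv
import io

# Hypothetical sample: semicolon-delimited, with a quoted field that contains a comma.
raw = 'name;note\nAda;"likes tea, not coffee"\n'

def parse(text, delimiter, quotechar='"'):
    """Parse text into rows using the given dialect settings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar))

wrong = parse(raw, delimiter=",")  # mismatched dialect: splits on the comma inside the note
right = parse(raw, delimiter=";")  # matching dialect: two clean columns

print(wrong)
print(right)
```

With the matching delimiter, the quoted comma stays inside a single field; with the wrong one, the header collapses into one column and the data row is split in the wrong place.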
Available Commands
dp dialect infer
Infer a table dialect from a table by analyzing its structure and determining the best parsing parameters, such as delimiter, quote character, and header configuration.
dp dialect infer <table-path>
Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
- --sample-bytes: Number of bytes to sample for dialect inference
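As a rough illustration of what sample-based inference involves, the sketch below uses `csv.Sniffer` from Python's standard library. It is not how dpkit implements `dp dialect infer`; the `infer_dialect` function, its default sample size, and the candidate delimiter set are assumptions made for this example.

```python
import csv

def infer_dialect(data: bytes, sample_bytes: int = 4096) -> dict:
    """Guess parsing parameters from the first `sample_bytes` bytes of a file."""
    # Only a prefix of the file is inspected, which is why a larger sample
    # can help when the first rows are not representative.
    sample = data[:sample_bytes].decode("utf-8", errors="ignore")
    sniffer = csv.Sniffer()
    # Restricting candidates to common delimiters is an assumption of this sketch.
    guess = sniffer.sniff(sample, delimiters=",;\t|")
    return {
        "delimiter": guess.delimiter,
        "quoteChar": guess.quotechar,
        "doubleQuote": guess.doublequote,
        "skipInitialSpace": guess.skipinitialspace,
        "header": sniffer.has_header(sample),
    }

# Hypothetical file content.
data = b"id;name;amount\n1;Ada;10.5\n2;Grace;20.0\n3;Alan;7.25\n"
inferred = infer_dialect(data)
print(inferred)
```

The heuristics in `csv.Sniffer` are simple frequency checks, so results on small or irregular samples should be treated as a starting point and verified.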
Examples:
# Infer dialect from CSV file
dp dialect infer data.csv

# Infer from remote file
dp dialect infer https://example.com/data.csv

# Infer from resource in package
dp dialect infer --from-package datapackage.json --from-resource "users"

# Export dialect as JSON
dp dialect infer data.csv --json > dialect.json

# Use larger sample for complex files
dp dialect infer complex_data.csv --sample-bytes 8192

dp dialect explore
Explore a table dialect from a local or remote path to view its parsing configuration in an interactive format.
dp dialect explore <descriptor-path>
Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
Examples:
# Explore dialect descriptor
dp dialect explore dialect.json

# Explore remote dialect
dp dialect explore https://example.com/dialect.json

# Explore dialect from package resource
dp dialect explore --from-package datapackage.json --from-resource "users"

# Export dialect structure as JSON
dp dialect explore dialect.json --json

dp dialect validate
Validate a table dialect from a local or remote path against the CSV Dialect specification.
dp dialect validate <descriptor-path>
Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output validation results as JSON
- -d, --debug: Enable debug mode
- -q, --quit: Exit immediately after validation (don’t prompt for error filtering)
- -a, --all: Skip selection prompts when all can be selected
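For intuition about what validating a dialect descriptor means, here is a minimal sketch of a type check over a few properties. It does not reproduce dpkit's validator or the full specification; `EXPECTED_TYPES` and `validate_dialect` are hypothetical names, and the expected types are inferred from the JSON examples in this document.

```python
# Assumed property types, based on the example descriptors in this document.
EXPECTED_TYPES = {
    "delimiter": str,
    "quoteChar": str,
    "escapeChar": str,
    "commentChar": str,
    "doubleQuote": bool,
    "header": bool,
    "skipInitialSpace": bool,
}

def validate_dialect(dialect: dict) -> list:
    """Return a list of human-readable errors; an empty list means no problems found."""
    errors = []
    for name, expected in EXPECTED_TYPES.items():
        if name in dialect and not isinstance(dialect[name], expected):
            actual = type(dialect[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors

good = validate_dialect({"delimiter": ",", "quoteChar": '"', "header": True})
bad = validate_dialect({"delimiter": 5, "header": "yes"})
print(good)
print(bad)
```

A real validator would typically check the descriptor against a schema and report more than type mismatches; this sketch only shows the general shape of the check.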
Examples:
# Validate dialect descriptor
dp dialect validate dialect.json

# Validate remote dialect
dp dialect validate https://example.com/dialect.json

# Validate dialect from package resource
dp dialect validate --from-package datapackage.json --from-resource "users"

# Get validation results as JSON
dp dialect validate dialect.json --json

# Interactive selection when no path provided
dp dialect validate --from-package datapackage.json

dp dialect script
Open an interactive scripting session with a loaded table dialect. This provides a REPL environment where you can programmatically interact with the dialect definition.
dp dialect script <descriptor-path>
Options:
- -p, --from-package: Path to package containing the resource
- -r, --from-resource: Name of resource within package
- -j, --json: Output as JSON
- -d, --debug: Enable debug mode
Available Variables:
- dpkit: The dpkit library object
- dialect: The loaded dialect object
Examples:
# Start scripting session with dialect
dp dialect script dialect.json

# Script dialect from package resource
dp dialect script --from-package datapackage.json --from-resource "users"

# In the REPL session:
dp> dialect.delimiter
dp> dialect.quoteChar
dp> dialect.header
dp> dialect.skipInitialSpace

Common Workflows
Creating Dialect Definitions
- Infer from data file:
  dp dialect infer data.csv --json > dialect.json
- Validate the generated dialect:
  dp dialect validate dialect.json
- Explore the dialect configuration:
  dp dialect explore dialect.json
Dialect Analysis for Complex Files
# Infer dialect with larger sample for better accuracy and save it
dp dialect infer complex_file.csv --sample-bytes 16384 --json > dialect.json

# Validate and explore for verification
dp dialect validate dialect.json
dp dialect explore dialect.json

# Script for custom dialect analysis
dp dialect script dialect.json

Working with Package Dialects
# Validate all dialects in a package interactively
dp dialect validate --from-package datapackage.json

# Infer improved dialect for specific resource
dp dialect infer --from-package datapackage.json --from-resource "transactions"

# Compare dialects using scripting
dp dialect script --from-package datapackage.json --from-resource "users"

Remote Dialect Handling
# Work with remote dialects
dp dialect explore https://example.com/dialect.json
dp dialect validate https://example.com/dialect.json
dp dialect infer https://example.com/data.csv

Dialect Properties
CSV Dialect specifications typically include:
Core Properties
- delimiter: Field separator character (e.g., "," or ";" or "\t")
- quoteChar: Character used to quote fields (e.g., double quote " or single quote ')
- escapeChar: Character used to escape quotes within fields
- doubleQuote: Whether quotes are escaped by doubling them
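The difference between doubleQuote and escapeChar is easiest to see side by side. The sketch below uses Python's `csv` module, whose `doublequote` and `escapechar` parameters play the same role; the two sample lines are hypothetical encodings of the same row.

```python
import csv
import io

# The same logical row, encoded with two different quote-escaping conventions.
doubled = '"say ""hi""",x\n'      # quotes escaped by doubling them
escaped = '"say \\"hi\\"",x\n'    # quotes escaped with a backslash

row_doubled = next(csv.reader(io.StringIO(doubled), quotechar='"', doublequote=True))
row_escaped = next(
    csv.reader(io.StringIO(escaped), quotechar='"', doublequote=False, escapechar="\\")
)

print(row_doubled)
print(row_escaped)
```

Both reads recover the same two fields, which is why a dialect needs to record which convention a file uses.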
Header Configuration
- header: Whether the first row contains headers
- headerRows: Number of header rows
- headerJoin: Character used to join multi-row headers
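One plausible reading of how multi-row headers combine is a column-wise join, sketched below. The `join_headers` helper is hypothetical, and the exact joining rules dpkit applies (for example, how empty cells are handled) are an assumption here.

```python
def join_headers(header_rows, header_join=" "):
    """Combine several header rows into one label per column."""
    columns = zip(*header_rows)  # regroup cells by column
    # Skip empty cells so a blank upper row does not leave a dangling separator.
    return [header_join.join(part for part in column if part) for column in columns]

# Hypothetical two-row header.
header_rows = [
    ["person", "person", "order"],
    ["id", "name", "total"],
]
labels = join_headers(header_rows, header_join="_")
print(labels)
```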
Whitespace Handling
- skipInitialSpace: Whether to skip whitespace after delimiters
- nullSequence: Sequence representing null values
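These two settings interact: if leading spaces are not skipped, a null marker may not match. The following sketch shows this with Python's `csv` module; `read_rows` is a hypothetical helper, and treating nullSequence as an exact string match is an assumption.

```python
import csv
import io

def read_rows(text, skip_initial_space=False, null_sequence=None):
    """Read rows, optionally skipping post-delimiter spaces and mapping a null marker to None."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=skip_initial_space)
    return [
        [None if null_sequence is not None and cell == null_sequence else cell for cell in row]
        for row in reader
    ]

# Hypothetical data with a space after each comma and "NA" as the null marker.
text = "id, city\n1, NA\n2, Oslo\n"

trimmed = read_rows(text, skip_initial_space=True, null_sequence="NA")
untrimmed = read_rows(text, skip_initial_space=False, null_sequence="NA")
print(trimmed)
print(untrimmed)
```

In the untrimmed read the cell is " NA" with a leading space, so it is not recognized as null.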
Comment Handling
- commentRows: Number of comment rows to skip
- commentChar: Character indicating comment lines
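A minimal sketch of how the two comment settings could be applied before parsing is shown below. It follows the descriptions above (commentRows as a count of leading rows, commentChar as a line prefix); `strip_comments` is a hypothetical helper and not dpkit's implementation.

```python
def strip_comments(lines, comment_rows=0, comment_char=None):
    """Drop leading comment rows, then any remaining line starting with the comment character."""
    kept = lines[comment_rows:]
    if comment_char:
        kept = [line for line in kept if not line.startswith(comment_char)]
    return kept

# Hypothetical pipe-delimited export with a preamble and inline comments.
lines = [
    "generated by export tool",
    "# units: EUR",
    "id|amount",
    "1|10",
    "# subtotal",
    "2|20",
]
data_lines = strip_comments(lines, comment_rows=1, comment_char="#")
print(data_lines)
```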
Common Dialect Patterns
Standard CSV
{
  "delimiter": ",",
  "quoteChar": "\"",
  "doubleQuote": true,
  "header": true
}

European CSV (semicolon-separated)
{
  "delimiter": ";",
  "quoteChar": "\"",
  "doubleQuote": true,
  "header": true
}

Tab-separated values
{
  "delimiter": "\t",
  "quoteChar": "\"",
  "doubleQuote": true,
  "header": true
}

Custom formats with comments
{
  "delimiter": "|",
  "quoteChar": "'",
  "header": true,
  "commentRows": 3,
  "commentChar": "#"
}

Troubleshooting Dialect Inference
For files with unusual formatting:
# Use larger sample size and save the result
dp dialect infer unusual_file.csv --sample-bytes 32768 --json > dialect.json

# Check inferred dialect
dp dialect explore dialect.json

# Manually verify with table commands
dp table explore unusual_file.csv --dialect dialect.json

For files with multiple header rows:
# The dialect inference will detect headerRows automatically
dp dialect infer multi_header.csv --json > dialect.json

# Verify the header configuration
dp dialect script dialect.json
# Then in REPL: dialect.headerRows

Output Formats
All dialect commands support multiple output formats:
- Interactive Display: Default rich terminal interface showing dialect properties
- JSON: Use the --json flag for machine-readable output
- Debug Mode: Use --debug for detailed operation logs
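Machine-readable output is useful when another tool needs to consume the dialect. The sketch below assumes the JSON has the same shape as the example descriptors in this document and maps it onto keyword arguments for Python's `csv.reader`; `KEY_MAP` and `csv_kwargs` are hypothetical names.

```python
import csv
import io
import json

# Dialect property names mapped to the equivalent csv.reader keyword arguments.
KEY_MAP = {
    "delimiter": "delimiter",
    "quoteChar": "quotechar",
    "escapeChar": "escapechar",
    "doubleQuote": "doublequote",
    "skipInitialSpace": "skipinitialspace",
}

def csv_kwargs(dialect_json: str) -> dict:
    """Translate a dialect descriptor into csv.reader keyword arguments."""
    dialect = json.loads(dialect_json)
    # Properties without a csv.reader equivalent (such as header) are ignored here.
    return {KEY_MAP[key]: value for key, value in dialect.items() if key in KEY_MAP}

# Assumed content of a saved dialect.json.
dialect_json = r'{"delimiter": ";", "quoteChar": "\"", "doubleQuote": true, "header": true}'
kwargs = csv_kwargs(dialect_json)
rows = list(csv.reader(io.StringIO('a;b\n1;"x;y"\n'), **kwargs))
print(kwargs)
print(rows)
```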
Integration with Other Commands
Dialect commands work with other dpkit commands:
# Create dialect, then use it for table operations
dp dialect infer data.csv --json > dialect.json
dp table validate data.csv --dialect dialect.json

# Work within package context
dp package infer *.csv --json > datapackage.json
dp dialect validate --from-package datapackage.json --from-resource "data"

# Use inferred dialect for schema inference
dp dialect infer data.csv --json > dialect.json
dp schema infer data.csv --delimiter ";" --header-rows 2

Best Practices
- Sample Size: Use larger --sample-bytes for files with complex or inconsistent formatting
- Validation: Always validate inferred dialects before using them in production
- Testing: Test dialect definitions with actual data using table commands
- Documentation: Include dialect files alongside data files for reproducibility
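One simple way to test a dialect against actual data, independent of dpkit, is to check that every row splits into the same number of fields. The sketch below is a hypothetical check built on Python's `csv` module; it only covers the delimiter and is not a substitute for the table commands.

```python
import csv
import io
from collections import Counter

def field_counts(text, delimiter):
    """Count how many rows have each number of fields under the given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return Counter(len(row) for row in reader if row)

def looks_consistent(text, delimiter):
    """True when all rows have the same field count and that count is above one."""
    counts = field_counts(text, delimiter)
    return len(counts) == 1 and next(iter(counts)) > 1

# Hypothetical semicolon-delimited data.
text = "a;b;c\n1;2;3\n4;5;6\n"
print(looks_consistent(text, ";"))
print(looks_consistent(text, ","))
```

A single-column result is treated as suspicious here because a wrong delimiter usually collapses every row into one field.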