File
File commands help you work with individual files, providing utilities for copying, describing, and validating files of various formats. These commands are useful for file-level operations and diagnostics.
Available Commands
dpkit file copy
Copy a file from one location to another with support for local and remote sources and destinations.
dpkit file copy <source-path> <target-path>
Options:
-d, --debug: Enable debug mode
Examples:
# Copy local file
dpkit file copy data.csv backup.csv
# Copy remote file to local
dpkit file copy https://example.com/data.csv local_data.csv
# Copy to different directory
dpkit file copy data.csv ./backup/data_backup.csv
dpkit file describe
Describe a file’s properties, including size, format, encoding, and basic metadata.
dpkit file describe <file-path>
Options:
-j, --json: Output as JSON
-d, --debug: Enable debug mode
Examples:
# Describe local file
dpkit file describe data.csv
# Describe remote file
dpkit file describe https://example.com/data.csv
# Get description as JSON
dpkit file describe data.csv --json
# Describe various file types
dpkit file describe document.pdf
dpkit file describe image.png
dpkit file describe archive.zip
dpkit file validate
Validate a file’s integrity, format compliance, and accessibility.
dpkit file validate <file-path>
Options:
--json: Output validation results as JSON
-d, --debug: Enable debug mode
-q, --quit: Exit immediately after validation (don’t prompt for error filtering)
-a, --all: Skip selection prompts when all can be selected
Examples:
# Validate local file
dpkit file validate data.csv
# Validate remote file
dpkit file validate https://example.com/data.csv
# Get validation results as JSON
dpkit file validate data.csv --json
# Validate multiple file types
dpkit file validate document.json
dpkit file validate image.jpg
dpkit file validate data.parquet
Common Workflows
File Backup and Migration
# Create backup copy (store the dated path so later steps use the same file)
backup="backup/important_data_$(date +%Y%m%d).csv"
dpkit file copy important_data.csv "$backup"
# Validate backup integrity
dpkit file validate "$backup"
# Describe backup properties
dpkit file describe "$backup"
Remote File Handling
# Download and validate remote file
dpkit file copy https://example.com/dataset.csv local_dataset.csv
dpkit file validate local_dataset.csv
# Describe remote file without downloading
dpkit file describe https://example.com/dataset.csv
File Diagnostics
# Check file properties
dpkit file describe suspicious_file.csv
# Validate file integrity
dpkit file validate suspicious_file.csv
# Get detailed diagnostics as JSON
dpkit file describe problematic_file.csv --json
dpkit file validate problematic_file.csv --json
Batch File Operations
# Describe multiple files
for file in *.csv; do
  echo "Describing $file:"
  dpkit file describe "$file"
  echo "---"
done
# Validate all files in directory
# Note: the report is a sequence of JSON documents (one per file), not a single JSON value
for file in data/*.json; do
  dpkit file validate "$file" --json >> validation_report.json
done
File Type Support
File commands work with various file formats:
Data Formats
- CSV/TSV: Comma- and tab-separated values
- JSON: JavaScript Object Notation
- Excel: .xlsx and .xls files
- Parquet: Apache Parquet files
- Arrow: Apache Arrow files
- ODS: OpenDocument Spreadsheet
Archive Formats
- ZIP: Compressed archives
- TAR: Tape archives
- GZ: Gzip compressed files
Document Formats
- PDF: Portable Document Format
- XML: Extensible Markup Language
- YAML: YAML Ain’t Markup Language
Image Formats
- PNG: Portable Network Graphics
- JPEG/JPG: Joint Photographic Experts Group
- SVG: Scalable Vector Graphics
File Information Retrieved
Basic Properties
- Size: File size in bytes
- Format: Detected file format and MIME type
- Encoding: Text encoding (for text files)
- Permissions: File access permissions (local files)
Content Analysis
- Structure: Basic structure analysis for supported formats
- Validity: Format compliance checking
- Metadata: Embedded metadata extraction
Remote File Properties
- Accessibility: Whether the remote file is accessible
- Headers: HTTP headers for remote files
- Redirects: Information about URL redirections
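The remote properties listed above suggest a simple pre-flight check before downloading. The following is a minimal sketch and not part of dpkit itself: `fetch_remote` is a hypothetical helper name, the `.remote.json` and `.validation.json` file names are illustrative, and the sketch assumes that `dpkit file describe` exits with a non-zero status when the URL cannot be reached.

```shell
#!/bin/bash
# fetch_remote URL TARGET
# Hypothetical helper: describe a remote file first, then copy and validate it.
fetch_remote() {
  local url="$1"
  local target="$2"

  # Assumption: a non-zero exit status means the remote file is not accessible.
  if ! dpkit file describe "$url" --json > "${target}.remote.json"; then
    echo "Cannot access $url, skipping download" >&2
    rm -f "${target}.remote.json"
    return 1
  fi

  # Download only after the description succeeded, then validate the local copy.
  dpkit file copy "$url" "$target" || return 1
  dpkit file validate "$target" --json > "${target}.validation.json"
}
```

A call such as `fetch_remote https://example.com/dataset.csv local_dataset.csv` would leave the description and the validation report next to the downloaded file, so both can be inspected later.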
Error Handling
Common Issues and Solutions
File Not Found
dpkit file describe missing_file.csv
# Error: File not found
# Solution: Check file path and permissions
Network Issues (Remote Files)
dpkit file copy https://unreachable.com/data.csv local.csv
# Error: Network timeout
# Solution: Check URL and network connectivity
Format Recognition
dpkit file describe unknown_format.dat
# May show limited information for unknown formats
# Solution: Use --debug for more details
Permission Issues
dpkit file copy protected_file.csv backup.csv
# Error: Permission denied
# Solution: Check file permissions
Advanced Usage
Scripting and Automation
#!/bin/bash
# File processing script
# Make sure the output directories exist
mkdir -p backup info
for file in *.csv; do
  # Skip the literal pattern when no CSV files are present
  [ -e "$file" ] || continue
  echo "Processing $file"
  # Validate file
  if dpkit file validate "$file" --json | jq -r '.valid' | grep -q "true"; then
    echo "✓ $file is valid"
    # Create backup
    dpkit file copy "$file" "backup/${file%.csv}_$(date +%Y%m%d).csv"
    # Get file info
    dpkit file describe "$file" --json > "info/${file%.csv}_info.json"
  else
    echo "✗ $file is invalid"
  fi
done
Integration with Other Commands
# Validate file before processing with table commands
dpkit file validate data.csv && dpkit table explore data.csv
# Describe file and then infer schema
dpkit file describe data.csv
dpkit schema infer data.csv --json > schema.json
# Copy and then create package
dpkit file copy remote_data.csv local_data.csv
dpkit package infer local_data.csv --json > datapackage.json
Monitoring and Logging
# Create validation log (-c keeps each log entry on a single line)
dpkit file validate data.csv --json | jq -c '{file: "data.csv", valid: .valid, timestamp: now}' >> validation.log
# Monitor file changes
while true; do
  dpkit file describe changing_file.csv --json > current_state.json
  if ! cmp -s current_state.json previous_state.json; then
    echo "File changed at $(date)"
    cp current_state.json previous_state.json
  fi
  sleep 60
done
Output Formats
File commands support multiple output formats:
- Human-readable: Default formatted output for terminal viewing
- JSON: Machine-readable structured output with the --json flag
- Debug: Detailed diagnostic information with the --debug flag
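As a rough illustration of how the three output modes map onto flags, the sketch below wraps `dpkit file describe` in a hypothetical `describe_file` helper. The helper name and the mode names (`human`, `json`, `debug`) are assumptions made for this example; only the `--json` and `--debug` flags come from the options documented for the command.

```shell
#!/bin/bash
# describe_file PATH [human|json|debug]
# Hypothetical helper that selects an output mode for `dpkit file describe`.
describe_file() {
  local path="$1"
  local mode="${2:-human}"

  case "$mode" in
    human) dpkit file describe "$path" ;;          # default terminal output
    json)  dpkit file describe "$path" --json ;;   # machine-readable output
    debug) dpkit file describe "$path" --debug ;;  # detailed diagnostics
    *)
      echo "Unknown mode: $mode (expected human, json, or debug)" >&2
      return 2
      ;;
  esac
}
```

In a script, `describe_file data.csv json > data_info.json` would capture the structured output, while `describe_file data.csv` keeps the default view for interactive use.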
Best Practices
- Validation First: Always validate files before processing them with other commands
- Backup Important Files: Use the copy command to create backups before modifications
- Remote File Handling: Describe remote files before downloading to check size and format
- Error Checking: Use JSON output for programmatic error checking and handling
- Documentation: Use describe to document file properties for reproducibility
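Several of these practices can be combined in one small routine. The sketch below is one possible arrangement rather than a prescribed workflow: `validate_then_backup` is a hypothetical helper, the dated file naming is illustrative, and it assumes that `dpkit file validate` returns a non-zero exit status for an invalid file (the same assumption behind chaining it with `&&`).

```shell
#!/bin/bash
# validate_then_backup SRC BACKUP_DIR
# Hypothetical helper: validate first, back up only valid files,
# then record the backup's properties for later reference.
validate_then_backup() {
  local src="$1"
  local backup_dir="$2"
  local target
  target="$backup_dir/$(date +%Y%m%d)_$(basename "$src")"

  # Assumption: a non-zero exit status signals a failed validation.
  if ! dpkit file validate "$src" --json > /dev/null; then
    echo "Skipped $src: validation failed" >&2
    return 1
  fi

  mkdir -p "$backup_dir"
  dpkit file copy "$src" "$target" || return 1

  # Document the backup's properties alongside the copy.
  dpkit file describe "$target" --json > "${target}.info.json"

  # Print the backup path so callers can capture it.
  echo "$target"
}
```

Calling `validate_then_backup important_data.csv backup` prints the path of the new backup on success, which makes it easy to feed into further commands.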
Security Considerations
- Remote file operations follow URL protocols and security restrictions
- Local file operations respect system file permissions
- Validation helps identify potentially corrupted or malicious files
- Debug mode may expose sensitive file system information
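Because debug output may reveal file system details, one cautious pattern is to make it opt-in and keep it out of shared terminals or CI logs. The sketch below is an assumption-laden illustration: `run_dpkit_file`, `DPKIT_DEBUG`, and `DPKIT_DEBUG_LOG` are hypothetical names introduced here, not dpkit features. When debugging is enabled, all output of the command goes to the log file instead of the terminal.

```shell
#!/bin/bash
# run_dpkit_file SUBCOMMAND ARGS...
# Hypothetical wrapper: adds --debug only when DPKIT_DEBUG=1 and writes the
# resulting output to a log file that is created readable by the owner only.
run_dpkit_file() {
  if [ "${DPKIT_DEBUG:-0}" = "1" ]; then
    (
      # umask 077 affects only newly created files; an existing log keeps
      # whatever permissions it already has.
      umask 077
      dpkit file "$@" --debug >> "${DPKIT_DEBUG_LOG:-dpkit-debug.log}" 2>&1
    )
  else
    dpkit file "$@"
  fi
}
```

With this wrapper, `run_dpkit_file describe data.csv` behaves like the plain command, and `DPKIT_DEBUG=1 run_dpkit_file describe data.csv` appends the debug run to `dpkit-debug.log`.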