File
File commands help you work with individual files, providing utilities for copying, describing, and validating files of various formats. These commands are useful for file-level operations and diagnostics.
Available Commands
dp file copy
Copy a file from one location to another with support for local and remote sources and destinations.
dp file copy <source-path> <target-path>
Options:
-d, --debug: Enable debug mode
Examples:
# Copy local file
dp file copy data.csv backup.csv

# Copy remote file to local
dp file copy https://example.com/data.csv local_data.csv

# Copy to different directory
dp file copy data.csv ./backup/data_backup.csv

dp file describe
Describe a file’s properties, including size, format, encoding, and basic metadata.
dp file describe <file-path>
Options:
-j, --json: Output as JSON
-d, --debug: Enable debug mode
Examples:
# Describe local file
dp file describe data.csv

# Describe remote file
dp file describe https://example.com/data.csv

# Get description as JSON
dp file describe data.csv --json

# Describe various file types
dp file describe document.pdf
dp file describe image.png
dp file describe archive.zip

dp file validate
Validate a file’s integrity, format compliance, and accessibility.
dp file validate <file-path>
Options:
--json: Output validation results as JSON
-d, --debug: Enable debug mode
-q, --quit: Exit immediately after validation (don’t prompt for error filtering)
-a, --all: Skip selection prompts when all can be selected
Examples:
# Validate local file
dp file validate data.csv

# Validate remote file
dp file validate https://example.com/data.csv

# Get validation results as JSON
dp file validate data.csv --json

# Validate multiple file types
dp file validate document.json
dp file validate image.jpg
dp file validate data.parquet

Common Workflows
File Backup and Migration
# Create a dated backup copy, storing the path so later steps use the same file
backup="backup/important_data_$(date +%Y%m%d).csv"
dp file copy important_data.csv "$backup"

# Validate backup integrity
dp file validate "$backup"

# Describe backup properties
dp file describe "$backup"

Remote File Handling
# Download and validate remote file
dp file copy https://example.com/dataset.csv local_dataset.csv
dp file validate local_dataset.csv

# Describe remote file without downloading
dp file describe https://example.com/dataset.csv

File Diagnostics
# Check file properties
dp file describe suspicious_file.csv

# Validate file integrity
dp file validate suspicious_file.csv

# Get detailed diagnostics as JSON
dp file describe problematic_file.csv --json
dp file validate problematic_file.csv --json

Batch File Operations
# Describe multiple files
for file in *.csv; do
  echo "Describing $file:"
  dp file describe "$file"
  echo "---"
done

# Validate all files in directory
for file in data/*.json; do
  dp file validate "$file" --json >> validation_report.json
done

File Type Support
File commands work with various file formats:
Data Formats
- CSV/TSV: Comma and tab-separated values
- JSON: JavaScript Object Notation
- Excel: .xlsx and .xls files
- Parquet: Apache Parquet files
- Arrow: Apache Arrow files
- ODS: OpenDocument Spreadsheet
Archive Formats
- ZIP: Compressed archives
- TAR: Tape archives
- GZ: Gzip compressed files
Document Formats
- PDF: Portable Document Format
- XML: Extensible Markup Language
- YAML: YAML Ain’t Markup Language
Image Formats
- PNG: Portable Network Graphics
- JPEG/JPG: Joint Photographic Experts Group
- SVG: Scalable Vector Graphics
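As a rough illustration of the categories above, the following sketch sorts file names into them by extension alone. The `file_category` function is hypothetical and not part of dp. How dp itself detects formats is not described here, and the `yml` extension is an assumption added alongside `yaml`.

```shell
# Hypothetical helper: map a file name to one of the categories listed above.
# It looks only at the extension and does not call dp.
file_category() {
  case "$1" in
    *.*) ;;                          # has an extension, keep going
    *) echo "unknown"; return 0 ;;   # no extension to inspect
  esac

  # Lowercase the last extension so DATA.CSV and data.csv are treated alike
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')

  case "$ext" in
    csv|tsv|json|xlsx|xls|parquet|arrow|ods) echo "data" ;;
    zip|tar|gz)                              echo "archive" ;;
    pdf|xml|yaml|yml)                        echo "document" ;;  # yml is an assumed alias
    png|jpeg|jpg|svg)                        echo "image" ;;
    *)                                       echo "unknown" ;;
  esac
}
```

A pre-filter like this can decide which files are worth passing to `dp file describe` or `dp file validate` in a batch loop. It is no substitute for actual format detection.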
File Information Retrieved
Basic Properties
- Size: File size in bytes
- Format: Detected file format and MIME type
- Encoding: Text encoding (for text files)
- Permissions: File access permissions (local files)
Content Analysis
- Structure: Basic structure analysis for supported formats
- Validity: Format compliance checking
- Metadata: Embedded metadata extraction
Remote File Properties
- Accessibility: Whether the remote file is accessible
- Headers: HTTP headers for remote files
- Redirects: Information about URL redirections
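The remote properties above correspond to standard HTTP response details. As one way to cross-check them independently of dp, this sketch condenses raw response headers into status codes and redirect targets. `summarize_headers` is a hypothetical helper, and it makes no assumption about what dp itself reports.

```shell
# Hypothetical helper: summarize raw HTTP response headers read from stdin.
# Prints one "status:" line per response in a redirect chain and one
# "redirect:" line per Location header.
summarize_headers() {
  # HTTP headers end in CRLF, so strip carriage returns first
  tr -d '\r' | awk '
    /^HTTP\//                  { print "status: " $2 }
    tolower($1) == "location:" { print "redirect: " $2 }
  '
}
```

It can be fed from a header-only request that follows redirects, for example `curl -sIL https://example.com/data.csv | summarize_headers`.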
Error Handling
Common Issues and Solutions
File Not Found
dp file describe missing_file.csv
# Error: File not found
# Solution: Check file path and permissions

Network Issues (Remote Files)
dp file copy https://unreachable.com/data.csv local.csv
# Error: Network timeout
# Solution: Check URL and network connectivity

Format Recognition
dp file describe unknown_format.dat
# May show limited information for unknown formats
# Solution: Use --debug for more details

Permission Issues
dp file copy protected_file.csv backup.csv
# Error: Permission denied
# Solution: Check file permissions

Advanced Usage
Scripting and Automation
#!/bin/bash
# File processing script

# Make sure the output directories exist before writing to them
mkdir -p backup info

for file in *.csv; do
  # Skip the literal pattern when no CSV files match
  [ -e "$file" ] || continue
  echo "Processing $file"

  # Validate file (jq -e exits non-zero unless .valid is true)
  if dp file validate "$file" --json | jq -e '.valid == true' > /dev/null; then
    echo "✓ $file is valid"

    # Create backup
    dp file copy "$file" "backup/${file%.csv}_$(date +%Y%m%d).csv"

    # Get file info
    dp file describe "$file" --json > "info/${file%.csv}_info.json"
  else
    echo "✗ $file is invalid"
  fi
done

Integration with Other Commands
# Validate file before processing with table commands
dp file validate data.csv && dp table explore data.csv

# Describe file and then infer schema
dp file describe data.csv
dp schema infer data.csv --json > schema.json

# Copy and then create package
dp file copy remote_data.csv local_data.csv
dp package infer local_data.csv --json > datapackage.json

Monitoring and Logging
# Create validation log (one compact JSON entry per line)
dp file validate data.csv --json | jq -c '{file: "data.csv", valid: .valid, timestamp: now}' >> validation.log

# Monitor file changes
while true; do
  dp file describe changing_file.csv --json > current_state.json
  if ! cmp -s current_state.json previous_state.json; then
    echo "File changed at $(date)"
    cp current_state.json previous_state.json
  fi
  sleep 60
done

Output Formats
File commands support multiple output formats:
- Human-readable: Default formatted output for terminal viewing
- JSON: Machine-readable structured output with the --json flag
- Debug: Detailed diagnostic information with the --debug flag
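The three output modes can be wrapped in a small dispatcher so that scripts select a mode by name. This is a minimal sketch: `describe_as` is a hypothetical function name, and only the `--json` and `--debug` flags documented on this page are used.

```shell
# Hypothetical wrapper: run `dp file describe` in one of the output modes above.
# Usage: describe_as <human|json|debug> <file-path>
describe_as() {
  mode=$1
  target=$2
  case "$mode" in
    human) dp file describe "$target" ;;           # default terminal output
    json)  dp file describe "$target" --json ;;    # machine-readable output
    debug) dp file describe "$target" --debug ;;   # detailed diagnostics
    *)
      echo "unknown output mode: $mode" >&2
      return 2
      ;;
  esac
}
```

For example, `describe_as json data.csv > data_info.json` stores the structured description, while `describe_as human data.csv` prints the default view.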
Best Practices
- Validation First: Always validate files before processing them with other commands
- Backup Important Files: Use the copy command to create backups before modifications
- Remote File Handling: Describe remote files before downloading to check size and format
- Error Checking: Use JSON output for programmatic error checking and handling
- Documentation: Use describe to document file properties for reproducibility
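The first two practices can be combined into one guard function. The sketch below assumes that `dp file validate` exits with a non-zero status when validation fails, which the `&&` chaining elsewhere on this page implies but does not state outright. `backup_if_valid` and its default `backup` directory are hypothetical.

```shell
# Hypothetical helper: validate a file, then copy it into a backup directory.
# Usage: backup_if_valid <file-path> [backup-dir]
backup_if_valid() {
  src=$1
  backup_dir=${2:-backup}

  # Validation first (assumes a non-zero exit status on failure)
  if ! dp file validate "$src" --json > /dev/null; then
    echo "skipped $src: validation failed" >&2
    return 1
  fi

  # Back up only files that passed validation
  mkdir -p "$backup_dir" || return 1
  dp file copy "$src" "$backup_dir/$(basename "$src")"
}
```

If the exit status turns out not to reflect validity, the check can instead inspect the JSON report, as the scripting example on this page does with `jq`.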
Security Considerations
- Remote file operations follow URL protocols and security restrictions
- Local file operations respect system file permissions
- Validation helps identify potentially corrupted or malicious files
- Debug mode may expose sensitive file system information
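Because debug output may contain local paths, one precaution before sharing it is to mask the home directory. The filter below is a minimal sketch: `redact_home` is a hypothetical helper, and it only replaces literal occurrences of `$HOME`. Other sensitive details, such as user names that appear elsewhere or paths outside the home directory, are left untouched.

```shell
# Hypothetical filter: replace every literal occurrence of $HOME with "~".
# Reads text on stdin and writes the redacted text to stdout.
redact_home() {
  awk '
    BEGIN { home = ENVIRON["HOME"] }
    home == "" { print; next }   # HOME unset: pass lines through unchanged
    {
      rest = $0
      out = ""
      # Literal substring replacement, so no regex characters need escaping
      while ((i = index(rest, home)) > 0) {
        out = out substr(rest, 1, i - 1) "~"
        rest = substr(rest, i + length(home))
      }
      print out rest
    }'
}
```

A possible use is `dp file describe data.csv --debug 2>&1 | redact_home`. The `2>&1` is there because this page does not say whether debug output goes to standard output or standard error.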