Getting Started
Learn how to install and use the ultra-high-performance ARFF Format Converter v2.0 in your projects.
📦 Installation
Using pip (Recommended)
pip install arff-format-converterUsing uv (Fast)
uv add arff-format-converterFor Development
# Clone the repository
git clone https://github.com/Shani-Sinojiya/arff-format-converter.git
cd arff-format-converter
# Using virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e ".[dev]"
# Or using uv
uv sync🚀 Quick Start
CLI Usage
Basic conversion with the command-line interface:
# Basic conversion
arff-format-converter --file data.arff --output ./output --format csv
# High-performance mode (recommended for production)
arff-format-converter --file data.arff --output ./output --format parquet --fast --parallel
# Benchmark different formats
arff-format-converter --file data.arff --output ./output --benchmark
# Show supported formats and tips
arff-format-converter --infoPython API
from arff_format_converter import ARFFConverter
from pathlib import Path
# Basic usage
converter = ARFFConverter()
output_file = converter.convert(
input_file=Path("data.arff"),
output_dir=Path("output"),
output_format="csv"
)
# High-performance conversion
converter = ARFFConverter(
fast_mode=True, # Skip validation for speed
parallel=True, # Use multiple cores
use_polars=True, # Use Polars for max performance
memory_map=True # Enable memory mapping
)
# Convert with performance optimization
result = converter.convert(
input_file="dataset.arff",
output_file="output/dataset.parquet",
output_format="parquet"
)
print(f"Conversion completed: {result.duration:.2f}s")📊 Supported Formats & Performance
| Format | Extension | Speed | Use Cases | Compression |
|---|---|---|---|---|
| Parquet | .parquet | 🚀 Blazing | Big data, analytics, ML pipelines | 90% |
| ORC | .orc | 🚀 Blazing | Apache ecosystem, Hive, Spark | 85% |
| JSON | .json | ⚡ Ultra Fast | APIs, configuration, web apps | 40% |
| CSV | .csv | ⚡ Ultra Fast | Excel, data analysis, portability | 20% |
| XLSX | .xlsx | 🔄 Fast | Business reports, Excel workflows | 60% |
| XML | .xml | 🔄 Fast | Legacy systems, SOAP, enterprise | 30% |
🏆 Performance Recommendations
- 🥇 Best Overall: Parquet (fastest + highest compression)
- 🥈 Web/APIs: JSON with orjson optimization
- 🥉 Compatibility: CSV for universal support
📈 System Requirements
- Python: 3.10+ (3.11 recommended for best performance)
- Memory: 2GB+ available RAM (4GB+ for large files)
- Storage: SSD recommended for optimal I/O performance
- CPU: Multi-core processor for parallel processing benefits
💡 Next Steps
- • Check out the API Reference for detailed documentation
- • Browse Examples for performance optimization tips
- • Read the FAQ for troubleshooting and best practices