Getting Started

Learn how to install and use the ultra-high-performance ARFF Format Converter v2.0 in your projects.

📦 Installation

Using pip (Recommended)

pip install arff-format-converter

Using uv (Fast)

uv add arff-format-converter

For Development

# Clone the repository
git clone https://github.com/Shani-Sinojiya/arff-format-converter.git
cd arff-format-converter

# Using virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e ".[dev]"

# Or using uv
uv sync

🚀 Quick Start

CLI Usage

Basic conversion with the command-line interface:

# Basic conversion
arff-format-converter --file data.arff --output ./output --format csv

# High-performance mode (recommended for production)
arff-format-converter --file data.arff --output ./output --format parquet --fast --parallel

# Benchmark different formats
arff-format-converter --file data.arff --output ./output --benchmark

# Show supported formats and tips
arff-format-converter --info

Python API

from arff_format_converter import ARFFConverter
from pathlib import Path

# Basic usage
converter = ARFFConverter()
output_file = converter.convert(
    input_file=Path("data.arff"),
    output_dir=Path("output"),
    output_format="csv"
)

# High-performance conversion
converter = ARFFConverter(
    fast_mode=True,      # Skip validation for speed
    parallel=True,       # Use multiple cores
    use_polars=True,     # Use Polars for max performance
    memory_map=True      # Enable memory mapping
)

# Convert with performance optimization
result = converter.convert(
    input_file="dataset.arff",
    output_file="output/dataset.parquet",
    output_format="parquet"
)

print(f"Conversion completed: {result.duration:.2f}s")

📊 Supported Formats & Performance

Format	Extension	Speed	Use Cases	Compression
Parquet	.parquet	🚀 Blazing	Big data, analytics, ML pipelines	90%
ORC	.orc	🚀 Blazing	Apache ecosystem, Hive, Spark	85%
JSON	.json	⚡ Ultra Fast	APIs, configuration, web apps	40%
CSV	.csv	⚡ Ultra Fast	Excel, data analysis, portability	20%
XLSX	.xlsx	🔄 Fast	Business reports, Excel workflows	60%
XML	.xml	🔄 Fast	Legacy systems, SOAP, enterprise	30%

🏆 Performance Recommendations

🥇 Best Overall: Parquet (fastest + highest compression)
🥈 Web/APIs: JSON with orjson optimization
🥉 Compatibility: CSV for universal support

📈 System Requirements

Python: 3.10+ (3.11 recommended for best performance)
Memory: 2GB+ available RAM (4GB+ for large files)
Storage: SSD recommended for optimal I/O performance
CPU: Multi-core processor for parallel processing benefits

💡 Next Steps

• Check out the API Reference for detailed documentation
• Browse Examples for performance optimization tips
• Read the FAQ for troubleshooting and best practices