Getting Started

Learn how to install ARFF Format Converter v2.0 and use it in your projects, from the command line or from Python.

📦 Installation

Using pip (Recommended)

pip install arff-format-converter

Using uv (Fast)

uv add arff-format-converter

For Development

# Clone the repository
git clone https://github.com/Shani-Sinojiya/arff-format-converter.git
cd arff-format-converter

# Using virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e ".[dev]"

# Or using uv
uv sync
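
After either install path, one way to confirm the package is visible to the active interpreter is to query Python's packaging metadata. This is a minimal sketch: it assumes the installed distribution name matches the pip name used above (arff-format-converter), and the helper name installed_version is illustrative, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "arff-format-converter") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found:
        print(f"arff-format-converter {found} is installed")
    else:
        print("arff-format-converter is not installed in this environment")
```

Run it inside the same virtual environment you installed into; a None result usually means a different interpreter is active.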

🚀 Quick Start

CLI Usage

Common invocations of the command-line interface:

# Basic conversion
arff-format-converter --file data.arff --output ./output --format csv

# High-performance mode (recommended for production)
arff-format-converter --file data.arff --output ./output --format parquet --fast --parallel

# Benchmark different formats
arff-format-converter --file data.arff --output ./output --benchmark

# Show supported formats and tips
arff-format-converter --info
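
To convert many files, the CLI can be driven from a small script. The sketch below uses only the flags shown above (--file, --output, --format, --fast, --parallel); the function names build_command and convert_directory are hypothetical helpers, and the script assumes the arff-format-converter executable is on PATH. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def build_command(input_file, output_dir, output_format, fast=False, parallel=False):
    """Assemble the argument list for one arff-format-converter run."""
    cmd = [
        "arff-format-converter",
        "--file", str(input_file),
        "--output", str(output_dir),
        "--format", output_format,
    ]
    if fast:
        cmd.append("--fast")
    if parallel:
        cmd.append("--parallel")
    return cmd


def convert_directory(input_dir, output_dir, output_format="csv", dry_run=True):
    """Run (or, in a dry run, just print) one conversion per .arff file."""
    commands = [
        build_command(path, output_dir, output_format)
        for path in sorted(Path(input_dir).glob("*.arff"))
    ]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show the exact shell command
        else:
            subprocess.run(cmd, check=True)  # raise if the CLI exits non-zero
    return commands
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.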

Python API

from arff_format_converter import ARFFConverter
from pathlib import Path

# Basic usage
converter = ARFFConverter()
output_file = converter.convert(
    input_file=Path("data.arff"),
    output_dir=Path("output"),
    output_format="csv"
)

# High-performance conversion
converter = ARFFConverter(
    fast_mode=True,      # Skip validation for speed
    parallel=True,       # Use multiple cores
    use_polars=True,     # Use Polars for max performance
    memory_map=True      # Enable memory mapping
)

# Convert with performance optimization
result = converter.convert(
    input_file="dataset.arff",
    output_file="output/dataset.parquet",
    output_format="parquet"
)

print(f"Conversion completed: {result.duration:.2f}s")
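
The same call can be looped over a directory. The sketch below is an assumption-laden helper, not part of the library: convert_all is a hypothetical name, and it relies only on the convert(input_file=..., output_dir=..., output_format=...) keyword form shown in the basic usage example. The converter is passed in, so any object with a compatible convert method works.

```python
from pathlib import Path
from time import perf_counter


def convert_all(converter, input_dir, output_dir, output_format="csv"):
    """Convert every .arff file in input_dir; return (results, failures).

    results maps file name -> (value returned by convert, elapsed seconds);
    failures maps file name -> the exception that was raised.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    results, failures = {}, {}
    for arff_path in sorted(Path(input_dir).glob("*.arff")):
        started = perf_counter()
        try:
            out = converter.convert(
                input_file=arff_path,
                output_dir=output_dir,
                output_format=output_format,
            )
        except Exception as exc:  # keep going; report failures at the end
            failures[arff_path.name] = exc
            continue
        results[arff_path.name] = (out, perf_counter() - started)
    return results, failures


# Example usage (requires the package to be installed):
# from arff_format_converter import ARFFConverter
# results, failures = convert_all(ARFFConverter(), "datasets", "output", "parquet")
```

Collecting failures instead of stopping at the first one lets a long batch finish, so a single malformed file does not block the rest.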

📊 Supported Formats & Performance

Format  | Extension | Speed         | Use Cases                           | Compression
Parquet | .parquet  | 🚀 Blazing    | Big data, analytics, ML pipelines   | 90%
ORC     | .orc      | 🚀 Blazing    | Apache ecosystem, Hive, Spark       | 85%
JSON    | .json     | ⚡ Ultra Fast | APIs, configuration, web apps       | 40%
CSV     | .csv      | ⚡ Ultra Fast | Excel, data analysis, portability   | 20%
XLSX    | .xlsx     | 🔄 Fast       | Business reports, Excel workflows   | 60%
XML     | .xml      | 🔄 Fast       | Legacy systems, SOAP, enterprise    | 30%
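
The format and extension columns above can be captured in a small lookup, which is handy for validating a requested format before starting a conversion. This is only a sketch: the mapping is copied from the table, while expected_output_path is a hypothetical helper, and its naming rule (input stem plus the format's extension) is an assumption about how output files are named, not documented behavior.

```python
from pathlib import Path

# Format name -> file extension, copied from the table above.
EXTENSIONS = {
    "parquet": ".parquet",
    "orc": ".orc",
    "json": ".json",
    "csv": ".csv",
    "xlsx": ".xlsx",
    "xml": ".xml",
}


def expected_output_path(input_file, output_dir, output_format):
    """Guess the output path: same stem as the input, extension from the table."""
    fmt = output_format.lower()
    if fmt not in EXTENSIONS:
        raise ValueError(
            f"unsupported format {output_format!r}; expected one of {sorted(EXTENSIONS)}"
        )
    return Path(output_dir) / (Path(input_file).stem + EXTENSIONS[fmt])


if __name__ == "__main__":
    print(expected_output_path("data/iris.arff", "output", "parquet"))
```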

🏆 Performance Recommendations

  • 🥇 Best Overall: Parquet (fastest + highest compression)
  • 🥈 Web/APIs: JSON with orjson optimization
  • 🥉 Compatibility: CSV for universal support

📈 System Requirements

  • Python: 3.10+ (3.11 recommended for best performance)
  • Memory: 2GB+ available RAM (4GB+ for large files)
  • Storage: SSD recommended for optimal I/O performance
  • CPU: Multi-core processor for parallel processing benefits
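
Two of the requirements above, the Python version and the core count, can be checked from the standard library before a large conversion. The sketch below is illustrative only: check_environment is a hypothetical helper, and it does not check memory or storage, since the standard library has no portable call for either.

```python
import os
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the requirements above


def check_environment(version_info=None, cpu_count=None):
    """Return a list of human-readable warnings; an empty list means nothing to flag."""
    version_info = sys.version_info if version_info is None else version_info
    cpu_count = os.cpu_count() if cpu_count is None else cpu_count
    warnings = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        warnings.append(f"Python {found} found; 3.10 or newer is required")
    if not cpu_count or cpu_count < 2:
        warnings.append("fewer than two CPU cores detected; parallel mode may not help")
    return warnings


if __name__ == "__main__":
    problems = check_environment()
    print("\n".join(problems) if problems else "environment looks fine")
```

The optional arguments exist so the check can be exercised with fixed values; calling it with no arguments inspects the running interpreter.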

💡 Next Steps

  • Check out the API Reference for detailed documentation
  • Browse Examples for performance optimization tips
  • Read the FAQ for troubleshooting and best practices