llama-gguf
A high-performance Rust implementation of llama.cpp: an LLM inference engine with full GGUF support. Supports multiple model architectures (LLaMA, Mistral, Qwen2, TinyLlama, DeepSeek), all K-quant formats, HuggingFace integration, SIMD-optimized CPU inference, and CUDA GPU acceleration. Well suited to building AI-powered legal technology applications that require on-premise LLM inference.