Runtime
Capsule
A local inference engine designed for speed, efficiency, and privacy. Run AI models directly on your hardware with zero network telemetry.
Local-first execution
Run GGUF, ONNX, and CoreML models natively on CPU, GPU, or NPU backends.
Zero-copy memory model
Shared memory buffers between runtime and model layers eliminate redundant allocations.
Streaming & batching
Token streaming with adaptive batching for throughput-optimized generation.
About Capsule
Capsule is the core runtime engine of the Naviorx ecosystem. It executes AI models locally on your hardware: no cloud, no API keys, no network round-trips. Designed for production-grade inference, Capsule supports multiple model formats (GGUF, ONNX, CoreML) and automatically selects the optimal backend for your hardware (CPU, GPU, or NPU). The zero-copy memory architecture eliminates redundant allocations between the runtime and model layers, and is designed to deliver throughput that rivals cloud-hosted solutions, entirely on your device.
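As a rough illustration of the zero-copy idea described above, the following Python sketch shares one allocation between two consumers through `memoryview` slices instead of copying it. This is not Capsule's API: `split_buffer`, `layer_in`, and `layer_out` are hypothetical names chosen for the example, and it makes no claim about how Capsule manages its buffers internally.

```python
# Illustrative sketch only; not Capsule's API. All names are hypothetical.

def split_buffer(buffer: bytearray, split: int):
    """Return two views over one allocation without copying any bytes."""
    view = memoryview(buffer)
    # Slicing a memoryview yields another view onto the same memory.
    return view[:split], view[split:]


buffer = bytearray(16)                    # single shared allocation
layer_in, layer_out = split_buffer(buffer, 8)

layer_in[0] = 42                          # write through the view...
assert buffer[0] == 42                    # ...and the shared buffer sees it

copied = bytes(buffer[:8])                # by contrast, this allocates a copy
buffer[0] = 7
assert copied[0] == 42                    # the copy does not track later writes
assert layer_in[0] == 7                   # the view does
```

The contrast between `layer_in` and `copied` is the point: a view costs no extra allocation and stays in sync with the underlying buffer, while a copy does neither.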
Features
- Multi-backend inference (CPU, GPU via Metal/CUDA/Vulkan, NPU)
- GGUF, ONNX, and CoreML format support
- Token streaming with configurable chunk sizes
- Adaptive batching for throughput optimization
- Zero-copy memory architecture
- CLI and programmatic API interfaces
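
To make the streaming and batching features above more concrete, here is a minimal Python sketch of the two ideas: emitting tokens in chunks of a configurable size, and adjusting a batch size in response to load. It does not use or describe Capsule's programmatic API. The function names, the `queue_depth` signal, and the doubling/halving policy are all assumptions made for illustration.

```python
# Illustrative sketch only; not Capsule's API. Names and policy are assumptions.
from typing import Iterable, Iterator, List


def stream_chunks(tokens: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield tokens in lists of at most `chunk_size` as they become available."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[str] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:                      # flush the final, possibly shorter, chunk
        yield chunk


def next_batch_size(current: int, queue_depth: int,
                    min_size: int = 1, max_size: int = 32) -> int:
    """Hypothetical policy: grow the batch under load, shrink it when idle."""
    if queue_depth > current:
        return min(current * 2, max_size)
    if queue_depth < current // 2:
        return max(current // 2, min_size)
    return current


for chunk in stream_chunks(["Run", " models", " locally", "."], chunk_size=2):
    print("".join(chunk), end="", flush=True)
print()
```

Smaller chunks reach the caller sooner at the cost of more per-chunk overhead, while larger batches trade per-request latency for throughput. A real runtime would tune both against measured load rather than the fixed thresholds used here.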
System Requirements
OS Support
Windows, macOS
Version
0.1.0