Capsule

A local inference engine designed for speed, efficiency, and privacy — run AI models directly on your hardware with zero network telemetry.

Local-first execution

Run GGUF, ONNX, and CoreML models natively on CPU, GPU, or NPU backends.

Zero-copy memory model

Shared memory buffers between runtime and model layers eliminate redundant allocations.
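
The zero-copy idea can be illustrated with a minimal sketch in plain Python. This is not Capsule's internal code: the `layer_view` helper and the byte buffer are hypothetical stand-ins, and the example only assumes that a "layer" can be handed a window onto an existing buffer rather than its own copy.

```python
# Illustrative only: plain Python buffers, not Capsule's internals.

def layer_view(weights: bytearray, offset: int, length: int) -> memoryview:
    """Return a window onto `weights` without copying the underlying bytes."""
    return memoryview(weights)[offset:offset + length]


weights = bytearray(range(16))        # stand-in for a loaded model buffer
view = layer_view(weights, 4, 8)      # a hypothetical "layer" sees bytes 4..11

view[0] = 255                         # a write through the view...
shares_memory = weights[4] == 255     # ...is visible in the original buffer

copied = bytes(weights[4:12])         # slicing a bytearray, by contrast, copies
weights[5] = 99
copy_is_independent = copied[1] == 5  # the copy still holds the old value
```

The view costs no extra allocation for the payload, while the slice duplicates it; that duplication is the kind of redundant allocation a shared-buffer design avoids.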

Streaming & batching

Token streaming with adaptive batching for throughput-optimized generation.
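
One way adaptive batching might work is sketched below. The policy (batch size follows queue depth, capped at a maximum) and the names `next_batch_size` and `drain` are assumptions made for illustration; Capsule's actual scheduler is not described on this page.

```python
from collections import deque
from typing import Deque, List


def next_batch_size(pending: int, max_batch: int = 8) -> int:
    """Grow the batch with queue depth, capped at `max_batch` (hypothetical policy)."""
    return max(1, min(pending, max_batch))


def drain(queue: Deque[str], max_batch: int = 8) -> List[List[str]]:
    """Group queued requests into batches sized by the current backlog."""
    batches = []
    while queue:
        size = next_batch_size(len(queue), max_batch)
        batches.append([queue.popleft() for _ in range(size)])
    return batches


requests = deque(f"req-{i}" for i in range(11))
batches = drain(requests, max_batch=4)
batch_sizes = [len(b) for b in batches]   # full batches first, then the remainder
```

Under light load this policy keeps batches small so individual requests are not held back, and under heavy load it fills batches to the cap to favor throughput.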

[Diagram: Capsule connected to GPU, CPU, NPU, and RAM; readout: 127 tok/s · 0.0 ms network]

About Capsule

Capsule is the core runtime engine of the Naviorx ecosystem. It executes AI models locally on your hardware: no cloud, no API keys, no network latency. Designed for production-grade inference, Capsule supports multiple model formats (GGUF, ONNX, CoreML) and automatically selects the optimal backend for your hardware (CPU, GPU, or NPU). The zero-copy memory architecture shares buffers between the runtime and model layers instead of reallocating them, delivering throughput that rivals cloud-hosted solutions, entirely on your device.
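
Automatic backend selection could follow a simple preference order, as in the sketch below. The order (NPU, then GPU, then CPU) and the `select_backend` function are assumptions for illustration only; the page does not document how Capsule actually ranks backends.

```python
from typing import Iterable

# Assumed preference order; the real selection logic is not documented here.
PREFERENCE = ("npu", "gpu", "cpu")


def select_backend(available: Iterable[str]) -> str:
    """Pick the most preferred backend present, falling back to CPU."""
    present = {name.lower() for name in available}
    for backend in PREFERENCE:
        if backend in present:
            return backend
    return "cpu"


choice_laptop = select_backend(["CPU", "GPU"])   # no NPU reported, so GPU wins
choice_fallback = select_backend([])             # nothing reported, so CPU
```

A real implementation would likely also weigh model format and available memory, which this sketch leaves out.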

Features

Multi-backend inference (CPU, GPU via Metal/CUDA/Vulkan, NPU)
GGUF, ONNX, and CoreML format support
Token streaming with configurable chunk sizes
Adaptive batching for throughput optimization
Zero-copy memory architecture
CLI and programmatic API interfaces
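
Token streaming with a configurable chunk size can be pictured as a generator that groups tokens before handing them to the caller. The `stream_chunks` function and its `chunk_size` parameter are hypothetical names, not part of Capsule's published API.

```python
from typing import Iterable, Iterator, List


def stream_chunks(tokens: Iterable[str], chunk_size: int = 4) -> Iterator[List[str]]:
    """Yield tokens in groups of `chunk_size` as they become available."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[str] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # flush the final partial chunk
        yield chunk


generated = ["Run", " models", " on", " your", " own", " hardware", "."]
chunks = list(stream_chunks(generated, chunk_size=3))
```

A smaller chunk size gets text in front of the user sooner, while a larger one reduces per-chunk overhead.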

System Requirements

OS Support

Windows, macOS

Version

0.1.0
