Runtime

Capsule

A local inference engine designed for speed, efficiency, and privacy — run AI models directly on your hardware with zero network telemetry.

Local-first execution

Run GGUF, ONNX, and CoreML models natively on CPU, GPU, or NPU backends.

Zero-copy memory model

Shared memory buffers between runtime and model layers eliminate redundant allocations.

Streaming & batching

Token streaming with adaptive batching for throughput-optimized generation.

[Diagram: Capsule connected to GPU, CPU, NPU, and RAM; 127 tok/s, 0.0 ms network]

About Capsule

Capsule is the core runtime engine of the Naviorx ecosystem. It executes AI models locally on your hardware, with no cloud dependency, no API keys, and no network latency. Designed for production-grade inference, Capsule supports multiple model formats (GGUF, ONNX, CoreML) and automatically selects a backend suited to your hardware (CPU, GPU, or NPU). The zero-copy memory architecture shares buffers between the runtime and model layers to eliminate redundant allocations, and is designed to deliver throughput that rivals cloud-hosted solutions, entirely on your device.
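To make the zero-copy idea concrete, here is a minimal sketch in plain Python. It is not Capsule's API or implementation: the function name `make_layer_views`, the buffer size, and the layer size are all hypothetical, chosen only to show how several "layers" can read and write one shared buffer without allocating copies.

```python
def make_layer_views(buffer: bytearray, layer_size: int) -> list:
    """Split one buffer into per-layer views without copying any bytes.

    Hypothetical helper for illustration only; not part of Capsule.
    """
    whole = memoryview(buffer)
    # Slicing a memoryview yields another view over the same memory.
    return [whole[i:i + layer_size] for i in range(0, len(buffer), layer_size)]


weights = bytearray(8)                  # stands in for a runtime-owned buffer
layers = make_layer_views(weights, 4)   # two 4-byte views over the same memory

layers[1][0] = 255                      # write through the second layer's view
print(weights[4])                       # the write lands in the shared buffer
```

Because each view points at the original buffer, passing a view to another component costs no extra allocation, which is the property the zero-copy description refers to.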

Features

  • Multi-backend inference (CPU, GPU via Metal/CUDA/Vulkan, NPU)
  • GGUF, ONNX, and CoreML format support
  • Token streaming with configurable chunk sizes
  • Adaptive batching for throughput optimization
  • Zero-copy memory architecture
  • CLI and programmatic API interfaces
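As a rough illustration of token streaming with a configurable chunk size, the sketch below groups tokens into fixed-size chunks as they arrive. It is a generic Python generator, not Capsule's programmatic API: `stream_in_chunks` and its `chunk_size` parameter are assumed names, and the token list stands in for a model's output.

```python
from typing import Iterable, Iterator, List


def stream_in_chunks(tokens: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield tokens in lists of up to chunk_size as they become available.

    Hypothetical helper for illustration only; not part of Capsule.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[str] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Flush the final, possibly shorter, chunk.
        yield chunk


# A list stands in for tokens produced one at a time by a model.
generated = ["Run", " models", " locally", " today", "."]
for chunk in stream_in_chunks(generated, chunk_size=2):
    print("".join(chunk), end="", flush=True)
print()
```

A smaller chunk size shows text to the user sooner, while a larger one reduces per-chunk overhead; that trade-off is why a configurable value is useful.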

System Requirements

OS Support

Windows, macOS

Version

0.1.0
