May 15, 2026

Capsule v0.3.0

Improvements

Multi-backend inference support — automatic backend selection across CPU, GPU (Metal/CUDA/Vulkan), and NPU
Streaming token generation with configurable chunk sizes
Zero-copy memory model with shared buffers between runtime and model layers
Expanded GGUF, ONNX, and CoreML format support
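
These notes do not show the API behind the features above. As a rough illustration only, the Python sketch below shows what priority-based backend selection and chunked token streaming could look like. Every name in it (`pick_backend`, `stream_chunks`, `PREFERENCE` and its ordering) is hypothetical and not taken from Capsule.

```python
from typing import Iterable, Iterator, List, Sequence

# Hypothetical preference order; Capsule's real selection logic is not described in these notes.
PREFERENCE = ("npu", "cuda", "metal", "vulkan", "cpu")


def pick_backend(available: Iterable[str], preference: Sequence[str] = PREFERENCE) -> str:
    """Return the first backend in `preference` that is also in `available`."""
    present = set(available)
    for name in preference:
        if name in present:
            return name
    raise RuntimeError("no supported backend available")


def stream_chunks(tokens: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield tokens in lists of at most `chunk_size`, as soon as each list fills."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    chunk: List[str] = []
    for tok in tokens:
        chunk.append(tok)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # flush the final partial chunk
```

In a design like this, a smaller chunk size gives the caller more frequent updates, while a larger one reduces per-chunk overhead.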

Performance Improvements

Improved inference throughput by 32% on GPU backends through the zero-copy memory architecture
Reduced model loading time by 45% through optimized weight mapping
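
For readers unfamiliar with the term, "zero-copy" generally means that two components work on the same memory instead of duplicating it. The Python sketch below illustrates that general idea with `memoryview`. It is not Capsule's implementation, and the buffer and variable names are made up for the example.

```python
# A generic illustration of zero-copy sharing; not Capsule's implementation.
weights = bytearray(8)             # stands in for a buffer owned by the runtime (all zeros)

view = memoryview(weights)[2:6]    # a layer's window onto the same memory; nothing is copied
copy = bytes(weights[2:6])         # by contrast, slicing the bytearray duplicates the data

view[0] = 0xFF                     # write through the shared view

shared = weights[2] == 0xFF        # the original buffer reflects the write
isolated = copy[0] == 0x00         # the independent copy does not
```

Skipping the duplicate saves both the memory for the second buffer and the time spent filling it, which is where throughput and load-time gains of this kind typically come from.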

Bug Fixes

Fixed a crash when loading GGUF models with oversized context windows
Fixed a thread synchronization race condition in the Metal backend

💙 Thanks to our contributors

@user123 — reporting the GGUF loading crash issue
@ml-researcher — testing NPU backend on Snapdragon X Elite