Capsule v0.3.0
Improvements
- Multi-backend inference support — automatic backend selection across CPU, GPU (Metal/CUDA/Vulkan), and NPU
- Streaming token generation with configurable chunk sizes
- Zero-copy memory model with shared buffers between runtime and model layers
- Expanded GGUF, ONNX, and CoreML format support
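To illustrate what a configurable chunk size means for streaming generation, here is a minimal sketch in plain Python. It does not use Capsule's API: `stream_in_chunks` and its `chunk_size` parameter are hypothetical names chosen for illustration, and the token source is just a list of strings standing in for a model's output.

```python
from typing import Iterable, Iterator, List


def stream_in_chunks(tokens: Iterable[str], chunk_size: int = 4) -> Iterator[str]:
    """Yield generated tokens grouped into chunks of at most `chunk_size`.

    Hypothetical helper, not part of Capsule: it only shows the batching idea.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) == chunk_size:
            # A full chunk is ready: hand it to the consumer and start over.
            yield "".join(buffer)
            buffer.clear()
    if buffer:
        # Flush whatever is left once the token source is exhausted.
        yield "".join(buffer)


if __name__ == "__main__":
    fake_tokens = ["Hel", "lo", ",", " wor", "ld", "!"]
    for chunk in stream_in_chunks(fake_tokens, chunk_size=2):
        print(repr(chunk))
```

A smaller chunk size delivers text to the caller sooner at the cost of more per-chunk overhead, while a larger one does the opposite. The right trade-off depends on the application.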
Performance Improvements
- Improved inference throughput by 32% on GPU backends via the zero-copy memory architecture
- Reduced model loading time by 45% through optimized weight mapping
Bug Fixes
- Fixed a crash when loading GGUF models with oversized context windows
- Fixed a thread synchronization race condition in the Metal backend
💙 Thanks to our contributors
- @user123 — reporting the GGUF loading crash issue
- @ml-researcher — testing NPU backend on Snapdragon X Elite