Capsule v0.3.0

Improvements

  • Multi-backend inference support — automatic backend selection across CPU, GPU (Metal/CUDA/Vulkan), and NPU
  • Streaming token generation with configurable chunk sizes
  • Zero-copy memory model with shared buffers between runtime and model layers
  • Expanded GGUF, ONNX, and CoreML format support
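
For readers wondering what "configurable chunk sizes" means in practice, here is a minimal sketch of the general idea: tokens are buffered and handed to the caller in groups of a chosen size instead of one at a time. This is not Capsule's API. The names stream_in_chunks, fake_model, and chunk_size are hypothetical and exist only for illustration.

```python
from typing import Iterable, Iterator, List


def stream_in_chunks(tokens: Iterable[str], chunk_size: int = 4) -> Iterator[List[str]]:
    """Group a token stream into lists of at most `chunk_size` tokens.

    Hypothetical helper for illustration; not part of Capsule.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[str] = []
    for token in tokens:
        chunk.append(token)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # flush the final, possibly shorter, chunk
        yield chunk


def fake_model() -> Iterator[str]:
    """Hypothetical stand-in for a model that emits tokens one at a time."""
    yield from ["Zero", "-", "copy", " buffers", " are", " fast", "."]


# Consume the stream three tokens at a time and print text as it arrives.
for chunk in stream_in_chunks(fake_model(), chunk_size=3):
    print("".join(chunk), end="", flush=True)
print()
```

A smaller chunk size gives the caller text sooner at the cost of more callbacks; a larger one reduces per-chunk overhead but delays the first visible output.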

Performance Improvements

  • Improved inference throughput by 32% on GPU backends via the zero-copy memory architecture
  • Reduced model loading time by 45% through optimized weight mapping

Bug Fixes

  • Fixed a crash when loading GGUF models with oversized context windows
  • Fixed a thread synchronization race condition in the Metal backend

💙 Thanks to our contributors

  • @user123 for reporting the GGUF loading crash
  • @ml-researcher for testing the NPU backend on Snapdragon X Elite