by fadizg
Run LLMs offline with a llama.cpp backend: real-time token streaming, SHA-256-verified resumable downloads, chat templates, KV-cache reuse across turns, and grammar-constrained generation.
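As a rough illustration of what a SHA-256-verified resumable download involves, here is a minimal Python sketch. It is not this package's API: the function names `resume_copy`, `sha256_matches`, and `demo` are hypothetical, and a local file stands in for the remote server so the example runs without a network. The idea is to continue from however many bytes are already on disk, then accept the file only if its digest matches the published one.

```python
import hashlib
import os
import tempfile


def resume_copy(src_path, dst_path, chunk_size=1 << 16):
    """Append to dst_path the bytes of src_path that it is still missing."""
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        # Over HTTP, the equivalent step is a "Range: bytes=<offset>-" request.
        src.seek(offset)
        while chunk := src.read(chunk_size):
            dst.write(chunk)


def sha256_matches(path, expected_hex, chunk_size=1 << 16):
    """Hash the file in chunks so large model files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


def demo():
    """Simulate an interrupted download, resume it, and verify the result."""
    payload = bytes(range(256)) * 1000  # stand-in for model weights
    expected = hashlib.sha256(payload).hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "model.bin")
        dst = os.path.join(tmp, "model.bin.part")
        with open(src, "wb") as f:
            f.write(payload)
        with open(dst, "wb") as f:
            f.write(payload[:100_000])  # the partial file left by an interruption
        ok_before = sha256_matches(dst, expected)
        resume_copy(src, dst)
        ok_after = sha256_matches(dst, expected)
    return ok_before, ok_after
```

Calling `demo()` checks the digest once on the partial file and once after resuming. Verifying only after the final byte arrives means a corrupted or truncated download is never mistaken for a usable model file.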