Running local models is good now
Vicki Boykis reports that local LLMs have reached a point where they are “surprisingly good,” citing his experience on a 2022 M2 Mac with Gemma‑4‑26B‑A4B and the newer Gemma‑4‑12B‑QAT. Using tools such as llama.cpp, LM Studio, and Ollama, he demonstrates agentic coding tasks—including refactoring Python notebooks, generating type‑hints, proofreading, writing tests, and bootstrapping a recommendation model—at about 75 % of frontier‑model speed and accuracy. He details a Docker‑based setup with the Pi agent harness, custom models.json, and a compose configuration that isolates execution. While acknowledging remaining limits (inference speed, context size, prompt mismatches), Boykis emphasizes the rapid tooling improvements, introspection capabilities, and the growing viability of local models for development work, though he stops short of declaring them production‑ready.