A working llama.cpp config for Qwen3.6-35B-A3B on an RTX 4060 with 8GB VRAM hit r/LocalLLaMA Tuesday: 64 upvotes, 30 real comments from people who actually ran it, quantization settings in the replies. A second thread, from a developer doing local coding work for the first time, crossed 28 upvotes under the title "I thought it would take way longer." Neither post is a benchmark war. Both are "here's the command that worked on a consumer GPU I already own."
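The thread's exact command isn't reproduced here, but an invocation in this class looks something like the sketch below. The model filename, quant level, and tensor-override pattern are assumptions, not the OP's settings, and flag names vary across llama.cpp builds, so check `llama-server --help` against your version.

```shell
# Sketch of a llama.cpp server launch for a MoE model on an 8GB card.
# -ngl 99: offload every layer the GPU will take.
# --override-tensor "exps=CPU": pin tensors whose names match "exps"
#   (the MoE expert weights) in system RAM, leaving the dense attention
#   path and KV cache on the GPU.
llama-server \
  -m ./Qwen3.6-35B-A3B-Q4_K_M.gguf \
  -ngl 99 \
  --override-tensor "exps=CPU" \
  -c 16384 \
  --port 8080
```

Once it's up, the server exposes an OpenAI-compatible endpoint at `http://localhost:8080/v1`, which is what the editor integrations below point at.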
The hardware story collapsed
For most of the last two years, "local models" meant "you need a Mac Studio." The useful models didn't fit, the useful quantizations degraded them too badly, and the toolchain was a part-time job. Every month a new thread promised the breakthrough. Every month the comments said "try again next quarter."
Qwen3.6 is the quarter it stops being next quarter. A 35B MoE model with 3B active parameters at the right quantization fits comfortably in 8GB VRAM, runs fast enough for agentic loops, and produces code that a working developer chose to commit. That's the threshold. Not "matches Claude." Not "beats GPT-4o." Useful, on hardware you already own. The distance between that threshold and general adoption is measured in weeks, not years.
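The fit is less obvious than it sounds, and some rough arithmetic (assuming ~4.5 effective bits per weight, a typical Q4_K_M figure) shows why the MoE shape is doing the work: the full 35B parameters are more than double the card's VRAM, but the ~3B active path is under 2GB. Configs like the one in the Reddit thread presumably keep the expert tensors in system RAM and reserve the GPU for the dense layers and KV cache.

```shell
# Back-of-envelope VRAM arithmetic. Assumption: ~4.5 bits/weight at Q4_K_M.
awk 'BEGIN {
  bpw    = 4.5                    # effective bits per weight, assumed
  full   = 35e9 * bpw / 8 / 1e9   # every parameter resident, in GB
  active = 3e9  * bpw / 8 / 1e9   # only the ~3B active path, in GB
  printf "full: %.1f GB, active: %.1f GB\n", full, active
}'
```

The printed numbers are the point: roughly 19.7GB for everything resident versus 1.7GB for the active path, which is the gap between "needs a workstation" and "fits next to the KV cache on a 4060."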
The ecosystem caught up
Tooling closed the gap at the same time. khoj ships a local-first knowledge agent that runs against Ollama or llama.cpp with one config change. Ollama's library now carries the Qwen family as first-class citizens. VS Code extensions, Zed integrations, and Aider all accept local model endpoints without the "experimental" flag. The friction that used to kill adoption — installing four things to get one loop working — is gone.
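The "one config change" usually amounts to pointing an OpenAI-compatible client at a local URL. A minimal smoke test against a running Ollama instance looks like the following; the model tag is an assumption, and Ollama's OpenAI-compatible shim lives at `/v1`.

```shell
# Assumes Ollama is serving on its default port with the model pulled.
# The model tag "qwen3.6:35b-a3b" is an assumption, not a published tag.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6:35b-a3b",
    "messages": [{"role": "user", "content": "Write a one-line hello world."}]
  }'
```

The editor integrations take the same shape: Aider, for instance, documents an `OLLAMA_API_BASE` override plus a `--model ollama_chat/<tag>` argument, and the llama.cpp server from the earlier sketch answers the identical request on its own port.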
The Anthropic Claude Code story (see today's headline) is the demand side of the same curve. When the cloud provider you trust changes the deal, the substitute you built in advance becomes the substitute you use tomorrow. A non-trivial number of the prosumer Pro subscribers who spent Tuesday complaining on Reddit will spend this weekend getting a local stack running. Some of them will stay.
What to watch
Three things. First, the Ollama download graph for Qwen3.6 — a step change in the next two weeks means the migration thesis is real. Second, whether Apple ships a Metal-native MoE runtime that makes the 32GB M-series machines the obvious second home for these models. Third, the IDE companies. If Cursor, Zed, or JetBrains starts shipping a "local-first" tier with first-class local model support, the cloud-only assumption dies that month.