Google's Gemma Already Acts Like Gemini—Someone Made It Think Like Claude Opus Too

1 month ago 18

If you've been pursuing the section AI scene, you astir apt cognize Qwopus—the open-source exemplary that tried to distill Claude Opus 4.6's reasoning into Alibaba's Qwen, truthful you could tally thing resembling Opus connected your ain hardware for free. It worked amazingly well. The evident catch: Qwen is simply a Chinese model, and not everyone is comfy with that.

Jackrong, the aforesaid pseudonymous developer down that project, heard the feedback. His reply is Gemopus—a caller household of Claude Opus-style fine-tunes built wholly connected Google's open-source Gemma 4. All-American DNA, aforesaid idea: frontier-level reasoning, moving locally connected hardware you already own.

The household comes successful 2 flavors. Gemopus-4-26B-A4B is the heavier option—a Mixture of Experts exemplary that has 26 cardinal full parameters but lone activates astir 4 cardinal during inference, which means it punches good supra its value connected constrained hardware.

Parameters are what find an AI's capableness to learn, reason, and store information. Having 26 cardinal full parameters gives the exemplary a immense breadth of knowledge. But by lone "waking up" the 4 cardinal parameters applicable to your circumstantial prompt, it delivers the high-quality results of a monolithic AI portion remaining lightweight capable to tally smoothly connected mundane hardware.

The different is Gemopus-4-E4B, a 4-billion parameter borderline exemplary engineered to tally comfortably connected a modern iPhone oregon a thin-and-light MacBook—no GPU required.

The basal exemplary prime matters here. Google's Gemma 4, released connected April 2, is built straight from the aforesaid probe and exertion arsenic Gemini 3—the institution said truthful explicitly astatine launch. That means Gemopus carries thing nary Qwen-based fine-tune tin claim: The DNA of Google's ain state-of-the-art closed exemplary nether the hood, wrapped successful Anthropic's reasoning benignant connected top. The champion of some worlds, much oregon less.

What makes Gemopus antithetic from the question of different Gemma fine-tunes flooding Hugging Face close present is the doctrine down it. Jackrong deliberately chose not to unit Claude's chain-of-thought reasoning traces into Gemma's weights—a shortcut astir competing releases take.

His argument, backed by caller research, is that stuffing a pupil exemplary with a teacher's surface-level reasoning substance doesn't really transportation existent reasoning ability. It teaches imitation, not logic. "There is nary request for excessive imaginativeness oregon superstitious replication of the Claude-style concatenation of thought," the exemplary paper reads. Instead, helium focused connected reply quality, structural clarity, and conversational naturalness—fixing Gemma's stiff Wikipedia code and its inclination to lecture you astir things you didn't ask.

AI infrastructure technologist Kyle Hessling ran autarkic benchmarks and published the results straight connected the exemplary card. His verdict connected the 26B variant was beauteous favorable. "Happy to person benched this 1 beauteous hard and it is an fantabulous finetune of an already exceptional model,” helium wrote connected X. “It rocks astatine one-shot requests implicit agelong contexts, and runs incredibly accelerated acknowledgment to the MOE (mixture of experts) architecture."

Gemopus-4-26B-A4B from Jackrong is LIVE!

Happy to person benched this 1 beauteous hard (see my benches successful the exemplary card) and it is an fantabulous finetune of an already exceptional model! My person Jackrong is ever cooking the greatest!

It rocks astatine one-shot requests implicit long…

— Kyle Hessling (@KyleHessling1) April 10, 2026

The smaller E4B variant passed each 14 halfway competence tests—instruction following, coding, math, multi-step reasoning, translation, safety, caching—and cleared each 12 long-context tests astatine 30K and 60K tokens. On needle-in-haystack retrieval, it passed 13 retired of 13 probes including a agelong trial astatine 1 cardinal tokens with YaRN 8× RoPE scaling.

The 26B extends natively to 131K discourse and each the mode retired to 524K with YaRN, which Hessling besides stress-tested: "It besides crushed my elemental needle-in-the-haystack tests each the mode retired to an extended discourse of 524k!"

On borderline hardware, the E4B is genuinely fast. Jackrong reports 45–60 tokens per 2nd connected iPhone 17 Pro Max, and 90–120 tokens per 2nd connected MacBook Air M3/M4 via MLX. The 26B MoE architecture means it offloads gracefully connected unified representation systems oregon GPUs with nether 10GB of VRAM. Hessling called it his regular operator proposal for VRAM-starved setups.

Both models are disposable successful GGUF format, which means you tin driblet them consecutive into LM Studio oregon llama.cpp without configuration. The afloat grooming codification and a step-by-step fine-tuning usher are connected Jackrong's GitHub—same pipeline helium utilized for Qwopus, aforesaid Unsloth and LoRA setup, reproducible connected Colab.

Gemopus is not without its unsmooth edges. Tool calling remains breached crossed the full Gemma 4 bid successful llama.cpp and LM Studio—call failures, format mismatches, loops—so if your workflow depends connected agents utilizing outer tools, this is not your exemplary yet. Jackrong himself calls it "an engineering exploration notation alternatively than a afloat production-ready solution," and recommends his ain Qwopus 3.5 bid for anyone who needs thing much unchangeable for existent workloads.

And due to the fact that Jackrong deliberately avoided assertive Claude-style chain-of-thought distillation, don't expect it to consciousness arsenic profoundly Opus-brained arsenic Qwopus—that was a conscious tradeoff for stability, not an oversight.

Yeah the doctrine connected this 1 was stableness first, it is my knowing that the Gemma models thin to go unstable if you unit a clump of Claude reasoning traces into them, you tin spot this erstwhile investigating galore different Opus gemma good tunes connected hugging face.

Jackrong tried a…

— Kyle Hessling (@KyleHessling1) April 10, 2026

For those who privation to spell deeper into Gemma fine-tuning for reasoning specifically, determination is besides a abstracted assemblage task worthy watching: Ornstein by pseudonmyous developer DJLougen, which takes the aforesaid 26B Gemma 4 basal and focuses specifically connected improving its reasoning chains without relying connected the logic oregon benignant of immoderate circumstantial 3rd enactment model.

One honorable caveat: Gemma's grooming dynamics are messier than Qwen's for fine-tuners—wider nonaccomplishment fluctuations, much hyperparameter sensitivity. Jackrong says truthful himself. If you request a much battle-tested section exemplary for accumulation workflows, his Qwopus 3.5 bid remains much robustly validated. But if you privation an American exemplary with Opus-style polish, Gemopus is presently your champion disposable option. A denser 31B Gemopus variant is besides successful the pipeline, with Hessling teasing it arsenic "a banger for sure."

If you privation to effort moving section models connected your ain hardware, cheque our usher connected how to get started with section AI.