This Frankenstein AI Merges Claude Opus, GLM and Qwen—And Outperforms Top Models


In brief

AI technologist Kyle Hessling merged two of Jackrong's distilled finetunes, one based on Claude Opus 4.6 and one on GLM-5.1, into a single "frankenmerge."
A post-merge "heal fine-tune" was required to fix garbled code output caused by the layer boundary between the two independently trained models.
The model over-reasons on some tasks, but it's a solvable problem.

You thought Qwopus was cool because it merged Qwen and Opus? Well, Kyle Hessling, an AI technologist with a lot of knowledge and free time, just took that recipe and threw GLM, one of the best reasoning models out there, into the mix. The result is an 18 billion parameter frankenmerge that fits on a cheap GPU and outperforms Alibaba's newest 35B model.

For those who don't know, parameters are the numerical values baked into a neural network during training, like dials that the network can adjust. The more of them, the more knowledge and complexity the model can handle, and the more memory it needs to run.
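To make the link between parameter count and memory concrete, here is a rough back-of-the-envelope sketch. The helper name `weight_memory_gb` is hypothetical, and the calculation covers the weights alone: real quantization formats store extra metadata, and a running model also needs memory for activations and cache, so actual figures will differ.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# An 18-billion-parameter model stored at different precisions.
# The bit widths are illustrative assumptions, not measured values.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(18e9, bits):.1f} GB")
```

Halving the bits per weight halves the weight footprint, which is why 4-bit quantization is what brings a model of this size within reach of consumer GPUs.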

Hessling, an AI infrastructure engineer, stacked two of Jackrong's Qwen3.5 finetunes on top of each other: layers 0 through 31 from Qwopus 3.5-9B-v3.5, which distills Claude 4.6 Opus's style of reasoning into Qwen as a base model, and layers 32 through 63 from Qwen 3.5-9B-GLM5.1-Distill-v1, trained on reasoning data from z.AI's GLM-5.1 teacher model on top of the same Qwen base.

The hypothesis: give the model Opus-style structured planning in the first half of the reasoning and GLM's problem decomposition scaffold in the second, for 64 layers total in one model.

The technique is called a passthrough frankenmerge: no blending, no averaging of weights, just raw layer stacking. Hessling had to write his own merge script from scratch because existing tools don't support Qwen 3.5's hybrid linear/full attention architecture. The resulting model passed 40 out of 44 capability tests, beating Alibaba's Qwen 3.6-35B-A3B MoE, which requires 22 GB of VRAM, while running on just 9.2 GB in Q4_K_M quantization.
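The idea of passthrough stacking can be illustrated with a toy sketch. This is not Hessling's script, which has not been reproduced here. It assumes each donor model contributes 32 layers, represents a layer as a plain dictionary instead of real tensors, and uses hypothetical names (`make_toy_model`, `passthrough_stack`, `opus_distill`, `glm_distill`) throughout.

```python
from typing import Dict, List

Layer = Dict[str, str]  # stand-in for a layer's tensors


def make_toy_model(name: str, num_layers: int) -> List[Layer]:
    """Build a fake model whose layers only record where they came from."""
    return [{"origin": f"{name}.layer{i}"} for i in range(num_layers)]


def passthrough_stack(first: List[Layer], second: List[Layer]) -> List[Layer]:
    """Copy every layer unchanged and concatenate: no blending, no averaging."""
    return [dict(layer) for layer in first] + [dict(layer) for layer in second]


opus_distill = make_toy_model("opus_distill", 32)  # assumed 32 layers
glm_distill = make_toy_model("glm_distill", 32)    # assumed 32 layers

merged = passthrough_stack(opus_distill, glm_distill)
print(len(merged))  # 64
# The seam between the two donors sits at merged positions 31 and 32.
print(merged[31]["origin"], merged[32]["origin"])
```

The seam at positions 31 and 32 is where layers trained in different runs meet for the first time, which is the kind of boundary a post-merge fine-tune is meant to smooth over.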

An NVIDIA RTX 3060 handles it fine… theoretically.

Hessling explains that making this model wasn't easy. The raw merge used to throw garbled code. But even so, the test models he published went kind of viral among enthusiasts.

Hessling's final fix was a "heal fine-tune": basically a QLoRA (a small set of extra trainable weights attached to the model like an appendix, which heavily conditions the final output) targeting all attention and projections.
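For readers curious about the mechanics, the low-rank update at the heart of LoRA can be sketched in a few lines. This is a toy illustration in plain Python, not the actual heal fine-tune: the matrices, `alpha`, and `rank` values are made up, the helper names are hypothetical, and the 4-bit quantization of the frozen base weights that distinguishes QLoRA is left out.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A. W stays frozen; only A and B are trained."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 base weight (toy values)
a = [[0.5, 0.5, 0.5]]                   # trainable 1x3 matrix (rank 1)
b_init = [[0.0], [0.0]]                 # trainable 2x1 matrix, starts at zero

# With B at zero the adapter changes nothing, so training starts from the base model.
print(lora_effective_weight(w, a, b_init, alpha=2, rank=1) == w)

b_trained = [[1.0], [0.0]]              # pretend training moved B
print(lora_effective_weight(w, a, b_trained, alpha=2, rank=1))
```

Because only the small A and B matrices are trained, the adapter adds far fewer trainable values than the layer it modifies, which is what makes this kind of repair pass cheap compared with full fine-tuning.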

We tried it, and even though the idea of having Qwen, Claude Opus, and GLM 5.1 running locally on our potato is beyond tempting, in reality we found that the model is so good at reasoning through things that it ends up overthinking.

We tested it on an M1 MacBook running an MLX quantized version (a model optimized to run on Macs). When prompted to generate our usual test game, the reasoning chain ran so long it hit the token limit and gave us a nice long piece of reasoning without a working result in a zero-shot interaction. That's a daily-use blocker for anyone wanting to run this locally on consumer hardware for any serious application.

We went a bit softer and things were still challenging. A simple "write a Snake game" prompt took over 40 minutes in reasoning... lots of it.

You can see the results in our GitHub repository.

This is a known tension in the Qwopus lineage: Jackrong's v2 finetunes were built to address Qwen 3.5's tendency toward repetitive internal loops and "think more economically." Stacking 64 layers of two reasoning distills appears to amplify that behavior on certain prompts.

That's a solvable problem, and the open-source community will likely solve it. What matters here is the broader pattern: a pseudonymous developer publishes specialized finetunes with full training guides, another enthusiast stacks them with a custom script, runs 1,000 healing steps, and lands a model that outperforms a 35 billion parameter release from one of the world's largest AI labs. The whole thing fits in a small file.

This is what makes open-source worth watching: not just the big labs releasing weights, but the layer-by-layer solutions, the specialization happening below the radar. The gap between a weekend project and a frontier deployment gets narrower the more developers join the community.

Jackrong has since mirrored Hessling's repository, and the model had accumulated over three thousand downloads within its first two weeks of availability.