Meta Launches Muse Spark, Its Most Capable AI Yet—But Gemini 3.1 Pro Still Leads the Pack


In brief

Meta’s new Muse Spark marks a shift to closed, natively multimodal AI with agent-based reasoning.
Meta reports strong benchmark gains in health and search, but still trails Gemini on core reasoning and coding.
Built in 9 months with far less compute, this points to a new efficiency-driven AI strategy.

Meta launched Muse Spark on Wednesday, marking the first model built by Meta Superintelligence Labs, the team assembled 9 months ago under Chief AI Officer Alexandr Wang after Meta's $14 billion Scale AI acquisition. It's live now at meta.ai and the Meta AI app, with a rollout to Facebook, Instagram, and WhatsApp coming in the next few weeks.

This isn't just another chatbot upgrade or a new version of Llama. Muse Spark is natively multimodal: it processes images, text, and voice from the ground up, rather than bolting vision onto an existing text model. It comes with visual chain-of-thought, tool-use support, and something Meta is calling "Contemplating mode": a setup that runs multiple AI agents in parallel to tackle harder problems. That's Meta's answer to the extended reasoning modes from Google’s Gemini Deep Think and OpenAI’s GPT Pro.

“Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts,” Meta wrote in an official announcement. “To support further scaling, we are making strategic investments across the entire stack, from research and model training to infrastructure, including the Hyperion data center.”

The company worked with more than 1,000 physicians to curate training data for Muse Spark's medical reasoning. The results on HealthBench Hard, a benchmark of open-ended health queries, are striking: Muse Spark scored 42.8, compared to 40.1 for GPT 5.4 and just 20.6 for Gemini 3.1 Pro. That's not a marginal difference.

On agentic search (DeepSearchQA), Muse Spark also leads with 74.8, beating Gemini (69.7) and GPT 5.4 (73.6). On CharXiv Reasoning, which tests figure understanding from scientific papers, it scored 86.4, the highest across the models in the comparison.

For those into jailbreaking AI, the model was cracked open within minutes:

🚰 SYSTEM PROMPT LEAK 🚰

Here's the full Muse Spark system prompt from Meta!

I noticed @AIatMeta forgot to open source it, so I've done them the courtesy 😘

PROMPT:
"""
Who are you?

You are a friendly, intelligent, and agentic AI assistant. You are warm and a bit playful.…

— Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 (@elder_plinius) April 8, 2026

But good isn’t the same as great. The broader benchmark picture shows Gemini 3.1 Pro still running ahead in most categories. The gap is most visible on ARC AGI 2, the abstract reasoning puzzle benchmark: Gemini scored 76.5 to Muse Spark's 42.5.

On coding (LiveCodeBench Pro), Gemini's 82.9 outpaces Meta's 80.0. On MMMU Pro, a multimodal understanding benchmark, Gemini scored 83.9 versus 80.4. Meta's own blog acknowledges current performance gaps in long-horizon agentic systems and coding workflows.

There's also a notable strategic shift baked into this launch. Muse Spark is a closed model: its architecture and weights won't be made public. That's a sharp departure from Llama, which built Meta's reputation in open AI circles. After Llama 4's underwhelming reception earlier this year, Meta appears to have decided the next chapter needs to be written differently.

The company says it hopes to open-source future versions of Muse, but for now the code stays within Meta. The tech giant’s stock climbed almost 9% on Wednesday following the announcement, and finished the trading day up 6.5% at a price of $612.42.

“Contemplating mode” uses parallel agent orchestration to push the model's ceiling higher. In that configuration, Muse Spark hit 58% on Humanity's Last Exam and 38% on FrontierScience Research, territory that makes it competitive with the most capable versions of Gemini and GPT, rather than their standard releases.

Meta is also rolling out a shopping assistant that compares products and links directly to purchases, and plans to bring Muse Spark to Facebook, Instagram, and WhatsApp in the coming weeks, following the same playbook it has used since Llama 3 and putting the model in front of more than 3.5 billion users. A private API preview is opening to select developers.

The model, internally codenamed Avocado, was built in 9 months, with Meta claiming that its new pretraining stack can reach the same capability level as Llama 4 Maverick using over 10 times less compute.

Muse Spark is described internally as a "small and fast" first step in the Muse family. A more capable version is already in development.