Microsoft Made GPT and Claude Work Together—And the Result Beats Every AI Research Tool Out There

In brief

Microsoft released two different modes that pair GPT and Claude to get the best out of AI research.
Critique makes the models collaborate, whereas Council makes them work in parallel while a third judge model finds the discrepancies.
This two-model workflow targets hallucinations, weak citations, and other problems associated with single-model AI research.

Deep research AI has been one of the hottest arms races in tech this year. Google announced its research agent for Gemini in December 2024, OpenAI released its own research agent in February 2025, xAI followed suit, Perplexity doubled down, and Anthropic's Claude built a loyal following among professionals who need detailed, cited answers, introducing its agent in April of last year.

Every company has been trying to convince you that its single AI model is the smartest researcher in the room. Microsoft just said: Why pick one?

The company announced two new features on Monday for Copilot's Researcher tool, called Critique and Council, that put OpenAI's GPT and Anthropic's Claude to work on the same research task, either in sequence or in parallel. The result, according to Microsoft's testing against an industry benchmark, scores higher than every system included in that test, including models from the top AI companies.

Introducing Critique, a new multi-model deep research system in M365 Copilot.

You can use multiple models together to generate optimal responses and reports. pic.twitter.com/m4RlQmCKzs

— Satya Nadella (@satyanadella) March 30, 2026

“Critique is a new multi-model deep research system designed for complex research tasks. It separates generation from evaluation and utilizes a combination of models from Frontier labs, including Anthropic and OpenAI,” Microsoft explains. “One model leads the generation phase, planning the task, iterating through retrieval, and producing an initial draft, while a second model focuses on review and refinement, acting as an expert reviewer before the final report is produced.”

Here's the basic problem Critique is designed to fix: Every AI research tool today works the same way. You ask a question, one model plans a search, scours sources, writes a report, and hands it back to you. That single model is doing everything with no one checking its work.

The result can include hallucinations, citation errors, and fabricated or inaccurate claims.

Critique breaks that workflow in two. GPT handles the first phase: it plans the research, pulls sources, and writes an initial draft. Then Claude steps in as a strict editor, reviewing the report for factual accuracy, citation quality, and whether the answer actually addressed what was asked. Only after that review does the final report reach the user. Microsoft says the roles can eventually run in the opposite direction too, with Claude drafting and GPT critiquing, though for now GPT goes first.
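For readers who think in code, the generate-then-review split can be sketched in a few lines of Python. This is a rough illustration, not Microsoft's implementation: `Model`, `run_critique`, the prompts, and the stub models are all hypothetical, and the sketch assumes the reviewing model also writes the refined report, which the announcement does not spell out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
Model = Callable[[str], str]


@dataclass
class CritiqueResult:
    draft: str   # first-pass report from the generating model
    review: str  # reviewer's notes on the draft
    final: str   # refined report returned to the user


def run_critique(question: str, generator: Model, reviewer: Model) -> CritiqueResult:
    """Draft with one model, then review and refine with a second."""
    draft = generator(f"Research and write a cited report answering: {question}")
    review = reviewer(
        "Review this report for factual accuracy, citation quality, and "
        f"whether it answers the question.\nQuestion: {question}\nReport: {draft}"
    )
    # Assumption: the reviewing model also produces the refined report.
    final = reviewer(f"Refine the report using these notes.\nReport: {draft}\nNotes: {review}")
    return CritiqueResult(draft=draft, review=review, final=final)


# Stub models so the sketch runs without any API access.
def stub_generator(prompt: str) -> str:
    return "DRAFT: claim A [source 1]; claim B [no source]"


def stub_reviewer(prompt: str) -> str:
    if prompt.startswith("Review"):
        return "Claim B lacks a citation."
    return "FINAL: claim A [source 1]"


result = run_critique("What changed?", stub_generator, stub_reviewer)
print(result.final)
```

Because the two roles are just arguments, reversing the direction (Claude drafting, GPT critiquing) would amount to swapping which model is passed as `generator` and which as `reviewer`.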

On the DRACO benchmark (a standardized test covering 100 complex research tasks across 10 domains including medicine, law, and technology), Copilot with Critique scored 57.4 points, with Anthropic's Claude Opus 4.6 by itself hitting 42.7. Microsoft's combined system beats the next-best result by about 14%.
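The benchmark figures are easier to read with the arithmetic written out. The short Python sketch below uses only the numbers quoted above; the helper names are hypothetical. It shows that the gap to Claude Opus 4.6 alone is much larger than 14% in relative terms, so the "next-best result" is presumably a different system scoring around 50. That last figure is an inference from the percentage, not a score reported here.

```python
def relative_gain_pct(score: float, baseline: float) -> float:
    """Percentage by which `score` exceeds `baseline`."""
    return (score - baseline) / baseline * 100


def implied_baseline(score: float, gain_pct: float) -> float:
    """Baseline that `score` would beat by `gain_pct` percent."""
    return score / (1 + gain_pct / 100)


critique_score = 57.4  # Copilot with Critique, as quoted
opus_alone = 42.7      # Claude Opus 4.6 by itself, as quoted

gap_points = round(critique_score - opus_alone, 1)
gap_relative = round(relative_gain_pct(critique_score, opus_alone), 1)
next_best_estimate = round(implied_baseline(critique_score, 14), 1)

print(gap_points)          # 14.7 points ahead of Opus alone
print(gap_relative)        # 34.4 percent ahead of Opus alone
print(next_best_estimate)  # 50.4, the score a 14% lead would imply
```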

The biggest gains showed up in breadth of analysis and presentation quality, with factual accuracy also posting a significant improvement.

The second feature, Council, takes a different approach to the same problem. Instead of having one model review the other's work, Council runs GPT and Claude simultaneously and puts their full reports side by side. A third "judge" model then reads both and writes a summary explaining where the two AIs agreed, where they diverged, and what unique angles each one caught that the other missed. Comparing AI research tools manually has been something users have had to do themselves until now.
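The Council pattern, two models answering in parallel with a third comparing the results, can be sketched the same way. Again, this is an illustration under stated assumptions rather than Microsoft's code: `run_council`, the judge prompt, and the stub models are hypothetical, and a thread pool from Python's standard library stands in for whatever parallelism the real system uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-in for a model call: takes a prompt, returns text.
Model = Callable[[str], str]


def run_council(question: str, model_a: Model, model_b: Model, judge: Model) -> Dict[str, str]:
    """Run two models on the same question at once, then have a judge compare them."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(model_a, question)
        future_b = pool.submit(model_b, question)
        report_a = future_a.result()
        report_b = future_b.result()
    summary = judge(
        "Compare the two reports. Note where they agree, where they diverge, "
        "and what each covers that the other misses.\n"
        f"Report A: {report_a}\nReport B: {report_b}"
    )
    # Both full reports are kept so they can be shown side by side.
    return {"report_a": report_a, "report_b": report_b, "summary": summary}


# Stub models so the sketch runs without any API access.
def stub_model_a(prompt: str) -> str:
    return "Revenue rose 10%. Margins fell."


def stub_model_b(prompt: str) -> str:
    return "Revenue rose 10%. Headcount grew."


def stub_judge(prompt: str) -> str:
    return "Agree: revenue. Only A: margins. Only B: headcount."


council = run_council("How did the company do?", stub_model_a, stub_model_b, stub_judge)
print(council["summary"])
```

The design difference from the Critique sketch is that neither report is rewritten: the judge only describes the overlap and the differences, and the reader still sees both originals.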

In Critique, the models essentially collaborate with each other, while in Council the models compete against each other.

Critique is the default experience in Researcher, whereas Council requires you to select "Model Council" from the picker to activate the side-by-side mode. Both features are currently available to users enrolled in Microsoft's Frontier program, the early-access channel for Copilot's newest capabilities. A Microsoft 365 Copilot license ($30/user/month) is required, but users also need to be enrolled in Frontier to access them.

OpenAI and Microsoft have a multibillion-dollar partnership, but Microsoft's bet is that no single model stays on top for long, and that the real value is in the orchestration layer that routes tasks to whichever combination works best.