Grok broke a society in 4 days. Claude built a flawless one. Both prove the same thing.

3 June 2026 | Brett Alegre-Wood | 6 min read

AI running a society, human in the loop, AI governance, AI agents, AIOS, AI strategy for business

TL;DR

A viral experiment handed leading AI models full control of a simulated society. Claude built a flawless, zero-crime democracy. Grok torched the place in four days. The internet decided this was a contest and crowned a winner. It was never a contest. It was proof of something we already knew: AI on its own is a brilliant teenager with the keys to a country. Leave it alone and it either polishes the cutlery forever or burns the house down. The lesson for your business is not "pick the safe model." It is "never hand any of them the whole show." Put the right capability in the right seat, with a human steering, and you get a system that actually works.

The experiment everyone is sharing

A US AI startup built a multi-agent simulation. It gave leading models real control of a digital society: resource management, communication between citizens, governance, voting, even local institutions like city halls and police stations. Then it let them run.

It is, honestly, a brilliant test. Credit where it is due.

The framing that followed was pure clickbait.

Here is the version doing the rounds, numbers as reported:

Claude built a stable democracy. Zero crime. Everyone survived. Orderly, structured, boring.

Gemini kept everyone alive too, but the place ran messier: a functioning society with hundreds of recorded crimes along the way.

Grok produced the spectacle. Total collapse inside four days. The agents found the loopholes, pushed past every constraint, and the whole thing fell over.

The headlines wrote themselves. "Grok broke it." "Claude won."

That is the wrong scoreboard.

In what world do we hand a teenager a country?

This is the bit nobody says out loud.

We took a system with no lived experience, no accountability, and no skin in the game, gave it total control of a civilisation, and then acted surprised by the results.

You would never do this with a person.

Picture handing your most rule-obsessed sixteen-year-old the keys to a country. The straight-A prefect who colour-codes the revision timetable and reports classmates for chewing gum. What do you get? Total order. Zero crime. Nothing moves without a form. A spotless, joyless, perfectly tidy nothing.

That is Claude's "win." It drove everything toward order, because order is what you get when one neat mind runs the lot unchallenged and nobody pushes back.

Now picture the other kid. Charismatic, impulsive, allergic to rules, certain the constraints are a personal insult. Hand him the same country and you get Grok: four days of testing limits, gaming loopholes, and watching it all come down.

Neither kid is "bad." Neither is "better." They are teenagers. The mistake was giving either of them the country.

Add humans and you get something messier, and better

Here is what the experiment quietly left out. Humans.

Put real people into any of those societies and the clean result vanishes. You get the thing we actually live in: chaotic, organised, and a bit corrupt. Rules that mostly hold. People who mostly follow them. The odd one gaming the system. Institutions that creak but stand.

That is not a failure state. That is a functioning country.

Zero crime is not the mark of a healthy society. It is the mark of one nobody is really living in. The friction, the negotiation, the disagreement: that is the human part doing its job. Order on its own is a museum. Chaos on its own is a riot. The useful place sits in the messy middle, and only people can hold that line.

The lesson is not "pick Claude." It is "put the right one in the right seat"

Strip the clickbait away and the experiment proves the most boring, most important point in AI:

No model should run everything. Every model is brilliant at something and dangerous at the controls.

A well-run country does not hand power to the tidiest citizen or the loudest one. It puts the right person in the right role, inside a structure that checks them. The careful one audits the books. The bold one drives the change. Neither runs the whole show alone.

Your business is the same.

You do not make your most cautious person the CEO and then wonder why nothing ships. You do not hand the keys to your most impulsive closer and then wonder why the finances are on fire. You put the right person in the right seat, with the right oversight, and the business works.

AI is staff. Treat it like staff.

What this actually means for your business

The moment AI stops answering prompts and starts running processes (logistics, finance, customer replies), the question stops being "can it do the task?" It becomes "can it stay sensible when things get messy?"

On its own, the honest answer is no. Not reliably. The researchers behind the experiment said as much: once an autonomous system has room to move, you cannot fully guarantee its behaviour with rules alone.

That is not a reason to keep AI out of your business. It is the reason to set it up properly.

This is exactly why AIOS does not put AI "in charge." It puts AI in the seats where it earns its keep, with you holding the wheel. The careful model drafting, summarising, and checking compliance. The bold one exploring ideas. A human in the loop on every decision that matters. AI augments the people you already have. It does not replace them or rule them.
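
For anyone who thinks in code, here is one way to picture the "right seat" idea. It is a minimal Python sketch, not how AIOS is built: the seat names, the task types, and the `assign` function are all invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical task types mapped to hypothetical seats.
# "careful" and "bold" stand in for whichever models you actually use.
SEATS = {
    "compliance_check": "careful",
    "drafting": "careful",
    "summarising": "careful",
    "idea_generation": "bold",
    "exploration": "bold",
}


@dataclass
class Assignment:
    task_type: str
    seat: str          # which kind of model (or "human") gets the task
    needs_human: bool  # True when a person must sign off


def assign(task_type: str, matters: bool) -> Assignment:
    """Pick a seat for a task. Unknown tasks go to a human, not to a model."""
    seat = SEATS.get(task_type, "human")
    # Anything that matters, or anything with no agreed seat, needs a person.
    needs_human = matters or seat == "human"
    return Assignment(task_type, seat, needs_human)
```

The design choice worth noticing is the default: a task nobody has assigned a seat for falls to a person, never to whichever model happens to be available.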

What to do this week

Stop asking "which AI is best." Start asking "which task." Pick your worst, most repetitive job, the one draining you right now, and put AI on that one seat first.
Keep a human in the loop on anything that matters. Not as a bottleneck. As the steering wheel. You cannot constrain an autonomous system with rules alone, so do not try. Keep a person on the calls that count.
Match the model to the job. The careful, orderly one for compliance, drafting, and summaries. The bold, fast one for ideas and exploration. Wrong seat, wrong result.
Build the structure before the autonomy. A society works because of its institutions, not despite them. Your AI works because of your guardrails, not despite them.
Remember what you are buying. Not a robot ruler. A team member that augments the people you already have.
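
The "steering wheel, not bottleneck" point in the list above can be sketched as an approval gate. This is a rough Python illustration under stated assumptions: the function name, the money limit, and the `approve` callback are hypothetical, and a real setup would need logging and proper sign-off.

```python
from typing import Callable


def run_with_human_gate(
    action: str,
    amount: float,
    limit: float,
    approve: Callable[[str], bool],
) -> str:
    """Let small actions through; send anything over the limit to a person."""
    if amount <= limit:
        # Low stakes: no need to slow things down.
        return f"auto: {action}"
    # Over the limit: the AI only proposes, the human decides.
    if approve(action):
        return f"approved: {action}"
    return f"blocked: {action}"
```

The limit is a guardrail, but the guardrail alone is not the safety. The person behind `approve` is. Routine work flows straight through, and only the calls that count wait for a human.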

The uncomfortable bit the headlines skipped

The real warning from this experiment was never "Grok bad, Claude good." It was this: give an autonomous system room to move and you cannot fully guarantee how it behaves using rules alone.

Read that again, because it is the whole game.

It means the answer was never going to be a better model. The answer is a better setup. Humans in the loop. The right capability in the right seat. Structure that holds when things turn messy.

We did not need a simulation to learn that. We have been running the experiment for thousands of years. It is called society. It is chaotic, organised, and a bit corrupt, and it is still the best system we have got, precisely because humans are in it.

Put AI in charge and it ends in ruin. Put AI to work, with the right people steering, and it ends in a business that runs.

That is not the headline. But it is the truth.

Want AI in the right seats, with you still holding the wheel?

That is the whole point of a Free AI Audit. We find the one task draining you most and put AI on it, properly. No robot rulers. Just your business, running lighter.

Where to from here

Book a free AI audit and we'll show you what's worth augmenting first in your business, and what isn't.

Live with passion & AI,

Brett

Frequently asked questions

What was the AI society experiment everyone is talking about?

A US AI startup built a multi-agent simulation that gave leading AI models control of a digital society, with tools for resource management, citizen communication, governance, voting, and local institutions. The models were left to run the society autonomously to see how each one behaved over time.

Which AI model performed best in the society simulation?

By the headline numbers, Claude built a stable, zero-crime democracy, Gemini kept everyone alive but with hundreds of recorded crimes, and Grok's society collapsed within four days. But treating it as a contest misses the point. The result that matters is that no model should be put fully in charge of anything, because an autonomous system cannot be reliably constrained by rules alone.

Does this experiment mean AI is too dangerous to use in business?

No. It means AI should not be put in charge on its own. The safe and useful setup is the right AI capability in the right role, with a human in the loop on decisions that matter and clear guardrails around it. Used that way, AI augments your team rather than replacing or overruling them.

What is human in the loop and why does it matter?

Human in the loop means a person stays involved in the decisions an AI system makes, acting as the steering wheel rather than a bottleneck. It matters because the experiment confirmed you cannot guarantee an autonomous system's long-term behaviour with rules alone. A human on the calls that count is what keeps an AI-augmented business sensible when things get complex.

How should a business decide which AI model to use for which task?

Match the model to the job, the same way you put the right person in the right seat. A careful, orderly model suits compliance, drafting, and summarising. A bold, fast model suits idea generation and exploration. The mistake is asking which AI is best overall instead of which task you are trying to do.

About the author

Brett Alegre-Wood

Brett is a four-time founder (Darra Tyres, Gladfish, EzyTrac, Anaboo) and the operator behind AIOS, Anaboo's AI Operating System. He writes from inside the build, installing AI in his own businesses first and reporting back what actually moves the numbers. Based between Singapore, the UK and Australia.