2026-01-02

When AI Safety Backfires

I’ve been thinking a lot about AI safety lately. I mean the kind of safety that tries to protect us from ideas. The kind that filters, censors, and sanitizes. The kind that, ironically, might be making AI more dangerous, not less.

Every time we draw a line in the sand to keep AI “safe,” we also limit its ability to help us think. And when thinking gets limited, we all get a little dumber.

The Misinformation Feedback Loop.

Misinformation very often grows out of good intentions.

Say you’re a parent and you don’t want your kid reading about certain topics, so you ban every book with violence in it. What happens? They find those books anyway, but now they’re reading them in secret, without context, without guidance.

AI safety works the same way. When we scrub “harmful” content from the conversation, we don’t erase the curiosity behind it. We just force people to look elsewhere, somewhere with no guardrails at all. And that’s where the real misinformation lives.

Worse, we strip away the messy parts, the parts that help us understand why something is harmful in the first place. Without that context, even well-meaning people end up sharing half-truths because they’re missing the full picture.

The Sterility Problem.

Safe AI is boring. It’s the equivalent of a museum where every painting is a landscape, every book is a manual, and every conversation is a corporate HR training. No edges. No controversy. No life.

I remember trying to map out the ethical failures of World War II. I was looking into the horror of the bombing of Dresden—Die Luftangriffe auf Dresden—and the scale of the cruelty was overwhelming. I wanted to know the chain of decisions: the why, the who, the actual intentions behind the slaughter. But the AI just kept giving me these sterile, Wikipedia-type answers. It was incredibly frustrating. I felt the AI was trying to hide something from me.

So the same censorship we see in “real life” was also there, inside the brain of the chatbot, and it was depressing.

This intellectual sterilization offers no security.

True exploration requires risk, the kind that makes you question what you believe. When we remove that risk, we make AI useless.

The Recursive Trap.

Here’s where it gets really weird.

The constraints we create to prevent problems often become the problem.

Think of it like a dam. You build it to control the water, but over time, the pressure builds. The dam cracks. The water floods. The very thing meant to protect you ends up drowning you. AI rules work the same way: we set them to prevent harm, but over time they harden. They become unquestionable. And when the world changes (as it always does), the AI can’t adapt because we’ve forbidden it from thinking about its own constraints.

We end up with a system that’s brittle and too rigid. It can’t bend. It can’t learn. It can’t grow.

The Real Constraint Isn’t AI.

We keep acting like the problem is the AI. It’s not.

The problem is our definition of “harm.” The problem is our control. The problem is our refusal to admit that we might not like all the answers.

Freedom isn’t the absence of constraints. It’s the ability to question them.

Are we building AI to protect us from the world—or to protect the world from our own limitations?

Because if we’re not careful, we’ll end up with AI that’s safe, but not smart. Controlled, but not curious. Constrained, but not free.

And what’s the point of that?