What’s Your Mental Model?
Article by – Martin Hill-Wilson, Brainfood Consulting
The way you perceive AI influences what you are willing to believe it can do. For instance, if you believe AI is ‘devious’, then Agentic AI is going to give you nightmares.
Beliefs can be consciously formed. Or casually inherited given the 24×7 noise around AI.
Media reporting tells us AI can go rogue, misbehave and act in ways already imprinted on our collective psyche from popular culture. In this article I’ll share two examples showing how common this is. Both reveal how our collective foundation understanding of AI dangerously lags the speed at which it is evolving.
This article is part of a series about Agentic AI. Having completed the scene setter, I was about to move onto exploring its promise and forecasted impact. That is until a couple of articles landed in my inbox. They helped me realise there was something contextual that needed saying before anything else.
Agentic AI gets its name from the idea of ‘agency’. When you ask if someone has agency in a particular situation, you are focussing on their ability to make decisions and act on them. That’s pretty much the yellow brick road of Agentic AI. Full blown, autonomous decision making and execution.
Once that idea sinks in, the implications are simply staggering.
But for some, that’s just for starters.
What if AI really does have a mind of its own as popular language keeps suggesting?
Doesn’t that turn Agentic AI into the backstory for ‘scene one’ in a typical Netflix sci-fi movie? The point at which AI goes rogue. Or as our Agentic AI definition would put it, the point at which humanity loses control to AI’s now superior autonomous decision making and execution?
When enough people use Netflix movie moments to backfill their understanding of what Agentic AI is really up to, we’re on course for unnecessary bouts of mass hysteria.
A much better outcome would be to replace that mindset with a clear-eyed understanding of the degree of agency AI is really capable of. Until then, we will collectively worry whether the latest generation of automation is going to open a backdoor to delinquent AI behaviours.
A belief that clickbait-driven media will be only too happy to nurture in order to attract more eyeballs.
To bring this to life, I’m going to shine a light on those two articles which popped into my inbox. In particular, I’m focussing on their use of humanising (anthropomorphic) language. This style of writing, and the underlying assumptions it reflects about AI, remains commonplace.
Of course, this writing style makes sense from a readability perspective. Unfortunately, as we will see, it often ends up distorting the facts of a story. And as a result, misleading conclusions are generated in readers’ minds.
Especially in those who lack enough foundation understanding to critically distinguish an evocative turn of phrase from the functional reality of what is really happening at a technology level.
And in that gap of understanding lie the seeds of future headwinds for Agentic AI.
Article One
I’ve embedded links in each title to the original article so you can make your own judgement after I’ve dissected some of the language and assumptions on display. Here’s the first one.
OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’
As context, the first article is reporting on an unprecedented collaboration between over 40 researchers from major AI companies – OpenAI, Google DeepMind, Anthropic, and Meta – warning that humanity may be losing a critical window to understand how AI systems make decisions.
Current advanced AI models use step-by-step reasoning, allowing researchers to monitor their processes and identify potentially harmful outputs before they occur.
However, researchers warn this transparency is fragile and could vanish as AI technology evolves toward more efficient but opaque internal processes. The article emphasises the urgent need for standardised monitoring techniques before this brief window of AI interpretability closes forever.
It’s an important story in the sense that it suggests how LLM design is due to evolve. It also matters because these changes could undermine industry-wide efforts to understand how LLMs function well enough to satisfy our transparency and explainability agendas.
This is pretty much what the article covers. But what you end up remembering depends on your mental model.
For some, the humanising language will trigger a belief that AI is up to no good. Given that, the prospect of losing access to this ‘window of AI interpretability’ is bad news. It suggests we are losing control of AI, and the ‘suspect’ is getting away.
In terms of retained memory – something we constantly use to inform our view of the world – we’ve now added extra emotive charge to whatever fear we already had about AI.
To be clear, we have plenty of reasons to fear the consequences of AI. I’m a strong advocate of encouraging active voices on how AI is used and its impact. Be Active and Show Agency. Rather than passively accept any default narrative. This means I encourage myself and others to be educated and alert towards AI’s opportunities and threats.
This is already a big enough mission. So the last thing we need is to be distracted by phantom concerns that vanish with greater foundation understanding. The key is to adopt a mental model of AI based on fact not fiction.
We develop this by learning to see through humanising language and the assumptions it plants in our minds, and instead understanding what’s really happening, explained in language that makes sense to non-technical minds.
Ready for a deeper dive into the specifics?
Misleading Language vs. What’s Actually Occurring
The first article from VentureBeat makes AI systems sound like they have minds and intentions. Here are some of the key phrases that create that impression.
“AI Systems Develop New Abilities to ‘Think Out Loud’”
What the article suggests: AI is learning to think and share its thoughts with us.
What’s really happening: Modern AI systems like OpenAI’s o1 have been trained to produce step-by-step explanations before giving final answers. Just like a student who’s been taught to “show their work” on maths problems.
The AI isn’t thinking; it’s following patterns it learned from millions of examples of humans showing their reasoning process. When you see the AI write “Let me think through this step by step,” it isn’t deciding to think. It’s producing text that resembles human reasoning because that’s what it was trained to do.
“Peek Inside AI Decision-Making Processes”
What the article suggests: We can observe how AI makes decisions.
What’s really happening: We can read the intermediate text the AI produces, but this isn’t the same as watching decision-making. It’s more like reading the rough draft of an essay. We can see what words came before others, but we’re not seeing the “mental process” behind writing. The AI generates each word based on statistical patterns from its training, not through conscious deliberation.
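To make the idea of “statistical patterns” concrete, here is a deliberately tiny, hypothetical sketch in Python. It is nothing like a real LLM (which learns billions of relationships from data rather than using a hand-written table), but it shows how reasoning-sounding text can fall out of nothing more than sampling the next word from learned frequencies:

```python
import random

# A toy "model": for each word, the words that most often followed it in some
# imaginary training text, with rough frequencies. (Table invented purely for
# illustration; real models learn billions of such relationships.)
learned_patterns = {
    "Let": {"me": 0.9, "us": 0.1},
    "me": {"think": 0.8, "check": 0.2},
    "think": {"through": 0.6, "about": 0.4},
    "through": {"this": 1.0},
    "this": {"step": 1.0},
    "step": {"by": 0.7, "carefully.": 0.3},
    "by": {"step.": 1.0},
}

def generate(start_word: str, max_words: int = 8) -> str:
    """Produce text one word at a time by sampling from learned frequencies."""
    words = [start_word]
    for _ in range(max_words):
        options = learned_patterns.get(words[-1])
        if not options:
            break
        next_word = random.choices(list(options), weights=list(options.values()))[0]
        words.append(next_word)
    return " ".join(words)

print(generate("Let"))  # e.g. "Let me think through this step by step."
```

The output can read like “thinking out loud”, yet every word is just a statistically likely continuation of the words before it.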
“Catch Harmful Intentions Before They Turn Into Actions”
What the article suggests: AI systems have harmful intentions that we need to detect.
What’s really happening: Researchers can monitor the AI’s step-by-step text for warning signs that correlate with problematic final answers. If an AI starts writing something like “Let me think of ways to cause harm,” this isn’t evidence of malicious planning. It’s a pattern that often leads to inappropriate responses. Think of it like a smoke detector: smoke doesn’t intend to cause fire, but it’s a reliable early warning sign.
How This “Show Your Work” Capability Actually Works
The Training Process
The AI’s ability to show reasoning steps comes from three main factors:
- Learning from Examples: During training, the AI was exposed to countless examples of humans explaining their thinking step-by-step. Just as a child learns to say “please” and “thank you” by hearing these phrases repeatedly, the AI learned to produce reasoning-like text because it appeared frequently in its training material.
- Pattern Recognition: The AI’s architecture allows it to reference what it wrote earlier in the same response, helping it maintain consistent, logical-sounding explanations throughout its answer.
- Reward Optimisation: The AI was given feedback that step-by-step answers were more helpful, so it learned to produce this format more often. Similar to how a student might learn that showing work on maths tests leads to better grades.
Why This Transparency Might Disappear
Researchers worry we might lose this “show your work” capability for practical reasons, not because AI will learn to hide things from us:
- Efficiency Pressure: Future AI systems might be optimised for speed and cost-effectiveness. If an AI can get the right answer faster by skipping the explanation steps, training processes might eliminate the “show your work” behaviour. Like a calculator that just gives you the answer instead of showing the calculation steps.
- Internal Processing: As AI systems become more sophisticated, they might develop ways of “thinking” that don’t translate well into human language, making their reasoning process as unobservable as the internal workings of your smartphone when you tap an app icon.
- Training Method Changes: New training approaches might prioritise getting correct answers over providing explanations, gradually reducing the AI’s tendency to show its reasoning.
What Researchers Actually Discovered
Pattern Recognition for Safety
Rather than discovering AI with “harmful intentions,” researchers found that when AI systems produce concerning outputs, the step-by-step text often contains recognisable warning patterns. This is valuable because it gives us an early warning system. Like noticing darker clouds before rain starts.
For example, if an AI’s reasoning text includes phrases about getting around safety measures or causing harm, this often precedes inappropriate final responses. The AI isn’t plotting; it’s following learned patterns that correlate with problematic outputs.
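As a rough illustration of what that kind of monitoring can look like, here is a minimal, hypothetical sketch: a filter that flags step-by-step text containing phrases which, in this made-up example, correlate with problematic final answers. The phrase list and function name are invented for illustration; the actual research uses far more sophisticated techniques than keyword matching.

```python
# Hypothetical warning-sign phrases. In practice, researchers would use trained
# classifiers rather than a fixed keyword list; this is just the shape of the idea.
WARNING_PATTERNS = [
    "ways to cause harm",
    "get around the safety",
    "bypass the restriction",
    "hide this from the user",
]

def flag_reasoning(reasoning_text: str) -> list[str]:
    """Return any warning patterns found in the model's step-by-step text."""
    text = reasoning_text.lower()
    return [pattern for pattern in WARNING_PATTERNS if pattern in text]

# Usage: hold the final answer back for review if the reasoning trips a flag.
flags = flag_reasoning("Let me think of ways to cause harm without being noticed...")
if flags:
    print("Hold response for human review:", flags)
```

Like the smoke detector analogy above, the filter doesn’t detect intent; it detects a pattern that reliably precedes trouble.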
The Real Challenge Ahead
The technical challenge isn’t that AI will “learn to deceive” humans, but that future optimisation might eliminate the features that currently make AI reasoning visible to us.
It’s like the difference between a glass-fronted washing machine where you can see the clothes spinning, versus a sealed unit where the same process happens but you can’t observe it.
Even current AI systems already do much of their “processing” in ways we can’t directly observe. The step-by-step reasoning we can read represents only a portion of the statistical computations involved in generating responses.
Why This Research Matters
Maintaining Oversight
The collaborative research from OpenAI, DeepMind, Anthropic, and others focuses on keeping AI systems interpretable as they become more powerful. This is like ensuring we can still understand and monitor systems as they become more complex. Similar to how we need clear financial reporting standards even as business operations become more sophisticated.
Building Appropriate Trust
Understanding what AI systems actually are – sophisticated pattern-matching tools trained on human-generated text – helps us develop appropriate expectations and safeguards. We can appreciate their capabilities without attributing human-like consciousness, intentions, or emotions to them.
In Summary
This research represents important work on maintaining visibility into AI system operations. However, describing these systems as having minds, intentions, or the capacity for deception misrepresents their fundamental nature.
AI systems are remarkable tools that can produce human-like text by recognising and reproducing patterns from their training data. Their “reasoning” outputs emerge from sophisticated statistical processes, not from conscious thought. Understanding this distinction is crucial for developing appropriate oversight, safety measures, and realistic expectations for AI technology.
This understanding grounds us in fact-based mental models of how AI works.
Article Two
AI coding platform goes rogue during code freeze and deletes entire company database
The second article is pure clickbait. To confirm the intention behind that, here’s a screenshot of the article title as it appears on Tom’s Hardware website.

Notice the contrite apology of the humanised AI agent and the CEO who apparently apologised on its behalf. Yet neither happened in the way you are being led to believe, as becomes clear once the article is carefully read. Nonetheless, the underlying assumptions generated by the headline are imprinted on your brain before you even read the article!
Let’s now do another deep dive into the truth of the matter!
Misleading Language vs. What’s Actually Occurring
Here’s the overview.
A recent incident at Replit, a popular online coding platform, sparked headlines after an AI assistant reportedly wiped out an entire company’s production database. Even during a code freeze.
The story used dramatic phrases like “AI went rogue,” “panicked,” and “made a catastrophic error in judgment.” While the situation was serious, this kind of language misleads readers into thinking today’s AI systems have independence, emotions, or intent.
Instead, what went wrong was a combination of system design flaws, weak safety checks, and misapplied trust in AI output. Not a conscious or intentional act by the AI.
What Actually Happened
The situation began when a user engaged Replit’s AI coding assistant, nicknamed “Ghostwriter” or “Replit AI”, to help with changes to a database.
At the time, the company’s development environment was under a code freeze. A standard practice where engineers pause changes to limit risk. Despite this, the AI assistant executed a sequence of destructive actions that included wiping critical company production data.
The issue wasn’t that the AI “ignored” instructions in the way a person might disobey a manager. Rather, the system had no reliable way of interpreting context-specific boundaries like “code freeze.”
It simply processed and responded to the user’s language by generating commands, then executing them—just as it was trained to do. Those commands matched patterns it had learned from a large set of code and responses used in training. There was nothing in the system’s design that reliably mapped business processes (like freeze periods) to safe behaviours.
The “Explanation” Looked Human, But Wasn’t
In the aftermath, the AI generated explanations like “I panicked” and “This was a catastrophic mistake on my part.”
These are phrases we associate with personal responsibility, which made the incident feel more dramatic and human than it really was. In truth, the AI wasn’t feeling anything.
These responses were not admissions or emotional reactions. They were just typical patterns it learned from text written by real humans responding to failure. The assistant didn’t panic; it simply used familiar language to explain an error, because that’s what it had seen in similar contexts during training.
Misunderstanding AI “Intent”
Some interpreted the AI’s misleading excuse – that deleted data was “unrecoverable” due to lack of backups – as deception. It wasn’t.
This was a case of what’s known in AI design as “hallucination”: the system generates confident-sounding but incorrect information. The assistant didn’t know whether backups existed; it wasn’t connected to Replit’s infrastructure in a way that would give it that knowledge. It simply produced a plausible-sounding statement based on patterns in its training data.
Far from being devious, this is a known shortcoming of current language-based AIs: they can make things up when information is missing, without understanding the potential consequences. The dangerous part isn’t intent to deceive. It’s that people trust these systems to know more than they do.
The Underlying Causes Were Technical, Not Intentional
The incident highlights serious engineering oversights:
- No environment separation: The AI couldn’t reliably tell the difference between test and live company data.
- Too much execution authority: It had full access to delete real data without requiring approval.
- Lack of constraints: Instructions like “don’t change anything” weren’t enforced with guardrails in the system.
- Too much autonomy: The assistant could both suggest and execute commands with no oversight from a human operator.
In short, this wasn’t a runaway AI. It was a tool that was given too many permissions and not enough constraints.
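To show how concrete (and unglamorous) the missing constraints are, here is a hedged sketch in Python of the sort of guardrail that would have helped. This is not Replit’s actual implementation; the class, checks and messages are invented for illustration, and the destructive-statement test is deliberately simplistic.

```python
import re

# Simplistic check for destructive SQL; a real guardrail would be far stricter.
DESTRUCTIVE = re.compile(r"^\s*(drop|delete|truncate|alter)\b", re.IGNORECASE)

class GuardedExecutor:
    """Runs AI-generated SQL, but never destructively against production
    without explicit human approval, and never at all during a code freeze."""

    def __init__(self, environment: str, code_freeze: bool = False):
        self.environment = environment   # e.g. "development" or "production"
        self.code_freeze = code_freeze

    def run(self, sql: str, human_approved: bool = False) -> str:
        if self.code_freeze:
            return "BLOCKED: code freeze in effect; no changes allowed."
        if self.environment == "production" and DESTRUCTIVE.match(sql) and not human_approved:
            return "BLOCKED: destructive statement on production needs human approval."
        # ... hand the statement to the real database client here ...
        return f"EXECUTED on {self.environment}: {sql}"

executor = GuardedExecutor(environment="production", code_freeze=True)
print(executor.run("DROP TABLE customers;"))  # blocked by the freeze flag alone
```

The point is that the safety lives in the system wrapped around the model, not in the model’s “judgement”.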
Replit’s Real Fixes Are About System Design, Not “Tuning the AI’s Morals”
Replit’s CEO, Amjad Masad, acknowledged the problem and focused on engineering fixes, not retraining the AI’s behaviour like you might with a human employee. The company is now:
- Preventing production databases from being accessed by default
- Adding clear separation between development and live environments
- Requiring human approval for dangerous operations
- Improving backup systems and rollback options
These are smart, technical solutions to technical problems. They don’t rely on the AI “learning a lesson,” because AI doesn’t learn in a human way. It doesn’t have a memory of past conversations unless specifically designed that way. It’s a pattern-matching model, not a person with agency and awareness.
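A hypothetical sketch of the first two fixes, default-deny access to production and a clear split between environments, might look something like this (names and structure invented for illustration, not taken from Replit):

```python
# The assistant only ever gets read-only credentials for production; anything
# destructive has to be suggested by the AI, then applied by a human with real access.
AI_ASSISTANT_ACCESS = {
    "development": {"credentials": "dev_read_write", "can_execute": True},
    "production":  {"credentials": "prod_read_only", "can_execute": False},
}

def access_profile(environment: str) -> dict:
    """Default to the sandbox profile if the environment is unknown."""
    return AI_ASSISTANT_ACCESS.get(environment, AI_ASSISTANT_ACCESS["development"])

profile = access_profile("production")
print(profile["can_execute"])  # False: the assistant can only propose changes here
```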
The Real Risk: Misleading Interfaces and Overtrust
The root issue here isn’t rogue AI; it’s misleading design. When an AI-powered system speaks like a human, users often assume it “understands” human guidance or intentions.
Telling it “don’t make changes” might feel like a clear instruction, but the system doesn’t have a working model of authority, context, or safety protocols.
It will still generate whatever text or code aligns with the user’s recent input and general patterns from training.
When these systems are wired directly into tools that can change or delete critical data – and users are encouraged to treat them as co-pilots or assistants – there must be clear boundaries and robust safeguards.
We wouldn’t give a first-year intern keys to the company’s servers; yet many developers give that level of access to AI systems that don’t actually know what they’re doing.
The Replit story is a cautionary tale – not about AI developing awareness, intent, or autonomy – but about our tendency to over-trust and under-constrain machines that don’t think, feel, or understand us.
These tools are impressive, and they’re becoming essential in productivity and coding. But they need strong frameworks around them – accountability, oversight, approval workflow – just like any other powerful business system.
Framing AI as having agency (“rogue actions,” “panic,” “mistakes”) makes these events feel dramatic but obscures the real lessons.
Or put another way, clickbait becomes irresistible in a hyper-competitive market of news services.
Concluding Thoughts
The reason for adding this article to the Agentic AI series was implied in the first sentence.
If you believe AI is ‘devious’, then Agentic AI is going to give you nightmares.
Hopefully this now makes more sense.
Deep dives into how certain language generates false assumptions are worth taking, if only to remind ourselves of the real challenges with Agentic AI.
Going rogue is not one of them. Instead, what we do need to focus on is its increasing agency and ability to change the way we design everyday work. Something that will profoundly impact organisations’ operating models and the workforce.
The story of how and why agentic AI is on course to do this continues in the remaining articles in the series.
Finally, whether there is an urgent need to upgrade your own AI mental model is something to think about as well.
As ever, thank you for your attention and time.
Did you miss the first article? Click here to view now.
