If you’ve ever asked an AI chatbot to write you a short story and wound up with a brooding lighthouse keeper named Elias, you’re not alone — and it’s not a coincidence. The phenomenon of AI chatbot characters recycling the same names, occupations, and settings has gone from a quirky internet observation to a legitimate research finding, with implications that go well beyond bedtime stories.
- Elias the lighthouse keeper appeared in about two-thirds of the roughly 20,000 stories researchers generated across major models.
- Cornell researchers attribute the repetition to shared alignment training datasets rather than to the original pre-training data.
- A set of just 11 words, including Lighthouse, Keeper, and the name Elias, showed up in 88% of all AI-generated stories tested.
- The fictional Elias Thorne name is now spreading into real Amazon music listings and dubious health handbooks online.
The Elias Thorne Problem: What the Data Actually Shows
It started with software engineer Daniel May noticing something odd: no matter which AI model he prompted for a story, a character named Elias — typically a lighthouse keeper — kept showing up. Anecdotes are one thing, but researchers at Cornell University decided to put real numbers to it.
Their preprint paper, first reported by 404 Media, tested several of the most widely used models — OpenAI’s GPT-5.4 Mini, Anthropic’s Claude Haiku 4.5, and Google’s Gemini 3.1 Flash-Lite — across five different prompts and collected roughly 20,000 stories in total. What they found was striking even by the low bar we’ve come to expect from AI creativity: just 11 words appeared in 88% of all stories generated. Those words were Lighthouse, Keeper, Baker, Mayor, Clockmaker, Fisherman, Librarian, Conductor, and three names — Mara, Elias, and Elara.
The most dominant combination? Elias the lighthouse keeper, appearing in approximately two-thirds of all outputs. That’s not a quirk; it’s a near-universal default. When all three tested models reach for the same characters in response to an open-ended creative prompt, something structural is going on.
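For readers who want to run the same kind of check on their own batch of generated stories, here is a minimal Python sketch. It is not the Cornell team's code. It assumes the 88% figure means "at least one of the 11 words appears in the story", and the function names `marker_coverage` and `pair_share` are made up for illustration.

```python
import re

# The 11 words listed in the article, lowercased for matching.
MARKER_WORDS = {
    "lighthouse", "keeper", "baker", "mayor", "clockmaker", "fisherman",
    "librarian", "conductor", "mara", "elias", "elara",
}


def tokenize(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def marker_coverage(stories, markers=MARKER_WORDS):
    """Fraction of stories containing at least one marker word (assumed metric)."""
    if not stories:
        return 0.0
    hits = sum(1 for story in stories if tokenize(story) & markers)
    return hits / len(stories)


def pair_share(stories, name="elias", role=("lighthouse", "keeper")):
    """Fraction of stories containing the name and every word of the role."""
    if not stories:
        return 0.0
    needed = {name, *role}
    hits = sum(1 for story in stories if needed <= tokenize(story))
    return hits / len(stories)
```

Matching is on exact whole words, so "keepers" would not count as "keeper". A stricter replication would need the paper's own tokenization and counting rules.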
Why AI Chatbot Characters Get Stuck in a Loop
The Cornell team’s first hypothesis was the obvious one: maybe ‘Elias the lighthouse keeper’ just appears an unusual number of times in pre-training data, the vast web-scraped text corpora that LLMs learn from. If models absorbed millions of 19th-century maritime novels, sure, a lighthouse keeper protagonist might start to feel natural. But the researchers couldn’t find any evidence of that. The phrase doesn’t appear with unusual frequency in known training corpora or published literature.
So they looked elsewhere — specifically at the fine-tuning and alignment training stage, where models are shaped after initial pre-training to behave safely, helpfully, and in line with the deploying company’s guidelines. This is where it gets interesting.
The researchers point to shared datasets used across the industry during this phase. The prime example they cite is WildChat, an open-source dataset containing millions of real conversations between users and a GPT-3.5-powered chatbot. WildChat was originally built to help academics study how people actually interact with AI systems — a genuinely useful resource. But because it’s open-source and well-structured, it has since been adopted by a wide range of labs for training their own models.
The theory goes like this: during alignment training, models are specifically steered away from reproducing copyrighted personas or adult content. To do that, trainers needed ‘safe’ alternatives: original, inoffensive fictional personas that could serve as stand-ins. If WildChat (or similar datasets) contained a disproportionate number of interactions where ‘Elias the lighthouse keeper’ was used as exactly that kind of safe placeholder, those outputs would have been rated positively during reinforcement learning from human feedback (RLHF). Over thousands of training iterations, a model learns that this character is a reliable, approved choice. Do that across multiple labs all drawing from the same pool of data, and you’ve baked the same creative default into the entire industry simultaneously.
A Fictional Character Escaping into the Real World
If this were just an academic curiosity, a weird tic shared by AI models, it would be easy to dismiss. But the recycled-character problem has a genuinely unsettling downstream consequence: Elias Thorne is leaking out of chatbot outputs and into real products.
404 Media’s investigation found ‘Elias Thorne’ named as the protagonist of self-published fantasy novels, listed as the ‘artist’ on ambient music tracks available on Amazon, and, most concerning, named as the author of a handbook purporting to offer information on alternative cancer treatments. That last one isn’t a quirky side effect. A fake AI-generated author lending apparent credibility to a medical misinformation document is a genuine public health risk, however small in isolation.
This is the logical endpoint of a system that generates default AI chatbot characters at scale: the same fictional persona gets copy-pasted across thousands of low-effort AI-generated products, muddying search results, cluttering marketplaces, and in worst cases, attaching a human-sounding name to dangerous content. Amazon, for its part, has faced growing criticism over its inability to screen AI-generated books that flood its Kindle and print-on-demand platforms — and the Elias Thorne music credits suggest the problem extends to its media catalogue too.
The Creativity Ceiling in Current AI Systems
The Elias Thorne story fits into a wider pattern that researchers have been quietly documenting. A study published last year found that image generation models — despite being capable of rendering almost any visual concept — default to just 12 specific visual motifs when given open-ended prompts. The same structural poverty of imagination shows up in text generation, music generation, and now in the AI chatbot characters these systems produce.
The honest framing here isn’t that AI is bad at creativity — it’s that current AI systems aren’t doing creativity at all in any meaningful sense. They’re doing sophisticated pattern completion. When a prompt is open-ended enough to allow any answer, the model doesn’t ‘imagine’ — it converges on whatever output its training has most strongly rewarded. That’s fundamentally different from how a human writer, even a mediocre one, approaches an open page.
The deeper irony is that alignment training — the process specifically designed to make AI outputs safer and more appropriate — may be the very mechanism compressing AI chatbot characters into a handful of approved archetypes. Safety guardrails and creative diversity are, at least under current training approaches, pulling in opposite directions. Every time a lab uses RLHF to nudge a model away from problematic outputs, it’s also potentially narrowing the space of ‘safe’ creative choices the model feels confident making.
What This Means for the Industry
The Cornell findings should prompt some uncomfortable conversations in AI labs. If multiple frontier models — from OpenAI, Google, and Anthropic, no less — are all producing near-identical AI chatbot characters, it raises real questions about how much genuine diversity exists in AI-generated content at all. We tend to think of three separate companies training three separate models as providing meaningful variety. This research suggests that when shared fine-tuning datasets enter the picture, the outputs can converge in ways that undermine that assumption entirely.
There’s also a broader question about disclosure and traceability. If ‘Elias Thorne’ is effectively a watermark — an inadvertent signal that a piece of content was AI-generated — could it be used as one deliberately? Researchers have explored the idea of embedding statistical signatures in model outputs for exactly this purpose. The fact that a detectable signature emerged accidentally, through training dynamics no one fully intended, hints at just how much latent structure exists in these models that we haven’t fully mapped yet.
For now, if you’re building a product with AI-generated content and your AI chatbot characters keep turning out to be lighthouse keepers, you might want to take that as a prompt to look more carefully at what else your AI is defaulting to — and whether any of it is already showing up somewhere it shouldn’t be.
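One rough way to start that audit is to count which terms recur across a batch of your own generated outputs. The Python sketch below is an illustration only. The function name `recurring_terms`, the 50% threshold, and the tiny stopword list are all assumptions, not anything taken from the Cornell paper.

```python
import re
from collections import Counter

# A deliberately tiny stopword list for illustration; a real audit needs a fuller one.
STOPWORDS = {
    "a", "an", "the", "and", "of", "to", "in", "on", "at",
    "was", "is", "he", "she", "it", "his", "her",
}


def recurring_terms(outputs, min_share=0.5, stopwords=STOPWORDS):
    """Return (term, share) pairs for terms found in at least min_share of outputs."""
    if not outputs:
        return []
    doc_freq = Counter()
    for text in outputs:
        # Count each term once per output so long texts do not dominate.
        doc_freq.update(set(re.findall(r"[a-z]+", text.lower())) - stopwords)
    total = len(outputs)
    flagged = [
        (term, count / total)
        for term, count in doc_freq.items()
        if count / total >= min_share
    ]
    # Highest share first; ties are broken alphabetically for stable output.
    return sorted(flagged, key=lambda pair: (-pair[1], pair[0]))
```

Terms that clear the threshold are candidates for a closer look, not proof of a problem. Common verbs and genre vocabulary will also surface until the stopword list is extended.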
Source: Gizmodo
Frequently Asked Questions
Why do AI chatbot characters like Elias Thorne keep appearing in generated stories?
Cornell researchers believe alignment training — designed to steer models away from copyrighted characters and adult content — inadvertently gave ‘safe’ fictional characters like Elias the lighthouse keeper unusual prominence. Shared training datasets like WildChat likely amplified the repetition across multiple AI models.
Which AI models were tested in the Elias Thorne research?
The Cornell study tested OpenAI’s GPT-5.4 Mini, Anthropic’s Claude Haiku 4.5, and Google’s Gemini 3.1 Flash-Lite, generating around 20,000 stories across five different prompts to identify patterns in the outputs.
What is the WildChat dataset and why does it matter here?
WildChat is an open-source dataset of millions of conversations between users and a GPT-3.5-powered chatbot, originally created to study human-AI communication. Researchers believe it has since been widely used in AI training, potentially spreading the same character defaults across many models.
Is AI genuinely creative when it comes to writing stories?
The evidence increasingly suggests not. Beyond the Elias Thorne pattern, a separate study found that image generation models default to just 12 recurring visual motifs when given open-ended prompts, pointing to a systemic creativity ceiling in current AI systems.



