AI Vanity Search: How "In the Weights" Reveals Your Digital Footprint in AI Models

When Google Is No Longer the Mirror: The Rise of the AI Vanity Search

For years, Googling your own name was the internet's unofficial self-audit tool — a quick way to check what the world could find out about you. But that habit is quietly becoming obsolete. As large language models (LLMs) increasingly replace traditional search engines as the first port of call for information, a new and more unsettling question is emerging: are you encoded in the AI itself? A new tool called In the Weights is attempting to answer exactly that — and in doing so, it has sparked a broader conversation about AI transparency, digital identity, and the privacy implications of how personal information gets baked into AI training data. For developers, privacy professionals, and policy makers, this is far more than a novelty. It is an early signal of a much larger reckoning.

The tool was created by Thomas Dimson and Joey Flynn, both former OpenAI employees who originally joined that company through the acquisition of their design startup, Global Illumination. Speaking about the project's origins, Dimson told TechCrunch that he wanted to explore how "Google vanity searches are the wrong objective in 2026 as more traffic moves to LLMs" and to reflect on the fact that "so many lives are encoded somehow in a bunch of floating point numbers inside the AI brain." The project was partly inspired by a tongue-in-cheek blog post riffing on Terry Bisson's classic short story "They're Made Out of Meat" — a fitting metaphor for a world in which human identity is increasingly compressed into machine-readable numerical parameters.

How In the Weights Works — and What It Actually Measures

Image: AI model data analysis interface showing neural network parameters. Caption: Modern AI systems store vast amounts of information about real-world entities within their numerical parameters, raising significant questions about what personal data they retain.

The term "weights" refers to the numerical parameters that shape what an AI model has learned during training — the billions of floating-point values that determine how a model responds to any given prompt. In the Weights queries multiple AI models simultaneously — including Grok, Gemini, several versions of GPT, Claude, Llama, and a number of lesser-known models — with a standardised prompt along the lines of: "Who is [name]? Give up to 10 results, each with a short description and confidence." The platform then clusters similar descriptions together and assigns a composite "strength score" to indicate how well the AI models collectively recall that person.
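The query, cluster, and score steps described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the article does not disclose how In the Weights clusters answers or computes its strength score, so the `Answer` record, the token-overlap (Jaccard) clustering, the 0.3 threshold, and the 0 to 1000 scoring formula are all assumptions, and the model names and answers are made-up sample data. Only the prompt wording comes from the text.

```python
import re
from dataclasses import dataclass

# Prompt wording as quoted in the article; {name} is the person searched for.
PROMPT_TEMPLATE = (
    "Who is {name}? Give up to 10 results, each with a short "
    "description and confidence."
)


@dataclass
class Answer:
    """Hypothetical record for one description returned by one model."""
    model: str         # which model produced the description
    description: str   # short free-text description of the person
    confidence: float  # self-reported confidence, assumed to be in [0, 1]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set used for the overlap comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(answers: list[Answer], threshold: float = 0.3) -> list[list[Answer]]:
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters: list[list[Answer]] = []
    for ans in answers:
        for group in clusters:
            seed = group[0]
            if jaccard(tokens(ans.description), tokens(seed.description)) >= threshold:
                group.append(ans)
                break
        else:
            # No existing cluster matched, so this answer starts a new one.
            clusters.append([ans])
    return clusters


def strength_score(group: list[Answer], models_queried: int) -> int:
    """Assumed composite: share of models agreeing times mean confidence, scaled to 0..1000."""
    models_agreeing = len({a.model for a in group})
    mean_confidence = sum(a.confidence for a in group) / len(group)
    return round(1000 * (models_agreeing / models_queried) * mean_confidence)


# Made-up sample data for three hypothetical models.
sample = [
    Answer("model-a", "American actor known for Home Alone", 0.9),
    Answer("model-b", "Actor, star of Home Alone", 0.8),
    Answer("model-c", "Italian opera tenor", 0.4),
]

print(PROMPT_TEMPLATE.format(name="Macaulay Culkin"))
for group in cluster(sample):
    print(strength_score(group, models_queried=3), [a.model for a in group])
```

Under this assumed formula a person scores highly only when many models independently return overlapping descriptions with high confidence, which matches the article's description of the score as a measure of collective recall.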

The approach is deliberately model-agnostic, which makes it more robust as a diagnostic tool than querying any single model. Results also display which specific model returned which answer for a given name — a feature that is particularly valuable for identifying discrepancies, biases, and outright hallucinations across model families. In one example cited by TechCrunch, GPT-5.4 Mini reportedly described one person as an "ambiguous name form that could refer to multiple people" — a clear hallucination that the platform flags for the user.
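The per-model view described above amounts to asking which models depart from the consensus. A minimal sketch of that idea follows, assuming each model's answer has already been reduced to a short label (for example by a clustering step); the function name `dissenting_models` and the sample labels are hypothetical, and nothing here reflects how the platform itself flags results.

```python
from collections import Counter


def dissenting_models(label_by_model: dict[str, str]) -> tuple[str, list[str]]:
    """Return the most common label and the models whose label differs from it."""
    if not label_by_model:
        raise ValueError("at least one model answer is required")
    counts = Counter(label_by_model.values())
    consensus, _ = counts.most_common(1)[0]
    outliers = sorted(
        model for model, label in label_by_model.items() if label != consensus
    )
    return consensus, outliers


# Made-up labels for three hypothetical models.
labels = {
    "model-a": "actor",
    "model-b": "actor",
    "model-c": "ambiguous name",
}
consensus, outliers = dissenting_models(labels)
print(consensus, outliers)
```

A model that appears in the outlier list is not necessarily wrong. The list only shows where a human reviewer should look first.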

The leaderboard feature, which ranks names by strength score, has proven both viral and revealing. "Home Alone" star Macaulay Culkin and opera singer Luciano Pavarotti were among those scoring near the top of the leaderboard, with a strength score of 988. An average tech blogger, according to TechCrunch's own testing, landed in the top 6% with a score of 641 — a result that is simultaneously flattering and deeply instructive about how media exposure translates into AI model recall.

13+ AI models queried per search

988: top leaderboard strength score

Top 6%: score for active tech journalists

~70% of users now begin research with LLMs (Gartner, 2025)

What AI Model Memory Means for GDPR, Data Sovereignty, and Digital Privacy

For privacy professionals and policy makers, In the Weights is more than an entertaining experiment — it is a live demonstration of a regulatory challenge that European institutions have been grappling with since the widespread deployment of large language models began. The core issue: if a model has encoded information about a real, living person during training, does that person have the right to know? Can they request its removal? And who is responsible for ensuring accuracy?

Under the General Data Protection Regulation (GDPR), individuals have the right to access personal data held about them (Article 15), the right to rectification of inaccurate data (Article 16), and in some circumstances the right to erasure (Article 17), commonly known as the "right to be forgotten." The challenge with LLMs is that personal information is not stored as discrete, retrievable records. It is distributed across billions of numerical parameters, so there is no single record to hand over, correct, or delete. The European Data Protection Board (EDPB) has acknowledged that applying traditional data subject rights frameworks to AI model weights is technically and legally complex, a gap that regulators are actively working to address through the EU AI Act and supplementary guidance.

In the Weights effectively surfaces this tension in an accessible, user-facing way. When a model hallucinates details about a real person, as GPT-5.4 Mini reportedly did in the TechCrunch test, that is not merely a curiosity. It is potentially a breach of the accuracy principle in GDPR Article 5(1)(d), because inaccurate personal data is being processed and reproduced at scale. The International Association of Privacy Professionals (IAPP) has noted that AI-generated hallucinations involving real individuals are an emerging compliance risk that most organisations have yet to adequately address in their AI governance frameworks.

"Being in the weights means your existence was deemed important in the process of creating superhuman artificial intelligence — but it also means that you may have had no say in that decision."

— Thomas Dimson, co-creator of In the Weights, paraphrased with editorial context

This lack of consent is the crux of the issue. Most individuals whose names appear in AI training data (scraped from Wikipedia, news archives, professional directories, or public social media profiles) were never asked whether they consented to being encoded into a commercial AI system. The EU AI Act, which entered into force in August 2024 and whose obligations apply in phases from February 2025, includes provisions on transparency and data governance for high-risk AI systems, but the question of whether "being in the weights" constitutes processing of personal data under GDPR remains a live legal debate. Wired has previously reported on the growing tension between AI developers' data appetite and European privacy law, a conflict that tools like In the Weights bring into unusually sharp focus.

Hallucination Risks and Model Bias: What Developers and IT Teams Need to Know

Image: Cybersecurity professional reviewing AI output for data accuracy and compliance. Caption: AI hallucinations involving real individuals represent a growing compliance risk for organisations deploying LLMs in customer-facing or decision-making contexts.

Beyond the privacy and regulatory dimensions, In the Weights raises substantive technical questions that should be on the radar of every development team deploying LLMs in production environments. The platform's ability to compare outputs across model versions — for example, between GPT-4, GPT-4o, and GPT-5.4 Mini — reveals that different versions of nominally the same model can return dramatically different information about the same person. This inconsistency is not just an academic curiosity; it is a real risk for any enterprise application that relies on an LLM to retrieve or summarise information about individuals.
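For teams that want to guard against this kind of cross-version drift, one option is to compare the outputs of several model versions before trusting any of them. The sketch below is an assumption-laden illustration rather than an established method: the helper names `agreement` and `needs_review`, the 0.6 threshold, and the sample outputs are all hypothetical, and character-level similarity from Python's standard `difflib` stands in for whatever semantic comparison a production system would use.

```python
from difflib import SequenceMatcher
from itertools import combinations


def agreement(outputs: dict[str, str]) -> float:
    """Mean pairwise text similarity (0..1) across the outputs of several model versions."""
    pairs = list(combinations(outputs.values(), 2))
    if not pairs:
        # Zero or one output: there is nothing to disagree with.
        return 1.0
    total = sum(
        SequenceMatcher(None, a.lower(), b.lower()).ratio() for a, b in pairs
    )
    return total / len(pairs)


def needs_review(outputs: dict[str, str], min_agreement: float = 0.6) -> bool:
    """Flag a result for human review when the versions disagree too much."""
    return agreement(outputs) < min_agreement


# Made-up outputs keyed by the version names mentioned in the article.
outputs = {
    "GPT-4": "British mathematician and writer",
    "GPT-4o": "British mathematician and writer",
    "GPT-5.4 Mini": "Ambiguous name that could refer to multiple people",
}
print(round(agreement(outputs), 2), needs_review(outputs))
```

A low agreement score does not say which version is right, only that the versions disagree enough that an application should not present any single answer about a person as settled fact.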

Dimson has stated that he plans to investigate further why different models in the same series return different results, which models exhibit bias towards certain types of people, and which individuals "should have a Wikipedia article but don't." That last question is particularly pointed: it suggests that AI model recall is not a neutral reflection of real-world importance, but is instead shaped by the historical biases and coverage gaps of the training corpora — most of which over-represent English-language, Western, and male-dominated sources.

| AI Model | Queried by In the Weights | Known Hallucination Risk | GDPR Relevance |
| --- | --- | --- | --- |
| GPT (multiple versions) | Yes | High (documented) | Personal data in outputs |
| Gemini | Yes | Moderate | Cross-border data concerns |
| Claude | Yes | Lower (constitutional AI) | Refusal policies relevant |
| Grok | Yes | Moderate-High | Non-EU provider, complex jurisdiction |
| Llama (open source) | Yes | Variable (deployment-dependent) | On-premise deployment possible |

For IT decision-makers considering open-weight alternatives like Llama, this comparison is instructive. An open-weight model deployed on-premises gives organisations far greater control over what data enters the system, how outputs are audited, and how compliance obligations are met. As the Electronic Frontier Foundation has argued, open-weight models also offer greater transparency into training provenance, a quality that is increasingly valued by European enterprises operating under GDPR and the EU AI Act.

The Shift from Search to LLMs: Why Your AI Footprint Is the New Digital Identity

The broader context behind In the Weights is a fundamental shift in how people seek and consume information. According to Gartner research, a growing share of information queries — particularly in professional and research contexts —

Originally reported by TechCrunch. Summarised and curated by European Purpose.

European Purpose Team
