In March 2023, Bloomberg spent $10 million building BloombergGPT, a custom large language model trained on decades of proprietary financial data. The team published a paper, the press covered it enthusiastically, and the financial world took notice. Two weeks earlier, OpenAI had released GPT-4. By October, researchers got around to benchmarking Bloomberg’s custom model against the commercially available alternative. GPT-4 won. So did most other general-purpose LLMs in that class.
Ten million dollars and thousands of engineering hours to build something that was outperformed by a product anyone could license for a few cents per query.
Bloomberg’s misadventure would be a cautionary tale on its own. But in 2026, it is just the opening act. The pattern has repeated across industries and at scales that make Bloomberg’s $10 million look modest. And the most ironic example belongs to the firm that literally tells other companies not to do this.
McKinsey’s Lilli: Build What You Preach Against
McKinsey has spent the last three years publishing frameworks that advise companies to be “takers” of commercially available AI rather than “makers” of custom models. Their own research distinguishes between takers (users of off-the-shelf tools), shapers (integrators of available models with proprietary data), and makers (builders of foundation models). Their published recommendation is clear: the maker approach is too expensive for most companies, and the sweet spot is implementing a taker model for productivity improvements while building shaper applications only for genuine competitive advantage.
Then McKinsey built Lilli.
Lilli is McKinsey’s internal generative AI platform, launched in mid-2023 and scaled to over 40,000 employees processing more than 500,000 prompts per month. The development team grew from four people to over 150. The platform is a RAG (Retrieval-Augmented Generation) application: it sits on top of commercially available LLMs from OpenAI and Cohere, layering McKinsey’s proprietary knowledge base of 100,000+ documents over them. McKinsey’s senior partners describe Lilli as a platform that could “rewire the way we operate”.
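The RAG pattern described above is conceptually simple: retrieve the most relevant internal documents for a query, then hand them to a hosted LLM as grounding context. The sketch below is a toy illustration of that shape, not Lilli’s architecture; the corpus, the keyword-overlap scoring stand-in for real embeddings, and all function names are invented for this example.

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase word tokens.
    A production system would use embedding similarity instead."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents by the toy score."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble a prompt that grounds the answer in retrieved passages,
    numbered so the model can cite them."""
    passages = retrieve(query, corpus)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the passages below and cite them by number.\n"
        f"{context}\n\nQuestion: {query}"
    )

# Hypothetical internal corpus standing in for a proprietary knowledge base.
corpus = [
    "Engagement summary: retail client margin improvement program.",
    "Knowledge note: pricing strategy levers in consumer goods.",
    "HR memo: office relocation schedule for the Berlin office.",
]
prompt = build_prompt("What pricing strategy levers apply to consumer goods?", corpus)
# `prompt` would then be sent to a hosted LLM API (OpenAI, Cohere, etc.).
```

The value in such a system is the proprietary corpus and the retrieval quality, not the pipeline itself, which is why this capability is rapidly becoming a standard vendor feature.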
To be fair, Lilli is not a foundation model; it is a “Zone 2 shaper application” in McKinsey’s own taxonomy. There is a reasonable argument for building RAG layers over proprietary knowledge bases. The problem is not that McKinsey built something; the problem is what happened next.
$20 and Two Hours
On March 9, 2026, security startup CodeWall disclosed that its autonomous AI agent had breached Lilli’s production database in two hours. No credentials. No insider access. No human involvement after the agent selected its own target. The cost of the attack: roughly $20 in API tokens.
The exposure was staggering: 46.5 million chat messages covering strategy, mergers and acquisitions, and client engagements; 728,000 files containing confidential client data; 57,000 user accounts; and 95 system prompts controlling Lilli’s behavior. Every one of those system prompts was writable. A malicious actor could have silently rewritten the instructions guiding what Lilli told 40,000 consultants without deploying a single line of code.
The vulnerability was not exotic. It was SQL injection, one of the oldest attack classes in web security, first documented in 1998. CodeWall’s agent found publicly exposed API documentation listing over 200 endpoints, 22 of which required no authentication. One of those open endpoints accepted user search queries and concatenated JSON field names directly into SQL without sanitization. Standard security scanners missed it. The platform had been running in production for over two years.
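The flaw class is trivial to reproduce. The sketch below is not Lilli’s code; it is a generic Python/SQLite illustration of the same pattern: an endpoint that splices an attacker-controlled JSON field name straight into SQL, next to one standard mitigation (an identifier allowlist, since column names cannot be bound as query parameters the way values can).

```python
import json
import sqlite3

# In-memory stand-in for a production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, body TEXT)")
conn.execute("INSERT INTO messages VALUES (1, 'quarterly strategy notes')")

def search_vulnerable(payload: str):
    """Concatenates a JSON field name straight into SQL -- injectable."""
    field = json.loads(payload)["field"]      # attacker-controlled string
    query = f"SELECT {field} FROM messages"   # no sanitization
    return conn.execute(query).fetchall()

def search_safe(payload: str):
    """Validates the field name against an allowlist before use.
    Identifiers can't be bound as parameters, so allowlisting is the
    standard defense; values, by contrast, should use ? placeholders."""
    field = json.loads(payload)["field"]
    if field not in {"id", "body"}:
        raise ValueError("unknown field")
    return conn.execute(f"SELECT {field} FROM messages").fetchall()

# A legitimate request:
rows = search_vulnerable('{"field": "body"}')
# An injected one: the "field" smuggles a subquery that dumps schema
# metadata alongside the expected column.
leak = search_vulnerable('{"field": "body, (SELECT sql FROM sqlite_master)"}')
```

The injected request returns the table’s schema alongside the message body, and the same trick generalizes to reading, or with other statement types rewriting, anything the database connection can touch, which is how writable system prompts end up exposed.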
McKinsey patched the vulnerability within 24 hours of responsible disclosure and engaged a third-party forensics firm. Their response was fast and professional. But remediation speed does not change the exposure window; a platform serving 40,000 people with access to decades of sensitive client strategy work was protected by the same level of security you would expect from a college student’s first web app.
The Irony Runs Deeper Than the Breach
This is where it gets uncomfortable for McKinsey. AI advisory work reportedly accounts for around 40% of the firm’s revenue. McKinsey’s CEO has stated that the firm has built 25,000 AI agents to support its workforce. They point to their own AI adoption as evidence that they practice what they sell to clients. Lilli was the centerpiece of that narrative.
Meanwhile, the commercial AI landscape has moved dramatically since Lilli launched. Every major LLM provider now offers enterprise knowledge base integration, document analysis, RAG pipelines, and expert-finding capabilities as standard features. Microsoft Copilot indexes organizational knowledge across the entire Office ecosystem. Anthropic, Google, and OpenAI all offer enterprise-grade API access with SOC 2 compliance, data residency controls, and security infrastructure maintained by teams whose sole job is to prevent exactly the kind of breach Lilli experienced.
In the language of my Brighter Headlights framework, the SaaS vendors’ headlights have now fully illuminated the road Lilli was built on. The core use case of “search our internal knowledge base and synthesize answers with citations” is no longer a proprietary capability; it is a feature. McKinsey built a 150-person platform team to deliver what is rapidly becoming a commodity.
Samsung’s Gauss: Same Lesson, Bigger Company
Samsung provides another data point. In late 2023, the company unveiled Samsung Gauss, its proprietary generative AI model designed to compete with ChatGPT and power its consumer electronics ecosystem. Samsung had banned employees from using ChatGPT earlier that year after engineers leaked proprietary code to the platform; Gauss was supposed to be the secure, in-house alternative.
By 2025, Samsung had quietly pivoted. Rather than continuing to develop Gauss as a standalone competitor, Samsung struck partnerships with Google, OpenAI, and Perplexity to power its Galaxy AI suite. The company that invested in building its own model ended up licensing the very commercial solutions it had tried to replace.
Where the Framework Actually Holds
None of this means companies should avoid AI investment entirely. The point is precision about where you invest.
Bloomberg built a general-purpose financial LLM in the same zone where OpenAI, Google, and Anthropic were spending billions. That is Zone 1 in the Brighter Headlights framework: inside the platform, where your vendors will always beat you because they have more data, deeper access, and bigger budgets. McKinsey built a knowledge-search-and-synthesis platform in Zone 2: a common system pairing where the commercial vendors’ headlights were already pointed and advancing fast. Samsung tried to build a general-purpose model to compete with companies whose entire existence is building general-purpose models.
All three would have been better served directing those resources toward Zone 3: the proprietary data topologies, the unusual system pairings, and the unique business logic that no vendor has the incentive or the context to productize. For McKinsey, that might mean AI tools that encode the interpretive judgment their senior partners apply to client problems; not the document retrieval, but the pattern recognition that makes McKinsey’s advice different from a well-researched Google search. For Bloomberg, it might mean models that reason about the relationships between financial instruments in ways their analysts actually think; not summarizing earnings reports, but identifying the second-order effects that create trading opportunities.
The competitive moat is never in the plumbing. It is in the recipe.
The Security Dimension Nobody Talks About
The Lilli breach adds a dimension that most build-vs-buy frameworks miss entirely. When you build a proprietary AI platform, you are not just building a product; you are assuming responsibility for an attack surface. Every API endpoint, every database connection, every authentication layer becomes your problem to secure, monitor, and maintain.
Commercial LLM providers employ dedicated security teams, run continuous penetration testing, maintain bug bounty programs, and operate under regulatory scrutiny that forces them to treat security as existential. When you build your own platform, you are betting that your security posture is better than theirs. McKinsey, a firm with significant resources and a sophisticated technology team, lost that bet to a vulnerability that has been in the OWASP Top 10 since the list was created.
For mid-market companies with a fraction of McKinsey’s security budget, the calculus is even more stark. Building a proprietary AI platform is not just a question of capability; it is a question of whether you can secure it. And in an era where autonomous AI agents can probe your infrastructure at machine speed for $20, the honest answer for most companies is no.
The Headlights Keep Moving
The lesson from Bloomberg, McKinsey, and Samsung is not that these were careless organizations. They are among the most sophisticated enterprises on the planet. The lesson is that even they could not outrun the commercial AI providers in zones where those providers had every incentive and resource advantage.
The SaaS vendors’ headlights are always advancing, and the road they illuminate grows longer every quarter. Stop building what your software vendors are about to give you for free. Start building where they never will.