PostHog is about to do something that most analytics companies quietly avoid talking about: train its own AI models on your product data. The company announced this week that it plans to use data from its platform to build and refine models in-house, and unlike most players in this space, it is being unusually upfront about the whole thing. PostHog AI models are the centerpiece of what founder James Hawkins is calling the company’s next chapter.
- PostHog AI models will be trained on anonymized customer data starting June 29, with opt-out available in org settings.
- PostHog AI models aim to power features like automated session replay analysis and synthetic user testing at scale.
- EU cloud users and those with existing legal agreements like BAAs are opted out of training by default.
- PostHog is taking a rare transparency-first approach, openly publishing its data training plans rather than hiding them in terms updates.
Why PostHog Is Building PostHog AI Models From Scratch
The short version: the off-the-shelf approach isn’t working well enough. PostHog already uses AI across several features — an AI installation wizard, a general-purpose PostHog AI assistant, and a Model Context Protocol (MCP) integration. By the company’s own account, these features are popular. But they’re also expensive to run and don’t scale cleanly.
Session replay analysis is the clearest example of the problem. PostHog AI can already watch a session replay and flag issues, but doing that at any real volume — across thousands of sessions for a large product team — gets costly fast. The unit economics just don’t work when you’re routing everything through a third-party model provider like OpenAI or Anthropic every single time.
Training proprietary PostHog AI models on the underlying data that drives those replays changes the calculus entirely. A fine-tuned model that already understands the structure of PostHog’s event data, funnel shapes, and common user confusion patterns could do this work far more efficiently — and potentially far more accurately — than a general-purpose LLM that has to relearn context on every query.
The Features This Unlocks
Hawkins is especially excited about two near-term applications beyond smarter replays.
The first is synthetic user testing. The idea is to use accumulated behavioral data to predict where users might get confused or where flows might break — before you ship anything to production. As AI coding assistants like GitHub Copilot and Cursor drive faster shipping cycles, the volume of code reviews and test cases is growing faster than most teams can handle manually. PostHog AI models are designed to automate a meaningful chunk of that burden.
The second is conversion improvement suggestions for features that are already live. If PostHog AI models get good at predicting user behavior, they should theoretically be able to flag friction points and recommend changes that would improve conversion rates — proactively, without a human analyst having to go looking.
This is the vision behind PostHog Code, a new product currently in beta that the company describes not as a coding assistant but as a “product editor.” The framing is deliberate. While tools like Cursor and Copilot optimize for producing good code, PostHog Code is designed to optimize for producing good products — a subtle but meaningful distinction that reflects where the company sees its competitive angle.
What Happens to Your Data
Here’s where it gets important. Training PostHog AI models requires data, and that data comes from what’s already sitting inside customer PostHog instances. The company has laid out exactly how this works:
- All data will be anonymized before it’s used for training
- Only data that already exists in your PostHog instance will be used — nothing is pulled from external sources
- PostHog will do all model training in-house, meaning your data won’t be sent to OpenAI, Anthropic, or any other third-party model provider
- Users on the EU cloud instance are opted out by default
- Users with agreements like BAAs or MSAs that restrict data use are also opted out by default
- All other US cloud users are opted in by default, with training scheduled to begin June 29
- Any organization admin can opt out at any time through PostHog’s org settings
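The default rules in the list above amount to a small decision procedure. What follows is a minimal Python sketch of that logic, not PostHog’s implementation: the `Org` class, its field names, and both functions are hypothetical, invented here for illustration. It models only what the announcement states (EU cloud and restrictive agreements default to out, other US cloud orgs default to in, admins can opt out) and does not model whether a defaulted-out org can opt in, since the announcement as described here doesn’t say.

```python
from dataclasses import dataclass
from enum import Enum


class Region(Enum):
    US = "us"
    EU = "eu"


@dataclass(frozen=True)
class Org:
    # Hypothetical fields for illustration; not PostHog's actual data model.
    region: Region
    # True if the org has an agreement (e.g. a BAA or MSA) that restricts data use.
    has_restrictive_agreement: bool = False
    # True if an organization admin has opted out in org settings.
    admin_opted_out: bool = False


def default_opted_in(org: Org) -> bool:
    """Default status before any admin choice: True means opted in to training."""
    if org.region is Region.EU:
        return False  # EU cloud users are opted out by default
    if org.has_restrictive_agreement:
        return False  # restrictive agreements also default to opted out
    return True  # all other US cloud users are opted in by default


def is_used_for_training(org: Org) -> bool:
    """An admin opt-out always overrides an opted-in default."""
    return default_opted_in(org) and not org.admin_opted_out
```

Under these assumptions, `is_used_for_training(Org(Region.US))` is true, while an EU org, an org with a restrictive agreement, or a US org whose admin has opted out all evaluate to false.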
Opting US customers in by default will raise eyebrows, and rightly so. Defaulting users into data training programs is a well-worn industry tactic: it maximizes the training pool while relying on user inertia. To PostHog’s credit, the company is at least being honest about it rather than burying it in a terms-of-service amendment that nobody reads. Hawkins explicitly called this out: “Most companies would bury this change in a deceptively boring T&Cs update.”
There’s also a practical incentive structure at play. If you opt out, the new AI-powered features won’t be available to you — they’ll depend on the training data to function. That’s not a threat so much as a structural reality, but it does mean opting out has a real cost. PostHog is essentially saying: you can have your data privacy, or you can have the new features, but not both.
PostHog AI Models and the Broader Industry Picture
Training in-house models on customer product data isn’t entirely novel, but the approach is still relatively rare at this scale among developer tooling companies. The more common approach, used by Mixpanel, Amplitude, and others, is to bolt on general-purpose AI via API calls to foundation model providers. That’s faster to ship and requires no training infrastructure, but it leaves those companies permanently dependent on third-party model costs and capabilities.
A handful of larger players have gone further. Salesforce has invested heavily in domain-specific AI through its Einstein platform, trained on CRM data. Datadog has been building observability-specific models. The pattern is emerging across enterprise software: companies with rich, proprietary behavioral datasets are starting to realize those datasets are actually their most defensible AI asset.
PostHog’s position is interesting because its data is unusually dense and behavioral — event streams, session replays, funnel data, feature flag exposures, experiment results. That’s the kind of structured, product-interaction data that could genuinely produce useful PostHog AI models, not just a marketing story about AI.
Transparency as Strategy
It’s hard to miss the PR dimension here. PostHog publishing a detailed blog post explaining exactly what it’s doing with customer data — including the opt-out mechanism and the June 29 start date — is a calculated move. In an environment where AI data practices are under increasing regulatory scrutiny, especially in Europe, being the company that proactively disclosed everything is a meaningful differentiator.
The company says it’s also emailing all customers, pushing in-app notifications, and hiring AI researchers to actually build this out. That last point matters: this isn’t vaporware. PostHog is staffing up specifically for model training work, which signals this is a serious infrastructure investment, not a product marketing exercise.
Whether PostHog AI models actually deliver on the promise is another question entirely. Training useful fine-tuned models is genuinely hard, and the company acknowledges as much — describing its plans as “experimental” and noting it will take iteration to figure out what data is actually useful for training. Hawkins was candid: “Every time we’ve added AI in a way that makes the product simpler or more powerful, it’s worked well, so we think it’s worth trying.”
That’s a reasonable bet to make. But as the AI tooling market gets more crowded and customer tolerance for data-use ambiguity continues to shrink, the real test for PostHog won’t just be whether its models are accurate — it’ll be whether users decide the tradeoff is worth it. Right now, the company is doing everything it can to make sure that’s an informed decision.




