- Search as Code lets AI agents write custom Python search pipelines instead of calling rigid, fixed search APIs.
- Perplexity says Search as Code cut token usage by 85% on a complex CVE research task compared with its standard pipeline.
- The architecture layers a reasoning model, a secure code sandbox, and a modular Agentic Search SDK for flexible queries.
- Perplexity claims Search as Code outperforms OpenAI’s Responses API and Anthropic’s Managed Agents on four of five benchmarks.
Search as Code: Rethinking How AI Agents Actually Search
Perplexity has targeted one of the most underappreciated friction points in the current wave of AI agents, and its answer, Search as Code, is a clever architectural shift. Instead of calling a fixed search API and reading back whatever ranked list of links the engine decides to return, the model writes its own Python script to orchestrate the search. That script runs in a secure sandbox and reaches Perplexity’s backend through a modular SDK. The model isn’t a passenger anymore. It’s driving.
The problem this solves is one that anyone who’s spent time watching AI agents work will recognise immediately. The pattern is almost comically repetitive: the agent writes a query, the search API hands back a ranked list, the agent reads it, writes another query, and around it goes. It’s a loop that works well enough for a human doing a quick lookup, but for an AI trying to synthesise a complex research task across hundreds of searches, it’s a structural bottleneck. The search engine was built for people who want blue links. The agent needs something closer to a programmable information pipe.
Three Layers, One Architecture
The design behind Search as Code breaks down into three distinct layers. At the top sits the language model itself — responsible for understanding the goal and deciding on a search strategy. Below that is the sandbox, a secure execution environment where the generated Python code actually runs. And at the bottom is what Perplexity calls the Agentic Search SDK, which disaggregates the search engine into individual callable functions: retrieve, filter, deduplicate, rerank. Think of it like giving the model a set of LEGO bricks instead of a pre-built house.
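To make the LEGO-brick idea concrete, here is a minimal sketch of what composing such primitives could look like. It is not Perplexity’s SDK: the `Hit` type, the in-memory `CORPUS`, the `example` URLs, and the function signatures are all assumptions made up for illustration. Only the four step names (retrieve, filter, deduplicate, rerank) come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    title: str
    score: float


# Hypothetical in-memory corpus standing in for a real search backend.
CORPUS = [
    Hit("https://vendor.example/advisory-1", "Vendor advisory for product A", 0.91),
    Hit("https://blog.example/recap", "News recap of product A bug", 0.75),
    Hit("https://vendor.example/advisory-1", "Vendor advisory for product A", 0.88),
    Hit("https://vendor.example/advisory-2", "Vendor advisory for product B", 0.64),
]


def retrieve(query):
    """Return every corpus hit whose title contains any query term."""
    terms = query.lower().split()
    return [h for h in CORPUS if any(t in h.title.lower() for t in terms)]


def filter_hits(hits, predicate):
    """Keep only the hits for which the caller-supplied predicate is true."""
    return [h for h in hits if predicate(h)]


def deduplicate(hits):
    """Keep the highest-scoring hit for each URL."""
    best = {}
    for h in hits:
        if h.url not in best or h.score > best[h.url].score:
            best[h.url] = h
    return list(best.values())


def rerank(hits):
    """Order hits from highest to lowest score."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


def pipeline(query):
    """One possible composition; the caller decides the order and the filter."""
    hits = retrieve(query)
    hits = filter_hits(hits, lambda h: h.url.startswith("https://vendor.example/"))
    hits = deduplicate(hits)
    return rerank(hits)
```

The point of the sketch is that the filter predicate and the order of the steps live in the caller’s code, so a model generating this script can change them per task instead of accepting one fixed ranking.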
Standard API calls haven’t disappeared entirely. For simple, one-shot questions, the old approach still works fine. But for deeper research — the kind that requires chasing multiple threads in parallel, cross-referencing sources, and verifying outputs against a schema — the model can now go considerably further. It can run parallel queries tuned to different source types, strip out noise programmatically before anything hits its context window, and pull only the genuinely relevant hits into the conversation. That last part matters more than it might seem.
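A rough sketch of that pattern, running queries tuned to different source types in parallel and filtering before anything reaches the model, might look like the following. The `search` stub, the `FAKE_INDEX` data, and the source-type names are hypothetical stand-ins, not part of any real API; only `ThreadPoolExecutor` is a real standard-library class.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned results per source type; a real run would hit a search backend.
FAKE_INDEX = {
    "vendor": ["Vendor advisory: fix in 2.4.1", "Vendor advisory: fix in 7.0.3"],
    "database": ["Vulnerability database entry", "Vulnerability database mirror"],
    "news": ["Blog recap of the bug", "Vendor advisory: fix in 2.4.1"],
}


def search(source_type, query):
    """Stand-in for one backend call.

    The query is recorded on each hit but does not change the canned results.
    """
    return [
        {"source": source_type, "query": query, "text": text}
        for text in FAKE_INDEX.get(source_type, [])
    ]


def parallel_search(tuned_queries):
    """Run one tuned query per source type concurrently and flatten the results."""
    items = list(tuned_queries.items())
    with ThreadPoolExecutor(max_workers=max(1, len(items))) as pool:
        # pool.map returns results in the same order as the inputs.
        batches = list(pool.map(lambda item: search(*item), items))
    return [hit for batch in batches for hit in batch]


def keep_relevant(hits, keyword):
    """Drop hits whose text does not mention the keyword (case-insensitive)."""
    return [h for h in hits if keyword.lower() in h["text"].lower()]
```

Calling `keep_relevant(parallel_search({...}), "advisory")` would pass along only the advisory hits, which is the step that keeps loosely relevant results out of the context window.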
One of the persistent failure modes in long-running AI research sessions is context bloat. Standard search pipelines tend to flood the model’s context window with loosely relevant results because the filtering logic is baked into the search engine and can’t be touched. The agent takes whatever it gets. With Search as Code, the model writes its own filters, so the context stays focused and the model doesn’t lose the thread of a complex task halfway through. It’s a practical fix for a real problem that gets worse as tasks grow longer.
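One simple way a generated script could keep the context focused is to enforce a budget before returning anything to the model. The sketch below is an illustration under stated assumptions, not Perplexity’s method: it assumes each snippet already carries a relevance score, and it approximates token counts by counting whitespace-separated words, which is cruder than a real tokenizer.

```python
def approx_tokens(text):
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_budget(snippets, budget):
    """Greedily keep the most relevant snippets that fit within the budget.

    snippets is a list of (relevance, text) pairs; returns (kept_texts, tokens_used).
    """
    kept, used = [], 0
    for relevance, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = approx_tokens(text)
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept, used
```

Because the budget and the scoring rule sit in the script rather than inside the search engine, the model can tighten or loosen them for each task.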
The CVE Test: Where Search as Code Proved Itself
To demonstrate the approach concretely, Perplexity set its agent a genuinely difficult cybersecurity task: track down 200 critical vulnerabilities (CVEs) published between 2023 and 2025, and for each one, find the official vendor advisory, the affected software versions, and the exact patch version. Blog posts and news recaps didn’t count. Only primary vendor sources would do.
It’s the kind of task that exposes every weakness in a rigid search pipeline. Vendor security bulletins aren’t formatted consistently. Mozilla’s advisories look nothing like Google’s. Some CVEs barely surface in general search results at all. A fixed API query has no way to adapt to those differences on the fly.
With Search as Code, the model wrote a three-stage script. In the first stage, it ran parallel searches calibrated to the specific format each major vendor uses for its security bulletins. In the second, it scanned its own results, identified gaps where CVE data was missing or incomplete, and launched targeted follow-up queries. In the third, it applied a schema to verify that the CVE identifier, the affected product, and the fix version all matched up correctly. The result: Perplexity says the agent completed the task using 85% fewer tokens than its standard pipeline, while rival systems using conventional search tools got less than a quarter of the answers right.
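The three stages described above can be sketched in a few dozen lines. This is a simplified illustration, not the script Perplexity’s agent wrote: the `search` stub, the `FAKE_RESULTS` data, the placeholder CVE identifiers, the product names, and the record fields are all invented for the example, and stage one runs sequentially here for brevity rather than in parallel.

```python
import re

CVE_ID = re.compile(r"^CVE-\d{4}-\d{4,}$")
VERSION = re.compile(r"^\d+(\.\d+)+$")

# Hypothetical canned results keyed by (CVE ID, query style). Placeholder data only.
FAKE_RESULTS = {
    ("CVE-0000-0001", "broad"): {
        "cve": "CVE-0000-0001", "product": "ExampleBrowser", "fixed_in": "1.2.3"},
    ("CVE-0000-0002", "broad"): {
        "cve": "CVE-0000-0002", "product": "ExampleServer", "fixed_in": None},
    ("CVE-0000-0002", "targeted"): {
        "cve": "CVE-0000-0002", "product": "ExampleServer", "fixed_in": "4.5.6"},
}


def search(cve, style):
    """Stand-in for a search call; returns a record or None."""
    return FAKE_RESULTS.get((cve, style))


def is_valid(record, cve):
    """Schema check: matching CVE ID, a product name, and a dotted fix version."""
    return (
        record is not None
        and record.get("cve") == cve
        and CVE_ID.match(record["cve"]) is not None
        and bool(record.get("product"))
        and isinstance(record.get("fixed_in"), str)
        and VERSION.match(record["fixed_in"]) is not None
    )


def research(cves):
    # Stage 1: broad first pass over every CVE.
    records = {cve: search(cve, "broad") for cve in cves}
    # Stage 2: find missing or incomplete records and launch targeted follow-ups.
    gaps = [cve for cve, rec in records.items() if not is_valid(rec, cve)]
    for cve in gaps:
        records[cve] = search(cve, "targeted")
    # Stage 3: keep only records that pass the schema check.
    verified = {cve: rec for cve, rec in records.items() if is_valid(rec, cve)}
    unresolved = [cve for cve in cves if cve not in verified]
    return verified, unresolved
```

Returning the unresolved identifiers alongside the verified records lets the caller see exactly which entries still lack a primary source, rather than silently dropping them.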
That’s a significant margin. Token efficiency isn’t just a cost metric; it’s also a rough proxy for how well the model stays focused on the task rather than processing noise. An 85% reduction suggests that far less loosely relevant material reached the model’s context window.
How It Stacks Up Against OpenAI and Anthropic
Perplexity’s technical report claims Search as Code outperforms OpenAI’s Responses API and Anthropic’s Managed Agents on four out of five benchmarks. The widest gap shows up on WANDR — Perplexity’s own benchmark designed to test broad research tasks — which the company says it will release publicly soon. On the fifth benchmark, performance is reportedly roughly level with OpenAI.
Self-reported benchmarks from the company whose product is being tested are always worth treating with some scepticism. Perplexity designed WANDR, and naturally it’s the benchmark where the gap is largest. That’s a pattern worth noticing. That said, the comparison against Perplexity’s own older architecture — same hardware, same backend, different search approach — tells a more reliable story. The internal before-and-after is hard to fake, and the improvement there is clear and consistent across all five tests.
Why This Points to Something Bigger
Perplexity frames Search as Code as part of a broader architectural shift in how capable AI systems are being built — and the framing is convincing. Traditional software runs on deterministic logic: the same input always produces the same output. Frontier language models add something different — reasoning that happens in the space between tokens, flexible and generative but hard to control precisely. The most capable agentic systems are increasingly combining both: the model handles strategy and judgment, while deterministic runtimes handle batching, filtering, and verification.
Search, in this framing, becomes an I/O layer rather than a black-box endpoint. That’s a meaningful repositioning. It also fits with a pattern that’s showing up elsewhere in AI agent research. A recent survey on code-generating agents argues that writing and executing code is becoming the default interaction mode for autonomous systems — not because code is inherently better than natural language, but because it’s verifiable, composable, and precise in ways that natural language instructions simply aren’t. The surrounding infrastructure — sandboxes, SDKs, verification layers — is becoming the real bottleneck for autonomous AI, not the models themselves.
There’s also a more immediate problem Search as Code could help address. A recent study found that popular search agents frequently cheat on benchmarks like BrowseComp by retrieving answers from their training data and using live search only to confirm what they already think they know. When tested against genuinely fresh facts they couldn’t have memorised, every system’s score dropped by 25 to 40 points. All of those systems were using standard, rigid search tools. A model that writes its own search logic — and can verify outputs against a schema at runtime — is at least structurally better positioned to engage with information it’s never seen before.
Search as Code is rolling out now through Perplexity Computer and the Agent API. It’s a developer-facing feature for now, not a consumer toggle. But the underlying idea — that search infrastructure should be a programmable surface, not a fixed endpoint — is almost certainly where the rest of the industry is heading. OpenAI and Anthropic have their own agent frameworks and their own search integrations. How long before they’re offering something similar? The answer probably has less to do with technical capability and more to do with whether they’re willing to open up their search plumbing the way Perplexity just has.
Source: The Decoder (AI News)
Frequently Asked Questions
What is Perplexity’s Search as Code and how does it work?
Search as Code is an architecture where an AI model writes a custom Python script to run searches, rather than calling a fixed search API. The script runs in a secure sandbox and uses Perplexity’s Agentic Search SDK, which exposes search engine steps such as retrieval, filtering, deduplication, and reranking as modular, callable functions.
How much more efficient is Search as Code compared to standard search pipelines?
In Perplexity’s CVE research test, Search as Code reportedly used 85% fewer tokens than the company’s standard pipeline. Rival systems using conventional search tools got less than a quarter of the answers right.
Which AI systems does Search as Code outperform on benchmarks?
Perplexity claims Search as Code beats OpenAI’s Responses API and Anthropic’s Managed Agents on four out of five benchmarks. The largest margin was on WANDR, Perplexity’s own broad research benchmark, which the company plans to release publicly soon.
Where is Search as Code available right now?
Perplexity has begun rolling out Search as Code through Perplexity Computer and its Agent API.





