- LLM date hallucinations silently corrupt booking and scheduling agents by misreading weekdays, ranges, and locale-specific expressions.
- The open-source whenis library fixes LLM date hallucinations by moving all date resolution into a deterministic, testable tool.
- whenis ships with Ukrainian and English locales, a booking plugin, and a four-layer parsing pipeline inspired by Duckling.
- Agents invoke whenis as a tool call instead of guessing — multi-candidate output lets the agent re-rank results using conversation context.
The Date Problem Nobody Talks About Enough
LLM date hallucinations are one of those failure modes that don’t show up in benchmarks but absolutely show up in production. You build a scheduling assistant, ship it, and then a user asks it to “book something for next Friday” — and the model confidently resolves that to the wrong date. Not dramatically wrong. Just one week off, or the wrong weekday entirely. Wrong enough to cause real problems, subtle enough that users might not catch it immediately.
The root cause isn’t a mystery. Large language models don’t have a clock. They have no reliable anchor to the present moment, and their training data contains so many inconsistent references to relative time that their internal sense of “now” is genuinely unreliable. Ask an LLM to calculate which calendar date falls on “next Thursday” given a specific reference date, and you’ll get the right answer often enough to feel confident — and the wrong answer often enough to cause incidents. LLM date hallucinations of this kind are structural, not incidental. Tell the model to “be careful with dates” in the system prompt, and you’re essentially asking it to be careful with something it’s structurally incapable of being careful about.
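Part of the trouble is that “next Friday” has more than one defensible reading, so even deterministic code has to commit to a convention. The TypeScript sketch below is not part of whenis. Its function names are hypothetical, and it simply computes two common readings for a reference date of Thursday, May 28, 2026.

```typescript
// Weekdays use JavaScript's getUTCDay() numbering: 0 = Sunday ... 6 = Saturday.
type Weekday = 0 | 1 | 2 | 3 | 4 | 5 | 6;

const MS_PER_DAY = 86400000;

// Parse "YYYY-MM-DD" as UTC midnight so local time zones and DST never shift the day.
function parseIsoDate(iso: string): Date {
  const [y, m, d] = iso.split("-").map(Number);
  return new Date(Date.UTC(y, m - 1, d));
}

function toIsoDate(date: Date): string {
  return date.toISOString().slice(0, 10);
}

function addDays(date: Date, days: number): Date {
  return new Date(date.getTime() + days * MS_PER_DAY);
}

// Reading A: the first matching weekday strictly after the reference date.
function upcomingWeekday(reference: string, weekday: Weekday): string {
  const ref = parseIsoDate(reference);
  const delta = (weekday - ref.getUTCDay() + 7) % 7 || 7;
  return toIsoDate(addDays(ref, delta));
}

// Reading B: the matching weekday in the following Monday-start week.
function weekdayOfNextWeek(reference: string, weekday: Weekday): string {
  const ref = parseIsoDate(reference);
  const daysSinceMonday = (ref.getUTCDay() + 6) % 7;
  const nextMonday = addDays(ref, 7 - daysSinceMonday);
  return toIsoDate(addDays(nextMonday, (weekday + 6) % 7));
}

console.log(upcomingWeekday("2026-05-28", 5)); // 2026-05-29
console.log(weekdayOfNextWeek("2026-05-28", 5)); // 2026-06-05
```

The two answers are exactly one week apart, which is the “just one week off” error in miniature. A tool makes that choice explicitly and the same way every time. A model can drift between readings from one conversation to the next.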
This is especially brutal for applications involving bookings, reminders, travel planning, or any kind of calendar interaction. And it gets worse in multilingual or multi-locale environments, where an expression like “next week” carries different assumptions depending on whether your user is in Kyiv or Kansas City. Weeks conventionally start on Monday in Ukraine and on Sunday in the United States, so even the boundary of “next week” can differ.
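The first day of the week is one concrete source of that divergence. Here is a minimal sketch, assuming Monday-start weeks for Ukraine and Sunday-start weeks for the United States. The `WeekStart` type and `startOfNextWeek` function are hypothetical.

```typescript
// 0 = weeks start on Sunday (common in the US), 1 = weeks start on Monday (ISO 8601, Ukraine).
type WeekStart = 0 | 1;

// First day of the week after the one containing `reference` ("YYYY-MM-DD", treated as UTC).
function startOfNextWeek(reference: string, weekStart: WeekStart): string {
  const ref = new Date(`${reference}T00:00:00Z`);
  const daysIntoWeek = (ref.getUTCDay() - weekStart + 7) % 7;
  const next = new Date(ref.getTime() + (7 - daysIntoWeek) * 86400000);
  return next.toISOString().slice(0, 10);
}

// Reference date: Sunday, May 31, 2026.
console.log(startOfNextWeek("2026-05-31", 1)); // 2026-06-01 (Monday-start week)
console.log(startOfNextWeek("2026-05-31", 0)); // 2026-06-07 (Sunday-start week)
```

On a Sunday, the two conventions disagree by six days about when “next week” begins.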
What whenis Actually Does
A developer going by Nazar F on Dev.to has published an open-source library called whenis that takes a direct swing at this problem. The core idea is clean: stop asking the LLM to interpret dates at all. Instead, expose a resolveDate(expression, reference) tool that your agent can call, pass the user’s raw date expression into it, and get back a structured, deterministic result. The model’s job becomes recognising that a date expression exists and invoking the tool, not doing the math itself. Because the model never attempts the calculation, it has no opportunity to hallucinate the result; what remains is making sure it passes the user’s words through faithfully.
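Here is a vendor-neutral sketch of that tool boundary. The schema follows the JSON Schema style most function-calling APIs accept. The tool name mirrors the article, but the `handleToolCall` helper, the argument shape and the stub resolver are assumptions, not whenis's published API. One design choice is deliberate: the model supplies only the expression, and the application injects the reference date from its own clock. The model has no reliable sense of “now”, so it should not supply the reference date.

```typescript
// Tool definition the agent framework would advertise to the model (shape is an assumption).
const resolveDateTool = {
  name: "resolveDate",
  description: "Resolve a natural-language date expression. Call this instead of computing dates yourself.",
  parameters: {
    type: "object",
    properties: {
      expression: { type: "string", description: "The user's date expression, copied verbatim." },
    },
    required: ["expression"],
  },
} as const;

// A tool call as emitted by the model; arguments typically arrive as a JSON string.
interface ToolCall {
  name: string;
  arguments: string;
}

// Whatever actually resolves dates, e.g. a wrapper around a deterministic parser.
type Resolver = (expression: string, reference: string) => unknown;

function handleToolCall(call: ToolCall, resolve: Resolver, today: () => string): string {
  if (call.name !== resolveDateTool.name) {
    return JSON.stringify({ error: `unknown tool: ${call.name}` });
  }
  let expression: unknown;
  try {
    const parsed: unknown = JSON.parse(call.arguments);
    expression =
      typeof parsed === "object" && parsed !== null ? (parsed as { expression?: unknown }).expression : undefined;
  } catch {
    return JSON.stringify({ error: "arguments are not valid JSON" });
  }
  if (typeof expression !== "string") {
    return JSON.stringify({ error: "expression must be a string" });
  }
  // The reference date comes from the application clock, never from the model.
  return JSON.stringify(resolve(expression, today()));
}

// Usage with a stub resolver standing in for a real parser.
const stub: Resolver = (expr, reference) => ({ expression: expr, reference, type: "date", date: "2026-06-05", confidence: 1 });
const reply = handleToolCall(
  { name: "resolveDate", arguments: JSON.stringify({ expression: "next Friday" }) },
  stub,
  () => "2026-05-28",
);
console.log(reply);
```

Malformed arguments come back as structured errors rather than exceptions, so the agent loop can report them to the model and let it retry.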
That’s the right architectural instinct. Tool use in LLM agents exists precisely to offload tasks that require precision, consistency, and determinism — things models aren’t great at. Date arithmetic fits that description perfectly. What’s interesting about whenis is how much thought has gone into making the tool actually production-ready rather than just a proof of concept.
The library is built around a four-layer pipeline clearly inspired by Duckling, the open-source Haskell library Facebook (now Meta) released for parsing natural-language text into structured data such as dates, times, and quantities. The pipeline runs input through preprocessing, then tokenisation and tagging, then a rule engine, and finally a resolver that produces a ParseResult. Each layer is independently testable, which matters a lot when you’re debugging why a specific phrase in a specific locale is resolving incorrectly.
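The toy English-only pipeline below uses the same four stages, to make the layering concrete. It is a hypothetical sketch, not whenis's code. It understands only “yesterday”, “today”, “tomorrow” and “next <weekday>”, and every type and function name is invented for illustration. Each stage is a plain function, so each can be unit-tested on its own.

```typescript
type Tag = "next" | "weekday" | "relative-day" | "other";
interface Token { text: string; tag: Tag; value: number }

type Entity = { kind: "relative-day"; offset: number } | { kind: "next-weekday"; weekday: number };
type ParseResult = { type: "date"; date: string; confidence: number } | { type: "none" };

const WEEKDAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];
const RELATIVE_DAYS = new Map<string, number>([["yesterday", -1], ["today", 0], ["tomorrow", 1]]);

// Layer 1, preprocessing: normalise case, punctuation and whitespace.
function preprocess(input: string): string {
  return input.toLowerCase().replace(/[.,!?]/g, " ").replace(/\s+/g, " ").trim();
}

// Layer 2, tokenisation and tagging: label each word with what it might mean.
function tokenize(text: string): Token[] {
  return text.split(" ").filter(Boolean).map((word): Token => {
    if (word === "next") return { text: word, tag: "next", value: 0 };
    const weekday = WEEKDAYS.indexOf(word);
    if (weekday >= 0) return { text: word, tag: "weekday", value: weekday };
    const offset = RELATIVE_DAYS.get(word);
    if (offset !== undefined) return { text: word, tag: "relative-day", value: offset };
    return { text: word, tag: "other", value: 0 };
  });
}

// Layer 3, rule engine: match tag patterns and emit an abstract entity (no calendar math yet).
function applyRules(tokens: Token[]): Entity | null {
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    const following: Token | undefined = tokens[i + 1];
    if (token.tag === "relative-day") return { kind: "relative-day", offset: token.value };
    if (token.tag === "next" && following !== undefined && following.tag === "weekday") {
      return { kind: "next-weekday", weekday: following.value };
    }
  }
  return null;
}

// Layer 4, resolver: anchor the entity to a reference date ("YYYY-MM-DD", UTC).
function resolveEntity(entity: Entity | null, reference: string): ParseResult {
  if (entity === null) return { type: "none" };
  const ref = new Date(`${reference}T00:00:00Z`);
  let offset: number;
  if (entity.kind === "relative-day") {
    offset = entity.offset;
  } else {
    // Convention assumed here: "next <weekday>" is that weekday in the following Monday-start week.
    const daysSinceMonday = (ref.getUTCDay() + 6) % 7;
    offset = 7 - daysSinceMonday + ((entity.weekday + 6) % 7);
  }
  const date = new Date(ref.getTime() + offset * 86400000).toISOString().slice(0, 10);
  return { type: "date", date, confidence: 1 };
}

function parse(input: string, reference: string): ParseResult {
  return resolveEntity(applyRules(tokenize(preprocess(input))), reference);
}

console.log(JSON.stringify(parse("Next Friday!", "2026-05-28"))); // {"type":"date","date":"2026-06-05","confidence":1}
```

Only the resolver touches the calendar. If a phrase resolves wrongly, you can check whether the tags, the matched rule or the date arithmetic is at fault.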
LLM Date Hallucinations Across Languages and Locales
One of the more impressive aspects of whenis is how seriously it takes locale support. LLM date hallucinations are bad enough in English, but they get significantly worse in languages with grammatical case systems, where the same date word changes form depending on sentence structure. Ukrainian is a good example: month names inflect across seven grammatical cases, weekday names across four.
whenis ships a Ukrainian locale that handles this in full. The locale is implemented as a data file rather than engine logic, which means adding support for Russian, Polish, or Czech is a matter of writing a new locale source file — no changes to the core parser needed. That’s a sensible design choice that should make community contributions significantly easier.
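Here is a sketch of the “locale as data” idea. It assumes a made-up `LocaleData` shape; the source does not show whenis's actual locale format, which may look quite different. Only Friday and Saturday are listed to keep it short. The matching function never mentions a specific language.

```typescript
// Hypothetical locale shape: pure data, no parsing logic.
interface LocaleData {
  id: string;
  weekdays: Record<string, number>; // every inflected weekday form -> 0 (Sunday) .. 6 (Saturday)
  nextWords: string[]; // every inflected form of "next"
}

const en: LocaleData = {
  id: "en",
  weekdays: { friday: 5, saturday: 6 },
  nextWords: ["next"],
};

// Ukrainian weekday names change with grammatical case,
// e.g. "у п'ятницю" (accusative) vs "до п'ятниці" (genitive).
const uk: LocaleData = {
  id: "uk",
  weekdays: {
    "п'ятниця": 5, "п'ятницю": 5, "п'ятниці": 5, "п'ятницею": 5,
    "субота": 6, "суботу": 6, "суботи": 6, "суботою": 6,
  },
  nextWords: ["наступна", "наступну", "наступної", "наступний", "наступного"],
};

// Own-property lookup, so words like "constructor" never hit Object.prototype.
function lookup(table: Record<string, number>, key: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Locale-agnostic engine code: adding a language means adding data, not editing this function.
function matchNextWeekday(input: string, locale: LocaleData): number | null {
  const words = input
    .toLowerCase()
    .replace(/[\u02BC\u2019]/g, "'") // normalise Ukrainian apostrophe variants to ASCII
    .split(/\s+/)
    .filter(Boolean);
  for (let i = 0; i + 1 < words.length; i++) {
    const weekday = lookup(locale.weekdays, words[i + 1]);
    if (locale.nextWords.includes(words[i]) && weekday !== undefined) return weekday;
  }
  return null;
}

console.log(matchNextWeekday("наступну п'ятницю", uk)); // 5
console.log(matchNextWeekday("next Friday", en)); // 5
```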
In practice, you’d initialise a parser with your chosen locale and any plugins, then call parse() with the user’s expression and a reference date. Pass in the Ukrainian phrase for “next Friday” with a reference date of May 28, 2026 (a Thursday), and you get back a clean JSON result: type ‘date’, ISO date ‘2026-06-05’, confidence 1. Pass in a range expression like “from the 5th to the 10th of June” and you get a range object with start, end, and a nights count (five nights, in this case). The output is typed, structured, and predictable, which is exactly what you want when you’re wiring this into an agent’s tool-calling layer. Without a deterministic resolver like this, LLM date hallucinations in multilingual deployments are very hard to avoid.
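In TypeScript, result shapes like these fit naturally into a discriminated union. The field names below (`date`, `reason`, `meta`) and the `makeRange` helper are assumptions based on the description, not whenis's published types. The nights count is the number of days between check-in and check-out.

```typescript
type ParseResult =
  | { type: "date"; date: string; confidence: number }
  | { type: "range"; start: string; end: string; nights: number; confidence: number }
  | { type: "fuzzy"; reason: string; meta: Record<string, unknown> };

// Whole days between two "YYYY-MM-DD" dates, computed in UTC so DST cannot skew the count.
function nightsBetween(start: string, end: string): number {
  const ms = Date.parse(`${end}T00:00:00Z`) - Date.parse(`${start}T00:00:00Z`);
  return Math.round(ms / 86400000);
}

function makeRange(start: string, end: string, confidence = 1): ParseResult {
  const nights = nightsBetween(start, end);
  if (!(nights > 0)) {
    // An inverted, empty or unparseable range is reported explicitly instead of being "fixed" silently.
    return { type: "fuzzy", reason: "range_end_not_after_start", meta: { start, end } };
  }
  return { type: "range", start, end, nights, confidence };
}

// The compiler forces every consumer to handle every variant.
function describe(result: ParseResult): string {
  switch (result.type) {
    case "date":
      return `date ${result.date}`;
    case "range":
      return `${result.nights} night(s), ${result.start} to ${result.end}`;
    case "fuzzy":
      return `needs clarification: ${result.reason}`;
    default: {
      const unreachable: never = result;
      return unreachable;
    }
  }
}

// "from the 5th to the 10th of June", resolved against a 2026 reference date:
console.log(describe(makeRange("2026-06-05", "2026-06-10"))); // 5 night(s), 2026-06-05 to 2026-06-10
```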
Handling Ambiguity Without Hiding It
Here’s where whenis makes a particularly smart call. When a date expression is genuinely ambiguous — say, a bare “Friday” mentioned mid-conversation with no other context — the library doesn’t silently pick one interpretation. Instead, it returns multiple candidates, each with a confidence score. Your agent can then use the surrounding conversation context to re-rank those candidates and pick the most appropriate one.
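The re-ranking step belongs to the agent, not the library. Below is a minimal sketch of one possible policy. It assumes two hypothetical context signals: a “not before” date, such as an agreed check-in, and dates already mentioned in the conversation. The candidate list, the 0.3 boost and the 0.15 margin are arbitrary illustrative values, not whenis output.

```typescript
interface Candidate {
  date: string; // "YYYY-MM-DD"
  confidence: number; // 0..1
}

interface ConversationContext {
  notBefore?: string; // e.g. an already agreed check-in date
  mentionedDates: string[]; // dates that already came up in the conversation
}

function rerank(candidates: Candidate[], ctx: ConversationContext): Candidate[] {
  return candidates
    // ISO dates compare correctly as strings, so >= is a chronological check.
    .filter((c) => ctx.notBefore === undefined || c.date >= ctx.notBefore)
    .map((c) => ({
      ...c,
      confidence: Math.min(1, c.confidence + (ctx.mentionedDates.includes(c.date) ? 0.3 : 0)),
    }))
    .sort((a, b) => b.confidence - a.confidence);
}

// If the top two candidates are too close, ask the user instead of guessing.
function needsClarification(ranked: Candidate[], margin = 0.15): boolean {
  if (ranked.length === 0) return true;
  if (ranked.length === 1) return false;
  return ranked[0].confidence - ranked[1].confidence < margin;
}

// Illustrative candidates for a bare "Friday" said on Thursday, 2026-05-28.
const ranked = rerank(
  [
    { date: "2026-05-22", confidence: 0.2 },
    { date: "2026-05-29", confidence: 0.5 },
    { date: "2026-06-05", confidence: 0.3 },
  ],
  { notBefore: "2026-05-28", mentionedDates: ["2026-06-05"] },
);
console.log(ranked.map((c) => c.date), needsClarification(ranked));
```

Here the earlier mention moves 2026-06-05 to the top. The margin check still flags the choice as close enough to confirm with the user, and that is a decision the agent can log.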
This is the right approach, and it’s worth contrasting with what most LLM agents do today: the model guesses silently, picks one interpretation, and acts on it. The user never knows a choice was made. LLM date hallucinations thrive in exactly this silent-guess pattern. With whenis surfacing candidates explicitly, that decision point moves to where it belongs — in the agent’s logic layer, where it can be reasoned about, logged, and if necessary, clarified with the user.
The library also handles the case where an expression is fuzzy in a way that can’t be resolved deterministically — for instance, a reference to a holiday that falls on different dates depending on year or region. In that case, the ParseResult comes back with type ‘fuzzy’, a reason code, and metadata that plugins can extend. Downstream, your agent can decide whether to ask for clarification or apply a fallback rule. Again, the failure mode is explicit rather than silent.
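Downstream, handling that explicit failure can be a small, auditable policy function. This is a sketch under assumptions. The reason code `holiday_date_varies`, the `defaultDate` metadata key, the fallback table and the 0.8 confidence threshold are all invented for illustration.

```typescript
type ParseResult =
  | { type: "date"; date: string; confidence: number }
  | { type: "fuzzy"; reason: string; meta: Record<string, unknown> };

type AgentAction =
  | { action: "proceed"; date: string }
  | { action: "ask_user"; question: string }
  | { action: "fallback"; date: string; note: string };

// Application-specific fallback rules keyed by reason code; return null to decline.
type Fallback = (meta: Record<string, unknown>) => string | null;

function decide(result: ParseResult, fallbacks: Record<string, Fallback>, minConfidence = 0.8): AgentAction {
  if (result.type === "date") {
    return result.confidence >= minConfidence
      ? { action: "proceed", date: result.date }
      : { action: "ask_user", question: `Just to confirm, did you mean ${result.date}?` };
  }
  const fallback = Object.prototype.hasOwnProperty.call(fallbacks, result.reason)
    ? fallbacks[result.reason]
    : undefined;
  const date = fallback ? fallback(result.meta) : null;
  if (date !== null) {
    return { action: "fallback", date, note: `fallback rule applied for ${result.reason}` };
  }
  return { action: "ask_user", question: "Could you tell me the exact date you have in mind?" };
}

// Hypothetical rule: use a default date only if the resolver supplied one in its metadata.
const fallbacks: Record<string, Fallback> = {
  holiday_date_varies: (meta) => {
    const candidate = meta.defaultDate;
    return typeof candidate === "string" ? candidate : null;
  },
};

console.log(decide({ type: "fuzzy", reason: "holiday_date_varies", meta: {} }, fallbacks).action); // ask_user
```

Every branch returns a named action, so an applied fallback or a clarification request shows up in logs. Nothing is silently absorbed.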
The Plugin Architecture and What’s Coming
whenis separates domain-specific parsing logic into plugins rather than baking it into the core. The @whenis/booking plugin handles patterns specific to accommodation and travel bookings — check-in windows, durations, weekend references, holiday-adjacent date ranges. If your domain has its own vocabulary for time expressions, you can build your own plugin against the same API.
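The idea of keeping domain rules outside the core can be sketched with a made-up plugin contract. Nothing below is the @whenis/booking API. `DatePlugin`, `runPlugins` and the “Friday check-in, Sunday check-out” reading of “this weekend” are assumptions chosen for a hotel-style domain.

```typescript
interface StayRange {
  start: string; // check-in, "YYYY-MM-DD"
  end: string; // check-out
  nights: number;
}

interface DateRule {
  id: string;
  pattern: RegExp;
  resolve: (match: RegExpMatchArray, reference: string) => StayRange | null;
}

interface DatePlugin {
  name: string;
  rules: DateRule[];
}

function addDaysIso(iso: string, days: number): string {
  const t = Date.parse(`${iso}T00:00:00Z`) + days * 86400000;
  return new Date(t).toISOString().slice(0, 10);
}

const weekendStayPlugin: DatePlugin = {
  name: "weekend-stay",
  rules: [
    {
      id: "this-weekend",
      pattern: /\bthis weekend\b/i,
      resolve: (_match, reference) => {
        const day = new Date(`${reference}T00:00:00Z`).getUTCDay(); // 0 = Sunday
        // On Saturday or Sunday the intent is unclear, so this rule declines.
        if (day === 0 || day === 6) return null;
        const checkIn = addDaysIso(reference, 5 - day); // the coming Friday (today, if it is Friday)
        return { start: checkIn, end: addDaysIso(checkIn, 2), nights: 2 };
      },
    },
  ],
};

// The core tries each plugin rule in order; the first rule that resolves wins.
function runPlugins(text: string, reference: string, plugins: DatePlugin[]): StayRange | null {
  for (const plugin of plugins) {
    for (const rule of plugin.rules) {
      const match = text.match(rule.pattern);
      if (match === null) continue;
      const range = rule.resolve(match, reference);
      if (range !== null) return range;
    }
  }
  return null;
}

console.log(runPlugins("Can we book this weekend?", "2026-05-28", [weekendStayPlugin]));
```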
The technical packaging is sensible for 2025: dual ESM and CommonJS builds, strict TypeScript, Node 18 minimum, zero DOM dependencies. Installing the core alongside a locale looks like this: npm i @whenis/core @whenis/locale-en. The core is a peer dependency of locales and plugins, which means you control the version explicitly rather than having it silently resolved by a sub-dependency — a small but meaningful decision for anyone maintaining a production codebase.
v0.1 ships with Ukrainian and English locales plus the booking plugin. The v0.2 backlog includes ISO passthrough rules, DD.MM numeric date forms, Ukrainian word-numerals, and broader English coverage. The project is MIT licensed and lives at github.com/norens/whenis, with issues and pull requests open.
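Here is a hypothetical sketch of what two of the planned v0.2 rules could involve. It is not whenis's implementation, and the year-inference policy and the 0.9 confidence are assumptions. An ISO date only needs validating. A DD.MM form carries no year, so something has to choose one.

```typescript
interface ResolvedDate {
  type: "date";
  date: string;
  confidence: number;
}

const pad2 = (n: number): string => String(n).padStart(2, "0");

// True only if y-m-d is a real calendar date (rejects 2026-02-30, 31.04 and so on).
function isValidYmd(y: number, m: number, d: number): boolean {
  const date = new Date(Date.UTC(y, m - 1, d));
  return date.getUTCFullYear() === y && date.getUTCMonth() === m - 1 && date.getUTCDate() === d;
}

// ISO passthrough: an explicit YYYY-MM-DD is already unambiguous, so validate it and return it unchanged.
function isoPassthrough(text: string): ResolvedDate | null {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text.trim());
  if (m === null) return null;
  return isValidYmd(Number(m[1]), Number(m[2]), Number(m[3]))
    ? { type: "date", date: m[0], confidence: 1 }
    : null;
}

// DD.MM (day first): no year is given, so this sketch picks the first occurrence
// on or after the reference date, trying the reference year and then the next one.
function dayMonth(text: string, reference: string): ResolvedDate | null {
  const m = /^(\d{1,2})\.(\d{1,2})\.?$/.exec(text.trim());
  if (m === null) return null;
  const day = Number(m[1]);
  const month = Number(m[2]);
  const refYear = Number(reference.slice(0, 4));
  for (const year of [refYear, refYear + 1]) {
    if (!isValidYmd(year, month, day)) continue; // e.g. 29.02 outside a leap year
    const iso = `${year}-${pad2(month)}-${pad2(day)}`;
    if (iso >= reference) return { type: "date", date: iso, confidence: 0.9 };
  }
  return null;
}

console.log(dayMonth("05.06", "2026-05-28")?.date); // 2026-06-05
console.log(dayMonth("01.03", "2026-05-28")?.date); // 2027-03-01
```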
Why This Matters Beyond One Library
The specific library here is early-stage — v0.1, one developer, a defined but limited feature set. What’s more significant is the pattern it represents. As AI agents move from demos into production workflows, the assumption that LLMs can handle structured, precise reasoning internally is increasingly being tested and found wanting. LLM date hallucinations are one of the clearest examples of where that assumption breaks down. Calendar logic is precise, locale-dependent, and full of edge cases that expose the probabilistic nature of language model outputs.
The industry response, slowly but clearly, is the same one that whenis embodies: pull deterministic tasks out of the model entirely, wrap them in well-defined tools, and let the model do what it’s actually good at, which is understanding intent and orchestrating actions. Apple’s recently announced Foundation Models framework for on-device AI, along with the broader push from OpenAI, Anthropic, and Google toward structured tool use in their agent frameworks, points in the same direction.
LLM date hallucinations aren’t going to be solved by better training data or larger context windows. They’re going to be solved by developers who stop treating the model as a calculator and start treating it as a coordinator. whenis is a small, focused example of what that looks like in practice — and the pattern it demonstrates has implications well beyond date parsing.
Source: https://dev.to/nazarf/stop-letting-llms-hallucinate-dates-a-tool-for-ai-agents-1jjj



