
Environment Maps: Giving AI Agents the Context to Navigate the Real World

Jun 5, 2026·7 min read

This post is based on Environment Maps by Distyl’s Yenchia Feng, Chirag Sharma, and Karime Maamari. The paper was presented at the ICLR 2026 Workshop on World Models.

Tacit Knowledge: What AI Agents Don’t Know

Somewhere inside every enterprise is a platform that exactly seven people know how to use. Why? Because much of the tacit operational knowledge required to navigate these systems is never explicitly written down in the first place. It lives in workflows, transcripts, recovery patterns, and institutional conventions: which dashboard is stale, which button silently fails, which workflows depend on hidden ordering constraints, and which acronyms only make sense to the team that created them.

This gap between how systems are documented and how they’re actually used has quietly undermined every generation of enterprise automation. Scripted workflows and rules-based tooling have tried to work around it, but you can’t automate what you can’t articulate.

Today’s AI agents face the same fundamental barrier. They can see the interface, but they don’t have a persistent model of how the environment actually works. That limitation shows up clearly in complex web environments, where vanilla frontier agents have consistently struggled on goal-directed tasks with no prescribed execution path.

To close this gap, agents need navigational structure, procedural knowledge, and access to the tacit institutional memory that organizations rarely write down.


Environment Maps: A Living Manual for AI Agents

To bridge the gap between how enterprises actually operate and how agents currently reason, we developed Environment Maps: persistent, multi-modal knowledge graphs that serve as navigational aids for AI agents.

In our experiments on complex web environments, agents equipped with Environment Maps achieve nearly double the task success rate of vanilla agents (28.2% versus 14.2% on WebArena). The difference comes from how agents use knowledge: instead of reconstructing their understanding of the environment on every task, they operate over a structured model that persists and improves across interactions.

An Environment Map can cover a single application or be composed to span an entire enterprise workflow. And because it continuously learns and updates, the systems built on it grow more resilient over time, adapting to interface changes, workflow shifts, and evolving business logic without starting over.

Think of an Environment Map as a “living manual” for real-world interfaces. It captures not just what actions are possible, but the tacit expert knowledge of when and how to use them. In doing so, it serves as the bridge between the frozen logic of an LLM and the dynamic reality of enterprise workflows.

An Environment Map captures:

Contexts: where the agent is
Actions: what the agent can do
Workflows: how tasks get done
Tacit knowledge: what experts know that isn’t written down
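To make the four components concrete, here is a minimal Python sketch of how they might be held in memory. The class and field names (`Action`, `Context`, `Workflow`, `EnvironmentMap`, `target`, `tacit`) are hypothetical, chosen for illustration rather than taken from the paper's schema, and the FRED entries are borrowed loosely from the example later in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str     # label of the UI control, e.g. "Edit Graph"
    target: str   # context the action leads to (hypothetical field)


@dataclass
class Context:
    name: str     # where the agent is, e.g. a page or dialog
    actions: list[Action] = field(default_factory=list)


@dataclass
class Workflow:
    goal: str
    steps: list[str]  # ordered action names


@dataclass
class EnvironmentMap:
    contexts: dict[str, Context] = field(default_factory=dict)
    workflows: list[Workflow] = field(default_factory=list)
    # Tacit knowledge: expert tips keyed by action name.
    tacit: dict[str, list[str]] = field(default_factory=dict)

    def add_action(self, context: str, action: Action) -> None:
        # Create the context on first use, then record the action under it.
        self.contexts.setdefault(context, Context(context)).actions.append(action)

    def actions_at(self, context: str) -> list[str]:
        # Unknown contexts simply have no known actions.
        ctx = self.contexts.get(context)
        return [a.name for a in ctx.actions] if ctx else []


fred_map = EnvironmentMap()
fred_map.add_action("Data Series page", Action("Edit Graph", "Graph editor"))
# Only the first step is listed; the remaining steps are omitted here.
fred_map.workflows.append(Workflow("Combine data series using a custom formula", ["Edit Graph"]))
fred_map.tacit["Edit Graph"] = ["Look for the 'Edit Line' tab"]
```

Keeping tacit tips in a separate field, rather than inside each action, is one design choice among several; it lets experts add or correct tips without touching the action graph.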

Environment Maps separate environment knowledge from decision-making. Today, most agent systems handle context by adding more information to the prompt, but that context disappears after each interaction, and the model has to reconstruct its understanding from scratch every time. Environment Maps take a different approach: the knowledge is structured rather than a stream of tokens, and it persists and accumulates across tasks instead of vanishing after each one. Over time, this changes how the agent behaves: it follows known paths, makes more predictable decisions, and improves with use.

And because the underlying representation is abstract, the same knowledge can be surfaced in whatever form the consumer needs: a file tree for a coding agent, a queryable graph for a retrieval system, a navigable interface for a human operator, and so on. The knowledge is the same across all of them; only the format changes.
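The idea that one representation can feed several consumers is easy to see in code. Below is a minimal Python sketch, assuming a toy map stored as a nested dictionary (context -> action -> next context); the function names and the specific views are hypothetical, chosen only to mirror the file-tree, graph, and human-readable forms mentioned above.

```python
# Hypothetical map content: context -> {action: context it leads to}.
MAP = {
    "Data Series page": {"Edit Graph": "Graph editor"},
    "Graph editor": {"Add": "Graph editor", "Apply formula": "Graph editor"},
}


def as_file_tree(graph):
    """One path per (context, action) pair, e.g. for a coding agent that reads files."""
    return sorted(f"{ctx}/{action}" for ctx, actions in graph.items() for action in actions)


def as_triples(graph):
    """The same knowledge as (source, action, target) triples for a queryable graph store."""
    return sorted(
        (ctx, action, dst) for ctx, actions in graph.items() for action, dst in actions.items()
    )


def as_outline(graph):
    """A plain-text outline a human operator could read and edit."""
    lines = []
    for ctx in sorted(graph):
        lines.append(ctx)
        lines.extend(f"  - {action} -> {dst}" for action, dst in sorted(graph[ctx].items()))
    return "\n".join(lines)
```

All three functions read the same `MAP`, so a correction made once shows up in every view.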

Environment Maps in Action

To understand how an Environment Map powers an agent, let’s look at a task on FRED, a public economic database maintained by the Federal Reserve Bank of St. Louis.

Here, the agent’s goal is straightforward: combine two economic data series using a custom formula. But doing so requires a specific sequence of interface actions: navigating to the Data Series page, opening the graph editor, adding each series individually, and then entering the formula. If the agent skips a step or performs the steps out of order, the task fails.
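The ordering constraint is what makes this task brittle. As a minimal sketch, the check below treats the required steps as an ordered subsequence of whatever the agent actually did; the step labels and the `first_violation` helper are hypothetical paraphrases of the task description, not entries from the real FRED map.

```python
# Hypothetical step labels paraphrasing the FRED task description.
REQUIRED_ORDER = [
    "open Data Series page",
    "open graph editor",
    "add series A",
    "add series B",
    "enter formula",
]


def first_violation(trace, required=REQUIRED_ORDER):
    """Return the first required step that is missing or out of order, or None.

    Extra steps in `trace` are allowed; the required steps only have to appear
    as an ordered subsequence.
    """
    pos = 0
    for step in required:
        try:
            # Search only after the previous required step was found.
            pos = trace.index(step, pos) + 1
        except ValueError:
            return step
    return None


# Entering the formula before adding the second series breaks the ordering.
bad_trace = [
    "open Data Series page",
    "open graph editor",
    "add series A",
    "enter formula",
    "add series B",
]
```

For `bad_trace`, the check reports `"enter formula"`: the second series was added after the formula, so no valid formula step remains later in the trace.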

1. Contexts and Actions (The “Where” and “What”)

First, the Environment Map identifies the agent’s current context: the Data Series page. It then fuses accessibility trees, screenshots, and API schemas to define the valid action space, telling the agent exactly which buttons (like “Edit Graph”) are clickable in this specific state.

Figure 1: A hierarchical visualization of the FRED Environment Map depicting contexts, actions, and their relationships. The map captures the Data Series page context with actions for editing graphs, searching data, and applying formulas.
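A simplified sketch of the which-actions-are-valid-here step follows. It assumes the accessibility tree arrives as nested dictionaries with hypothetical `role`, `name`, `disabled`, and `children` keys, and it ignores screenshots and API schemas entirely, so it illustrates only one of the three signals the map fuses.

```python
def clickable_names(node):
    """Yield the names of enabled buttons and links in an accessibility-tree-like dict."""
    if node.get("role") in {"button", "link"} and not node.get("disabled", False):
        yield node["name"]
    for child in node.get("children", []):
        yield from clickable_names(child)


def valid_actions(tree, known_actions):
    """Keep the map's known actions for this context that are clickable right now."""
    live = set(clickable_names(tree))
    return [a for a in known_actions if a in live]


# Hypothetical snapshot of the Data Series page.
page = {
    "role": "main",
    "children": [
        {"role": "button", "name": "Edit Graph"},
        {"role": "link", "name": "Search"},
        {"role": "button", "name": "Download", "disabled": True},
    ],
}
```

Intersecting with the map's known actions filters out controls that are on screen but irrelevant (here, `Search`), while the live tree filters out actions the map knows about but that are unavailable right now (a disabled `Download`, or an `Apply formula` that only exists in the graph editor).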

2. Workflows and Tacit Knowledge (The “How”)

Knowing what’s clickable isn’t enough. The agent also needs to know which steps to take, in what order, and the right domain knowledge to apply along the way.

The Environment Map stores a procedure captured from an expert demonstration:

The goal: “Combine data series using a custom formula.”
The procedure: A structured JSON sequence linking specific actions to the goal.


{
  "goal": "Calculate Derived Metric",
  "steps": [
    {
      "number": 1,
      "action": "Click 'Edit Graph'...",
      "tips": ["Look for the 'Edit Line' tab"]
    },...
    {
      "number": 5,
      "action": "Click 'Add' button...",
      "warning": "Critical: If skipped, series won't be available for formula"
    }...
  ]
}

This allows the agent to reason about the sequence before it takes a single action.
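To show what reasoning about the sequence before acting can look like, the sketch below parses a procedure in the format above and renders it as a checklist, with tips and warnings attached to their steps. It contains only the two steps shown in the excerpt (the elided steps are left out rather than invented), and the `plan` and `critical_steps` helpers are hypothetical.

```python
import json

# Only steps 1 and 5 from the excerpt; the elided steps are deliberately not filled in.
PROCEDURE = json.loads("""
{
  "goal": "Calculate Derived Metric",
  "steps": [
    {"number": 1, "action": "Click 'Edit Graph'...",
     "tips": ["Look for the 'Edit Line' tab"]},
    {"number": 5, "action": "Click 'Add' button...",
     "warning": "Critical: If skipped, series won't be available for formula"}
  ]
}
""")


def plan(procedure):
    """Render the procedure as an ordered checklist with tips and warnings inline."""
    lines = [f"Goal: {procedure['goal']}"]
    for step in sorted(procedure["steps"], key=lambda s: s["number"]):
        lines.append(f"{step['number']}. {step['action']}")
        for tip in step.get("tips", []):
            lines.append(f"   tip: {tip}")
        if "warning" in step:
            lines.append(f"   WARNING: {step['warning']}")
    return lines


def critical_steps(procedure):
    """Step numbers whose warnings mark them as critical."""
    return [s["number"] for s in procedure["steps"] if s.get("warning", "").startswith("Critical")]


checklist = plan(PROCEDURE)
```

An agent (or a reviewer) can read `checklist` before any click is made, and `critical_steps` gives it a short list of steps to verify explicitly.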

The Learning Loop

Crucially, the Environment Map is not static. It evolves with every use: the agent consults the map to plan, and its experience feeds back to improve the map.

Plan: The agent queries the Environment Map to build a path.
Execute: The agent performs the task.
Update: Outcomes (success, failure, latency) feed back into the Environment Map, reinforcing valid procedures, flagging unreliable steps, and surfacing where the environment may have changed.

This feedback matters because real-world workflows are stochastic. Interfaces change, steps that worked yesterday can fail today, and system behavior can vary across sessions. When that happens, the Environment Map updates, becoming more accurate and comprehensive with use.
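The Update step can be as simple as keeping per-step outcome statistics. Here is a minimal Python sketch, assuming steps are keyed by (context, action) and that a step is flagged once it has enough runs and a success rate below a threshold; the `StepLedger` class and both thresholds are hypothetical, not values from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepStats:
    successes: int = 0
    failures: int = 0
    total_latency_s: float = 0.0

    @property
    def runs(self) -> int:
        return self.successes + self.failures

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0

    @property
    def mean_latency_s(self) -> float:
        return self.total_latency_s / self.runs if self.runs else 0.0


class StepLedger:
    """Hypothetical feedback store keyed by (context, action)."""

    def __init__(self, min_runs: int = 3, min_success: float = 0.7):
        self.stats = defaultdict(StepStats)
        self.min_runs = min_runs          # do not judge a step on too little evidence
        self.min_success = min_success    # below this rate a step is flagged

    def update(self, step, ok: bool, latency_s: float) -> None:
        """Record one execution outcome for a step."""
        s = self.stats[step]
        if ok:
            s.successes += 1
        else:
            s.failures += 1
        s.total_latency_s += latency_s

    def unreliable(self):
        """Steps with enough runs whose success rate falls below the threshold."""
        return sorted(
            step for step, s in self.stats.items()
            if s.runs >= self.min_runs and s.success_rate < self.min_success
        )


ledger = StepLedger()
for _ in range(3):
    ledger.update(("Data Series page", "Edit Graph"), ok=True, latency_s=1.0)
    ledger.update(("Graph editor", "Add"), ok=False, latency_s=2.5)
```

A flagged step is a prompt to re-examine that part of the environment (for example, after an interface change), not an automatic deletion from the map.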

Each Environment Map is also fully auditable, because it’s a structured, human-readable representation rather than a black box. Domain experts can inspect the map to verify what the agent knows and edit it directly: correcting procedures, updating terminology, and adding knowledge the agent hasn’t encountered yet. If a workflow changes, a human can update the map without retraining anything.

Figure 2: A continuous loop depicting how agents use the Environment Map to plan actions, execute them in the environment, observe outcomes, and feed those observations back to update the Environment Map.

The Key Result: Structure Beats Raw Data

The FRED example shows what it looks like when an agent has structured environmental knowledge to work with. But does that structure actually matter, or could you get the same results by simply giving the agent more raw data?

We tested this directly on WebArena, a widely used benchmark that evaluates agents across 812 long-horizon tasks in multiple realistic, self-hosted web environments.

Baseline agent with no prior knowledge of the environment: 14.2% success rate (CI [11.9, 16.7])
Agent with access to raw task recordings: 23.3% success rate (CI [20.5, 26.3])
Agent with an Environment Map: 28.2% success rate (CI [25.2, 31.4])
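The post does not say how the confidence intervals were computed. As a consistency check only, the sketch below assumes 95% Wilson score intervals over all 812 tasks, with success counts back-derived from the rounded rates (115, 189, and 229 are the only integer counts out of 812 that round to 14.2%, 23.3%, and 28.2%). Under those assumptions the computed intervals round to the reported ones.

```python
from math import sqrt

N_TASKS = 812
# Assumed success counts, back-derived from the rounded success rates.
COUNTS = {"baseline": 115, "raw recordings": 189, "environment map": 229}


def wilson_interval(k, n, z=1.96):
    """95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def summarize(counts, n=N_TASKS):
    """Map each agent to (rate %, CI low %, CI high %), rounded to one decimal."""
    rows = {}
    for name, k in counts.items():
        lo, hi = wilson_interval(k, n)
        rows[name] = (round(100 * k / n, 1), round(100 * lo, 1), round(100 * hi, 1))
    return rows


rows = summarize(COUNTS)
# Relative improvement of the Environment Map agent over the baseline (about 1.99x).
ratio = COUNTS["environment map"] / COUNTS["baseline"]
```

The ratio of roughly 1.99 is what "nearly double" refers to. Note that the baseline and Environment Map intervals do not overlap, while the raw-recordings and Environment Map intervals do.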

Agents equipped with Environment Maps nearly double baseline performance and outperform agents that have access to the same underlying data in unstructured form.

Agents with raw task recordings already do better than the baseline, which shows that more data does help. But these agents still have to piece together the structure of the environment on the fly. An Environment Map organizes that data into a structured representation, turning experience into something the agent can directly reason over and leading to even larger gains.
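One way to picture organizing recordings into structure is to merge many trajectories into a single graph of contexts and actions with observation counts. The sketch below assumes a hypothetical recording format of (context, action, next_context) triples and GitLab-like context names; it illustrates the idea, not the paper's construction procedure.

```python
from collections import Counter, defaultdict


def build_map(recordings):
    """Merge raw recordings into one context graph with observation counts.

    Each recording is assumed (hypothetically) to be a list of
    (context, action, next_context) transitions.
    """
    graph = defaultdict(Counter)
    for recording in recordings:
        for ctx, action, nxt in recording:
            graph[ctx][(action, nxt)] += 1
    return graph


def options(graph, ctx):
    """Known (action, next_context) pairs at `ctx`, most frequently observed first."""
    return [edge for edge, _ in graph[ctx].most_common()]


recordings = [
    [("Repo", "Open Issues", "Issues"), ("Issues", "New Issue", "Issue form")],
    [("Repo", "Open Issues", "Issues"), ("Issues", "Search", "Issues")],
    [("Repo", "Open Merge Requests", "Merge requests")],
]
graph = build_map(recordings)
```

Instead of rereading three separate recordings, the agent can ask one question, such as what is usually done from `Repo`, and get an answer ranked by how often each path was observed.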

Figure 3: (a) Overall task success on WebArena; (b) Success rate by environment; and (c) Performance on tasks with and without matching task recordings.

Looking closer at Figure 3, a few things stand out.

Not all environments benefit equally from structure. The most significant gains take place in complex, UI-heavy systems where the space of possible actions is large and messy. In WebArena environments like GitLab and CMS, where there are many branching paths and hidden dependencies, Environment Maps make a big difference. In simpler environments like Reddit, where interaction patterns are more predictable, just having task recordings already gets you most of the way there. Structure still helps in those environments, but less dramatically.

Another question is whether agents are actually learning the environment or just replaying what they’ve already seen in task recordings. To test this, we compared tasks with and without matching human demonstrations. If Environment Maps were just replaying recorded behavior, they would only help on tasks with matching demonstrations. But we found that they improve performance even on tasks with no demonstration at all, a strong signal that Environment Maps help agents build and navigate a generalized model of the environment.
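A structured map can answer questions that no single recording covers. In the hypothetical sketch below, one recording goes from Home to a repository and another goes from the repository to a new-issue form; a breadth-first search over the merged transitions finds a path from Home to the form even though no recording contains it. This illustrates how structure can generalize; it is not a description of how the paper's agents plan.

```python
from collections import deque


def find_path(edges, start, goal):
    """Breadth-first search over (context, action, next_context) edges.

    Returns the list of actions on a shortest path, or None if the goal is unreachable.
    """
    adj = {}
    for ctx, action, nxt in edges:
        adj.setdefault(ctx, []).append((action, nxt))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        ctx, path = queue.popleft()
        if ctx == goal:
            return path
        for action, nxt in adj.get(ctx, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None


# Two hypothetical recordings that never cover Home -> Issue form on their own.
rec_a = [("Home", "Open Projects", "Projects"), ("Projects", "Open Repo", "Repo")]
rec_b = [("Repo", "Open Issues", "Issues"), ("Issues", "New Issue", "Issue form")]
path = find_path(rec_a + rec_b, "Home", "Issue form")
```

Neither recording alone reaches the goal; the combined path only exists once their transitions are stitched together.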

Taken together, a clear pattern emerges. More data helps, but organizing that data into structure helps more. And the more complex the environment, the bigger the payoff.

Looking Forward: Better Maps, Not Just Better Models

We view Environment Maps as foundational infrastructure for the next generation of agents. As agents become more capable, the challenge shifts from raw task performance to consistent and auditable operation in complex, evolving environments.

This shift changes how we think about building agents: instead of focusing only on better models, we need better representations of the environments those models operate in. Critically, the knowledge and the model should connect modularly. That way, the investment in mapping an environment carries forward, regardless of the model or vendor powering the agent.

That structured, portable foundation opens up several important directions:

Hierarchical composition: A single agent workflow can span multiple applications. The map for each application can compose into a larger map that covers the full process.
Transfer learning: Onboarding a new tool doesn’t start from zero. Patterns from similar environments are reusable and carry over, accelerating time to value.
Human-AI collaboration: Agents and users work from the same representation of how systems operate, making it easier to oversee, correct, and trust agent behavior.
Continuous validation: Teams can run automated tests to continually validate Environment Maps as they evolve. This safeguards against inconsistencies as new data and expert edits flow in.
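Continuous validation can start with cheap structural checks that run whenever the map changes. The sketch below assumes a hypothetical dictionary layout for the map and checks two things: that every action leads to a defined context, and that every workflow step uses a known action in strictly increasing step order. It then simulates an edit that renames an action without updating the workflow that depends on it.

```python
import copy


def validate(env_map):
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    contexts = env_map["contexts"]
    known_actions = {action for actions in contexts.values() for action in actions}
    # Every action must lead to a context the map defines.
    for ctx, actions in contexts.items():
        for action, target in actions.items():
            if target not in contexts:
                problems.append(f"{ctx}/{action}: target context '{target}' is undefined")
    # Every workflow must use known actions with unique, increasing step numbers.
    for wf in env_map["workflows"]:
        numbers = [step["number"] for step in wf["steps"]]
        if numbers != sorted(set(numbers)):
            problems.append(f"{wf['goal']}: step numbers are not strictly increasing")
        for step in wf["steps"]:
            if step["action"] not in known_actions:
                problems.append(
                    f"{wf['goal']}: step {step['number']} uses unknown action '{step['action']}'"
                )
    return problems


# Hypothetical map, structured like the FRED example.
fred = {
    "contexts": {
        "Data Series page": {"Edit Graph": "Graph editor"},
        "Graph editor": {"Add": "Graph editor", "Apply formula": "Graph editor"},
    },
    "workflows": [
        {
            "goal": "Combine data series",
            "steps": [
                {"number": 1, "action": "Edit Graph"},
                {"number": 2, "action": "Add"},
                {"number": 3, "action": "Apply formula"},
            ],
        }
    ],
}

# Simulate an expert edit that renames an action but forgets the workflow.
edited = copy.deepcopy(fred)
graph_editor = edited["contexts"]["Graph editor"]
graph_editor["Apply custom formula"] = graph_editor.pop("Apply formula")
```

Running `validate` in CI on every change to the map would catch the rename above before an agent tried to follow the stale workflow.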

As agents take on real business workflows, the priority shifts from raw model intelligence to adaptability and reliability. Environment Maps deliver both at scale. They give agents the operational knowledge they’ve been missing: a living, structured map of the world they’re operating in, informed and refined by every outcome.

Environment Maps are the research foundation behind Capture, Distyl's technology for turning tacit expert knowledge into structured process maps and automation at scale.
