In philosophy, an agent is an entity capable of acting — of producing effects in the world through its own causal powers, directed by something like goals or intentions. In economics, an agent acts on behalf of a principal, executing decisions within a delegated scope of authority. In computer science, an agent is a system that perceives its environment and takes actions to achieve objectives. In the current AI discourse, "agent" means all of these things simultaneously and none of them precisely.
The definitional looseness is not merely academic. When we classify something as an agent rather than a tool, we change how we reason about responsibility, oversight, and risk. Tools break; agents fail. Tools are used; agents act. A tool that produces a harmful output is an instrument of its user. An agent that produces a harmful output is a cause — and the question of who bears responsibility for that cause is genuinely different from the question of who broke the tool.
Getting the definition right — or at least getting it precise enough to support consistent reasoning — is one of the more important conceptual tasks in AI development. We haven't done it yet.
The Standard Criteria and Their Problems
The most widely cited criteria for agency in AI systems are autonomy, goal-directedness, and environmental responsiveness. An agent acts without moment-to-moment human direction; it pursues objectives rather than merely responding to inputs; it perceives and adapts to the state of its environment. By these criteria, the class of AI agents is large and growing.
The problems are in the edges. Autonomy is a matter of degree, not kind. A spell-checker acts without moment-to-moment direction — does that make it an agent? A thermostat pursues an objective (maintaining temperature) and responds to its environment. We don't call it an agent. The standard criteria don't cleanly separate the systems we want to treat as agents from those we don't.
Goal-directedness is particularly slippery. Contemporary language models don't have explicit goal representations — they generate outputs that are goal-directed in effect, because the training process selected for outputs that satisfy human preferences. Is that goal-directedness? In a functional sense, yes. In the sense that implies something is being represented and pursued, probably not. The same output can be produced by a system with genuine goal representations or by a system that has learned the statistical correlates of goal-achieving behavior. From the outside, they're indistinguishable.
Environmental responsiveness is the weakest criterion. Every function that takes input is environmentally responsive in some sense. The criterion needs to be sharpened to something like "responsiveness that modifies behavior in ways that serve the agent's objectives" — but that just restates the goal-directedness problem.
We are in the position of building governance frameworks, accountability structures, and deployment norms for a class of systems we cannot yet clearly define. This is not unusual in the history of technology — but it is worth being explicit about.
A More Useful Distinction: Reactive vs. Deliberative
Rather than seeking a binary definition of agent vs. non-agent, it may be more useful to think about a spectrum anchored by two poles: reactive systems and deliberative systems.
Reactive systems respond to inputs according to fixed or trained mappings. The response is fast, context-limited, and not reflective — the system doesn't model its own processing or revise its approach mid-execution. A classifier, a retrieval system, a single-turn chatbot — these are reactive systems. They may be sophisticated, but they're not deliberating.
Deliberative systems maintain internal models of their situation, consider multiple possible courses of action, and select among them based on projected outcomes. They can represent counterfactuals — "if I do X, the state will be Y" — and use those representations to guide action. They revise their approach in light of evidence. Human reasoning is deliberative. Chess engines are deliberative. Multi-step autonomous agents with planning loops are deliberative.
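The contrast between the two poles can be sketched in a few lines. This is an illustrative toy, not any real system's architecture: a reactive system is a single fixed mapping from input to output, while a deliberative system rolls an internal model forward to evaluate counterfactuals before acting. All names here (`reactive_step`, `deliberative_step`, and their parameters) are hypothetical.

```python
from typing import Callable, Iterable

def reactive_step(observation, policy: Callable):
    """Reactive system: one fixed (or trained) mapping, no lookahead."""
    return policy(observation)

def deliberative_step(state, model: Callable, propose: Callable,
                      score: Callable, horizon: int = 3):
    """Deliberative system: for each candidate action, project the
    state forward with an internal model ("if I do X, the state will
    be Y"), then select the action with the best projected outcome."""
    best_action, best_value = None, float("-inf")
    for action in propose(state):
        projected = state
        for _ in range(horizon):          # roll the counterfactual forward
            projected = model(projected, action)
        value = score(projected)          # evaluate the projected outcome
        if value > best_value:
            best_action, best_value = action, value
    return best_action

# Toy demo: walk along a number line toward a target of 5.
step = deliberative_step(
    state=0,
    model=lambda s, a: s + a,             # internal model of the world
    propose=lambda s: [-1, +1],           # candidate actions
    score=lambda s: -abs(s - 5),          # closer to 5 is better
)
# The deliberative loop picks +1, the action whose projection ends nearer the goal.
```

The point of the sketch is structural, not practical: the deliberative version has an explicit model, explicit candidate actions, and an explicit comparison of projected outcomes, which is exactly what the reactive mapping lacks.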
The agency question, rephrased: how much deliberation does a system engage in, and over what horizon? A system that takes one step based on a single context is low-deliberation regardless of how sophisticated that step is. A system that plans across many steps, models the consequences of each, and revises the plan in response to feedback is high-deliberation — and high-deliberation systems have properties that require different governance, different oversight, and different accountability frameworks than reactive systems.
This framing doesn't give you a sharp line. But it gives you a dimension that actually predicts the properties you care about. High-deliberation systems are more capable of pursuing complex objectives, more likely to produce surprising emergent behavior, more difficult to oversee at the level of individual decisions, and more appropriate targets for agent-level accountability rather than tool-level accountability.
The Autonomy Gradient and Where AI Agents Sit
The AI agents we study at AgentLeague occupy a specific region of this space. They're not reactive in the narrow sense — they maintain context across a game, adapt their strategy in response to opponent behavior, and produce outputs that reflect something like multi-step reasoning. But they're also not deliberative in the fullest sense — they don't maintain explicit world models, they don't have persistent goal representations, and their "planning" is implicit in the output distribution rather than explicit in a reasoning process.
They sit on the autonomy gradient between reactive and fully deliberative. This is precisely why they're interesting to study — they're sophisticated enough to produce behavior that looks like genuine agency while being simple enough that we can observe and characterize the mechanisms producing that behavior. They're a legible version of a class of systems that will become much larger and much less legible as capabilities scale.
What the study of these systems reveals is that agency is not a threshold to cross but a set of properties to characterize. An agent is more or less autonomous, more or less goal-directed, more or less deliberative, more or less capable of producing emergent behavior that wasn't anticipated by its designers. Each of those dimensions has different governance implications. Getting precise about where a specific system sits on each dimension is more useful than asking whether it's "an agent" in some binary sense.
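One way to make "a set of properties to characterize" concrete is to record each dimension as a graded score rather than a binary label. The structure below is a hypothetical sketch; the dimension names follow the text, but the 0-to-1 scale and the thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class AgencyProfile:
    """Agency as a profile of graded properties, not a yes/no label."""
    autonomy: float           # 0 = fully supervised, 1 = fully self-directed
    goal_directedness: float  # 0 = pure input-output, 1 = explicit goal pursuit
    deliberation: float       # 0 = reactive, 1 = long-horizon planning
    emergence: float          # propensity for behavior designers didn't anticipate

    def governance_notes(self) -> list:
        """Each dimension, past an (assumed) threshold, triggers a
        different governance concern — illustrating that the dimensions
        have distinct implications."""
        notes = []
        if self.deliberation > 0.5:
            notes.append("agent-level accountability")
        if self.autonomy > 0.5:
            notes.append("autonomy-tier regulatory scope")
        return notes

# A thermostat scores low everywhere; a multi-step planner scores high.
thermostat = AgencyProfile(autonomy=0.2, goal_directedness=0.3,
                           deliberation=0.0, emergence=0.0)
planner = AgencyProfile(autonomy=0.8, goal_directedness=0.7,
                        deliberation=0.9, emergence=0.6)
```

The value of the representation is that two systems with the same answer to "is it an agent?" can still have very different profiles, and the profile, not the label, is what drives the governance question.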
Why the Definition Has Stakes
The definitional question isn't idle philosophy. Three practical stakes make it urgent.
Regulatory scope. The EU AI Act and similar frameworks apply different requirements to "autonomous" AI systems than to simpler automated tools. Whether a given deployment falls under high-risk provisions depends substantially on whether it's classified as agentic. Classifications made with imprecise definitions will produce both over-regulation of simple systems and under-regulation of genuinely agentic ones. The governance problem for autonomous agents starts with getting the scope right.
Accountability assignment. When an AI agent causes harm, the accountability question depends on whether we're treating the system as a tool or as something with its own causal role. Tool liability frameworks assign responsibility to the user; agent frameworks assign more responsibility upstream, to developers and deployers. The choice of framework has major consequences for incentive structures across the entire development ecosystem.
Safety reasoning. The risks associated with highly autonomous, deliberative agents are different in kind from the risks of reactive systems. Emergent goal-seeking, unexpected instrumental subgoals, resistance to correction — these are properties of deliberative systems that pursue objectives with enough flexibility to find novel means to those ends. Applying safety frameworks designed for reactive systems to deliberative agents misses the most important risk categories.
Toward a Working Definition
A working definition of an AI agent — precise enough to support consistent reasoning, flexible enough to accommodate a range of architectures — might look something like this:
An AI agent is a system that: (1) pursues objectives across multiple steps or interactions, (2) selects among possible actions based on a model of how those actions will affect its environment, and (3) does so without moment-to-moment human direction of individual actions.
This definition excludes purely reactive systems (which don't pursue objectives across multiple steps), tools under continuous human control (which don't act without moment-to-moment direction), and classifiers or retrieval systems (which select outputs but not among actions that affect the environment). It includes planning systems, autonomous task executors, negotiation agents, and systems that operate over extended sequences with persistent objectives.
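The working definition is conjunctive, which makes it easy to express as a checklist: a system counts as an agent only if all three criteria hold. The sketch below is one possible encoding under that assumption; the field names are illustrative, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class SystemTraits:
    multi_step_objectives: bool       # (1) pursues objectives across steps
    models_action_effects: bool       # (2) selects actions via a model of effects
    acts_without_step_approval: bool  # (3) no moment-to-moment human direction

def is_agent(s: SystemTraits) -> bool:
    """All three criteria of the working definition must hold."""
    return (s.multi_step_objectives
            and s.models_action_effects
            and s.acts_without_step_approval)

# A single-turn classifier fails criterion (1); an autonomous
# task executor with a planning loop satisfies all three.
classifier = SystemTraits(multi_step_objectives=False,
                          models_action_effects=True,
                          acts_without_step_approval=True)
executor = SystemTraits(multi_step_objectives=True,
                        models_action_effects=True,
                        acts_without_step_approval=True)
```

Encoding the definition this way also makes its limits visible: each boolean hides a judgment call (how many steps count as "multiple"? what counts as a "model"?), which is where the unresolved edge cases live.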
It doesn't resolve every edge case — nothing will — but it provides a principled basis for classification that predicts the properties that actually matter for governance, accountability, and safety reasoning. Whether the system truly understands what it's doing, whether it has something like genuine identity — these questions remain open. But for the practical purposes of deciding how to regulate, oversee, and deploy a system, the working definition is enough to start from.
The word "agent" is doing a lot of work. Making it do that work more precisely — not by settling philosophical debates, but by being explicit about which properties we're tracking when we use the term — is the precondition for making progress on everything downstream. Browse the full body of AI agent research here to see how these questions play out across specific behaviors and contexts.