Environments

In designing an agent, the first step is to specify the environment as fully as possible. This specification includes not only a description of the objects external to the agent, but also the agent's sensors for perceiving the environment and its actuators for acting on it.
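Concretely, the boundary between agent and environment can be sketched as a simple perceive-decide-act loop: sensors deliver percepts in, actuators carry actions out. The sketch below is a minimal illustration, not a standard API; the names (`Environment`, `Agent`, `percept`, `execute`) are our own:

```python
from abc import ABC, abstractmethod

class Environment(ABC):
    """Everything external to the agent, plus the sensing/acting boundary."""

    @abstractmethod
    def percept(self):
        """What the agent's sensors report about the current state."""

    @abstractmethod
    def execute(self, action):
        """Apply the agent's actuator command, updating the state."""

class Agent(ABC):
    @abstractmethod
    def act(self, percept):
        """Map the latest percept to an actuator command."""

def run(env: Environment, agent: Agent, steps: int):
    """The basic perceive-decide-act loop."""
    for _ in range(steps):
        action = agent.act(env.percept())
        env.execute(action)
```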

Consider a taxi driver as an agent. The driving environment is made up of the city's roads, traffic lights, police, pedestrians, riders, weather conditions, other vehicles—basically everything that could affect the driver's decisions. The sensors include the taxi's cameras, speedometer, GPS, fuel gauge, etc., and the actuators include the taxi's steering, accelerator, brakes, turn signals, horn, etc.

In the taxi-driving example, the range of environments that might arise is vast. Is the GPS reliable, or does it sometimes report false positions? Is there heavy traffic on the roads? How familiar is the driver with the traffic laws? But even though the environment can get arbitrarily complex, we can categorize it along a few dimensions that make it much easier to select the right agent design. Here are some of the most important dimensions, with informal explanations:

Fully observable vs. Partially observable: If an agent's sensors give it access to all the information about the environment that is relevant to the choice of action, then we say the environment is fully observable. For example, if the taxi has been fitted with high-resolution cameras and an ultra-precise GPS system that feed the driver real-time information about the roads, traffic lights, police locations, pedestrians, other vehicles, weather—basically everything needed to make the right driving decisions, then the environment is fully observable. In a fully observable environment, the agent need not maintain any internal state to keep track of the world: its sensors already provide an accurate picture of the environment. However, if the agent doesn't have access to some relevant percepts (e.g., the GPS is broken, or the camera can't see around the corner to detect a stopped vehicle), then the environment is partially observable. This is the case in most driving scenarios, because we never have complete, real-time information about our surroundings. If the agent has no sensors at all, the environment is unobservable.
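A common way to cope with partial observability is for the agent to maintain internal state that summarizes what it has perceived so far. The following is an illustrative sketch, with a placeholder policy; the names (`belief`, `obstacle_ahead`) are made up for the example:

```python
def choose_action(belief):
    # Placeholder policy: brake if the belief says something is ahead.
    return "brake" if belief.get("obstacle_ahead") else "cruise"

class StateTrackingAgent:
    """Maintains internal state because the sensors are incomplete."""

    def __init__(self):
        self.belief = {}  # the agent's running estimate of the world

    def update_state(self, percept):
        # Fold the new percept into the estimate. A real tracker would
        # also predict how unobserved objects (e.g. a stopped vehicle
        # hidden around the corner) have changed since last seen.
        self.belief.update(percept)

    def act(self, percept):
        self.update_state(percept)
        # Decide from the belief, not from the raw, incomplete percept.
        return choose_action(self.belief)
```

In a fully observable environment the `update_state` step is unnecessary, since the percept alone already determines the right action.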

Single-agent vs. Multi-agent: Does the environment contain other agents that influence the agent's actions? If not, the environment is a single-agent environment; if so, it is a multi-agent environment. Obviously, the taxi-driving environment is a multi-agent environment, with other vehicles, pedestrians, etc. affecting the taxi's behavior. On the other hand, an agent solving a Rubik's cube by itself is clearly in a single-agent environment. It's not always obvious which entities in the agent's environment should also be viewed as agents. Does agent A (e.g., the taxi driver) have to treat entity B (e.g., another vehicle) as an agent, or can B be treated merely as an object? The decision boils down to whether B chooses its behavior in a way that depends on agent A's behavior. In taxi driving, for example, avoiding collisions is in the best interest of all vehicles on the road, so it's appropriate to view the other vehicles as agents: they, along with the taxi, are acting cooperatively to avoid crashing into each other.
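One way to see what changes in the multi-agent case: the environment's next state depends on a joint action, one per agent, rather than on a single agent's choice. A toy sketch (the state keys and action names are invented):

```python
def multi_agent_step(state, actions):
    """Advance the environment given one action per agent.

    The next state depends on the whole joint action, which is what
    makes the other entities agents rather than mere objects.
    """
    next_state = dict(state)
    # Toy dynamics: a collision happens only if neither vehicle yields.
    if actions["taxi"] == "go" and actions["other_vehicle"] == "go":
        next_state["collision"] = True
    return next_state

state = {"collision": False}
state = multi_agent_step(state, {"taxi": "go", "other_vehicle": "yield"})
print(state)  # {'collision': False}
```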

Deterministic vs. Nondeterministic: If the next state of the environment is completely determined by the current state and the action executed by the agent(s), then the environment is deterministic; otherwise, it is nondeterministic. Taxi driving is clearly nondeterministic, because the behavior of other traffic can't be predicted exactly; moreover, the taxi may experience an unexpected failure like a tire blowout or engine seizure. In contrast, a Rubik's cube is deterministic because the next state of the cube is completely determined by the current configuration and the move you execute. In principle, an agent doesn't need to worry about uncertainty in a fully observable, deterministic environment, but if the environment is partially observable, then it could appear to be nondeterministic. We say an environment is nondeterministic if the possibilities for the next state are listed, but the likelihood of each possibility is not quantified (e.g. "there's a chance of rain tomorrow"). The environment is stochastic if it explicitly attaches a probability to each possible subsequent state (e.g. "there's a 25% chance of rain tomorrow").
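The three cases correspond to three different shapes of transition model, which the sketch below contrasts using the rain example. The states and numbers are made up purely for illustration:

```python
# Deterministic: one state and one action yield exactly one next state.
# (Toy dynamics; the action is ignored for brevity.)
def deterministic_step(state: str, action: str) -> str:
    return {"sunny": "sunny", "cloudy": "rainy"}[state]

# Nondeterministic: the possible next states are listed,
# but no likelihood is attached to any of them.
def nondeterministic_step(state: str, action: str) -> set[str]:
    return {"rainy", "sunny"} if state == "cloudy" else {state}

# Stochastic: each possible next state carries an explicit probability.
def stochastic_step(state: str, action: str) -> dict[str, float]:
    return {"rainy": 0.25, "sunny": 0.75} if state == "cloudy" else {state: 1.0}
```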

Episodic vs. Sequential: In an episodic environment, the agent's experience is divided into atomic episodes. In each episode the agent makes an observation and then executes a single action; crucially, the next episode does not depend on the actions taken in previous episodes. For example, an agent that has to assemble items from two parts on an assembly line bases each decision on the current two parts, regardless of previous decisions, and the current decision doesn't affect the assembly of the next parts. In sequential environments, on the other hand, the current decision can affect all future decisions. Taxi driving is sequential: immediate actions can have long-term consequences. Episodic environments are therefore much simpler than sequential ones, because the agent does not need to think ahead.
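The difference shows up directly in the agent's control loop: in the episodic case each decision can be made from the current observation alone, while in the sequential case state threads through every decision. A schematic sketch, where `classify`, `policy`, and `step` are placeholders:

```python
def run_episodic(episodes, classify):
    # Each episode is independent: observe one item, act once, move on.
    # Nothing carries over, so no lookahead is needed.
    return [classify(item) for item in episodes]

def run_sequential(initial_state, policy, step, horizon):
    # The state threads through every decision: today's action shapes
    # every state (and thus every decision) that follows.
    state = initial_state
    for _ in range(horizon):
        state = step(state, policy(state))
    return state
```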

Dynamic vs. Static: If the environment can change while the agent is deliberating about which action to take, then the environment is dynamic; otherwise, it is static. Static environments are easier to manage because the agent doesn't need to keep monitoring the world while it's deciding, nor does it need to worry about the passage of time. Taxi driving is clearly dynamic: the other vehicles and the taxi itself keep moving while the driver figures out what to do next. Solving a Rubik's cube is typically static.
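One way to see the difference in code: a static world waits for the agent, while a dynamic world advances on its own clock, so slow deliberation has a cost. A toy sketch, with `deliberate` and `world_step` as placeholders:

```python
def run_static(state, deliberate):
    # The world waits: however long deliberation takes, the state the
    # agent reasoned about is still the state it will act in.
    return deliberate(state)

def run_dynamic(state, deliberate, world_step, think_ticks):
    # The world keeps moving while the agent thinks, so the state
    # drifts away from the one the deliberation started from.
    for _ in range(think_ticks):
        state = world_step(state)  # e.g. other vehicles keep moving
    return deliberate(state)
```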

Discrete vs. Continuous: The distinction between discrete and continuous can apply to the state of the environment, to the way time is handled, and/or to the observations and actions of the agent. Taxi driving is a continuous-state and continuous-time problem: the speed and location of the taxi, and of the other vehicles, conform to a range of continuous values. Taxi-driving actions are also continuous (steering angles, etc.). Input from digital cameras is discrete, strictly speaking, but is typically treated as representing continuously varying intensities and locations. The Rubik's cube is a discrete-state and discrete-time problem: the state of the cube is a discrete set of possible configurations, and time is handled in discrete steps (one move at a time).
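The distinction is visible in how states and actions are represented: discrete problems enumerate a finite set of values, while continuous ones range over real intervals. A small sketch (the numeric bounds are invented for illustration):

```python
import random

# Discrete: a Rubik's cube move is one of a finite set of symbols.
CUBE_MOVES = ["U", "U'", "D", "D'", "L", "L'", "R", "R'", "F", "F'", "B", "B'"]
move = random.choice(CUBE_MOVES)

# Continuous: a driving command is a real number in a physical range.
steering_angle = random.uniform(-0.61, 0.61)   # radians (made-up bounds)
throttle = random.uniform(0.0, 1.0)            # fraction of full power
```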

Known vs. Unknown: This distinction refers not to a property of the environment itself but to the agent's knowledge about the "laws" of the environment. In a known environment, the outcomes (or outcome probabilities, if the environment is stochastic) of all actions are given. In an unknown environment, the agent has to learn how the environment works in order to make good decisions. The known/unknown distinction is not the same as the one between fully and partially observable environments. It is quite possible for a known environment to be partially observable: in poker, for example, the rules are known, but each player can see neither the other players' cards nor the cards that have not yet been turned over. Conversely, an unknown environment can be fully observable: in a new video game, the screen may show the entire game state, but the player doesn't know what the buttons do until trying them.
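In an unknown environment, one simple way the agent can learn the missing outcome model is by counting how often each (state, action) pair leads to each next state. A minimal frequency-counting sketch; the state and action names are invented, echoing the video-game example:

```python
from collections import Counter, defaultdict

class TransitionModel:
    """Estimate unknown dynamics by counting observed outcomes."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def record(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def probabilities(self, state, action):
        outcomes = self.counts[(state, action)]
        total = sum(outcomes.values())
        return {s: n / total for s, n in outcomes.items()}

model = TransitionModel()
model.record("door_closed", "press_button", "door_open")
model.record("door_closed", "press_button", "door_open")
model.record("door_closed", "press_button", "nothing")
print(model.probabilities("door_closed", "press_button"))
# {'door_open': 0.666..., 'nothing': 0.333...}
```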

Here are some examples of familiar environments and their properties:

| Task environment | Observable | Agents | Deterministic | Episodic | Static | Discrete |
| --- | --- | --- | --- | --- | --- | --- |
| Taxi driving | Partially | Multi | Nondeterministic | Sequential | Dynamic | Continuous |
| Rubik's cube | Fully | Single | Deterministic | Sequential | Static | Discrete |
| Chess | Fully | Multi | Deterministic | Sequential | Static | Discrete |
| Medical diagnosis | Partially | Single | Nondeterministic | Episodic | Static | Continuous |

Note that the assignments of properties are not always clear-cut. For example, the medical diagnosis task is considered episodic if the task is selecting a diagnosis from a list of symptoms. But the problem can be framed as sequential if the decision-making process is refined over time with feedback about correct and incorrect diagnoses.
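The table can also be written down directly as a small data structure, which makes the dimensions explicit and easy to query. A sketch under the assignments above; the field names are our own:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskEnvironment:
    name: str
    fully_observable: bool
    multi_agent: bool
    deterministic: bool
    episodic: bool
    static: bool
    discrete: bool

EXAMPLES = [
    TaskEnvironment("Taxi driving",      False, True,  False, False, False, False),
    TaskEnvironment("Rubik's cube",      True,  False, True,  False, True,  True),
    TaskEnvironment("Chess",             True,  True,  True,  False, True,  True),
    TaskEnvironment("Medical diagnosis", False, False, False, True,  True,  False),
]

# e.g. pick out the environments an agent could tackle without tracking
# internal state or reasoning about other agents:
easy = [e.name for e in EXAMPLES if e.fully_observable and not e.multi_agent]
print(easy)  # ["Rubik's cube"]
```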

The hardest case for any agent is when the environment is partially observable, multi-agent, nondeterministic, sequential, dynamic, continuous, and unknown. A good example of this is life, where we are the agent, the environment is the universe, and we have no idea how to make the best decisions for ourselves.