A* search algorithm

﻿
A* search algorithm

In computer science, A* (pronounced "A star") is a best-first, graph search algorithm that finds the least-cost path from a given initial node to one goal node (out of one or more possible goals).

It uses a distance-plus-cost heuristic function (usually denoted $f\left(x\right)$) to determine the order in which the search visits nodes in the tree. The distance-plus-cost heuristic is a sum of two functions: the path-cost function (usually denoted $g\left(x\right)$, which may or may not be a heuristic) and an admissible "heuristic estimate" of the distance to the goal (usually denoted $h\left(x\right)$). The path-cost function $g\left(x\right)$ is the cost from the starting node to the current node.

Since the $h\left(x\right)$ part of the $f\left(x\right)$ function must be an admissible heuristic, it must not overestimate the distance to the goal. Thus for an application like routing, $h\left(x\right)$ might represent the straight-line distance to the goal, since that is physically the smallest possible distance between any two points (or nodes for that matter).

The algorithm was first described in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael. In their paper, it was called algorithm A. Since using this algorithm yields optimal behavior for a given heuristic, it has been called A*.

This algorithm has been generalized into a bidirectional heuristic search algorithm; see bidirectional search.

Algorithm description

A* incrementally searches all routes leading from the starting point until it finds the shortest path to a goal. Like all informed search algorithms, it searches first the routes that "appear" to be most likely to lead towards the goal. What sets A* apart from a greedy best-first search is that it also takes the distance already traveled into account (the $g\left(x\right)$ part of the heuristic is the cost from the start, and not simply the local cost from the previously expanded node).

The algorithm traverses various paths from start to goal. For each node $x$ traversed, it maintains 3 values:
* g(x): the actual shortest distance traveled from initial node to current node
* h(x): the estimated (or "heuristic") distance from current node to goal
* f(x): the sum of g(x) and h(x)

Starting with the initial node, it maintains a priority queue of nodes to be traversed, known as the "open set" (not to be confused with open sets in topology). The lower $f\left(x\right)$ for a given node $x$, the higher its priority. At each step of the algorithm, the node with the lowest $f\left(x\right)$ value is removed from the queue, the $f$ and $h$ values of its neighbors are updated accordingly, and these neighbors are added to the queue. The algorithm continues until a goal node has a lower $f$ value than any node in the queue (or until the queue is empty). (Goal nodes may be passed over multiple times if there remain other nodes with lower $f$ values, as they may lead to a shorter path to a goal.) The $f$ value of the goal is then the length of the shortest path, since $h$ at the goal is zero in an admissible heuristic. If the actual shortest path is desired, the algorithm may also update each neighbor with its immediate predecessor in the best path found so far; this information can then be used to reconstruct the path by working backwards from the goal node. Additionally, if the heuristic is "monotonic" (see below), a "closed set" of nodes already traversed may be used to make the search more efficient.

function A*(start,goal) closedset := the empty set % The set of nodes already evaluated. openset := set containing the initial node % The set of tentative nodes to be evaluated. g_score [start] := 0 % Distance from start along optimal path. while openset is not empty x := the node in openset having the lowest f_score [] value if x = goal return reconstruct_path(came_from,goal) remove x from openset add x to closedset foreach y in neighbor_nodes(x) if y in closedset continue tentative_g_score := g_score [x] + dist_between(x,y) tentative_is_better := false if y not in openset add y to openset h_score [y] := heuristic_estimate_of_distance_to_goal_from(y) tentative_is_better := true elseif tentative_g_score < g_score [y] tentative_is_better := true if tentative_is_better = true came_from [y] := x g_score [y] := tentative_g_score f_score [y] := g_score [y] + h_score [y] % Estimated total distance from start to goal through y. return failure function reconstruct_path(came_from,current_node) if came_from [current_node] is set p = reconstruct_path(came_from,came_from [current_node] ) return (p + current_node) else return the empty path

The closed set can be omitted (yielding a tree search algorithm) if a solution is guaranteed to exist, or if the algorithm is adapted so that new nodes are added to the open set only if they have a lower $f$ value than at any previous iteration.

Properties

Like breadth-first search, A* is "complete" in the sense that it will always find a solution if there is one.

If the heuristic function $h$ is "admissible", meaning that it never overestimates the actual minimal cost of reaching the goal, then A* is itself admissible (or "optimal") if we do not use a closed set. If a closed set is used, then $h$ must also be "monotonic" (or "consistent") for A* to be optimal. This means that for any pair of adjacent nodes $x$ and $y$, where $d\left(x,y\right)$ denotes the length of the edge between them, we must have:

:$h\left(x\right) le d\left(x,y\right) + h\left(y\right)$

This ensures that for any path $X$ from the initial node to $x$:

:$L\left(X\right) + h\left(x\right) le L\left(X\right) + d\left(x,y\right) + h\left(y\right) = L\left(Y\right) + h\left(y\right)$

where $L\left(cdot\right)$ denotes the length of a path, and $Y$ is the path $X$ extended to include $y$. In other words, it is impossible to decrease (total distance so far + estimated remaining distance) by extending a path to include a neighboring node. (This is analogous to the restriction to nonnegative edge weights in Dijkstra's algorithm.) Monotonicity implies admissibility when the heuristic estimate at any goal node itself is zero, since (letting $P = \left(f,v_1,v_2,ldots,v_n,g\right)$ be a shortest path from any node $f$ to the nearest goal $g$):

:$h\left(f\right) le d\left(f,v_1\right) + h\left(v_1\right) le d\left(f,v_1\right) + d\left(v_1,v_2\right) + h\left(v_2\right) le ldots le L\left(P\right) + h\left(g\right) = L\left(P\right)$

A* is also optimally efficient for any heuristic $h$, meaning that no algorithm employing the same heuristic will expand fewer nodes than A*, except when there are multiple partial solutions where $h$ exactly predicts the cost of the optimal path. Even in this case, for each graph there exists some order of breaking ties in the priority queue such that A* examines the fewest possible nodes.

pecial cases

Generally speaking, depth-first search and breadth-first search are two special cases of A* algorithm. Dijkstra's algorithm, as another example of a best-first search algorithm, is the special case of A* where $h\left(x\right) = 0$ for all $x$. For depth-first search, we may consider that there is a global counter "C" initialized with a very big value. Every time we process a node we assign "C" to all of its newly discovered neighbors. After each single assignment, we decrease the counter "C" by one. Thus the earlier a node is discovered, the higher its $h\left(x\right)$ value.

Implementation Details

There are a number of simple optimizations or implementation details that can significantly affect the performance of an A* implementation. The first detail to note is that the way the priority queue handles ties can have a significant effect on performance in some situations. If ties are broken so the queue behaves in a LIFO manner, A* will behave like Depth-first search among equal cost paths. If ties are broken so the queue behaves in a FIFO manner, A* will behave like Breadth-first search among equal cost paths.

When a path is required at the end of the search, it is common to keep with each node a reference to that node's parent. At the end of the search these references can be used to recover the optimal path. If these references are being kept then it can be important that the same node doesn't appear in the priority queue more than once (each entry corresponding to a different path to the node, and each with a different cost). A standard approach here is to check if a node about to be added already appears in the priority queue. If it does, then the priority and parent pointers are changed to correspond to the lower cost path. When finding a node in a queue to perform this check, many standard implementations of a min-heap require $O\left(n\right)$ time. Augmenting the heap with a Hash table can reduce this to constant time.

Why A* is admissible and computationally optimal

A* is both admissible and considers fewer nodes than any other admissible search algorithm with the same heuristic, because A* works from an “optimistic” estimate of the cost of a path through every node that it considers — optimistic in that the true cost of a path through that node to the goal will be at least as great as the estimate. But, critically, as far as A* “knows”, that optimistic estimate might be achievable.

When A* terminates its search, it has, by definition, found a path whose actual cost is lower than the estimated cost of any path through any open node. But since those estimates are optimistic, A* can safely ignore those nodes. In other words, A* will never overlook the possibility of a lower-cost path and so is admissible.

Suppose now that some other search algorithm B terminates its search with a path whose actual cost is not less than the estimated cost of a path through some open node. Algorithm B cannot rule out the possibility, based on the heuristic information it has, that a path through that node might have a lower cost. So while B might consider fewer nodes than A*, it cannot be admissible. Accordingly, A* considers the fewest nodes of any admissible search algorithm that uses a no more accurate heuristic estimate.

Complexity

The time complexity of A* depends on the heuristic. In the worst case, the number of nodes expanded is exponential in the length of the solution (the shortest path), but it is polynomial when the heuristic function "h" meets the following condition:

:$|h\left(x\right) - h^*\left(x\right)| leq O\left(log h^*\left(x\right)\right)$

where $h^*$ is the optimal heuristic, i.e. the exact cost to get from $x$ to the goal. In other words, the error of "h" should not grow faster than the logarithm of the “perfect heuristic” $h^*$ that returns the true distance from "x" to the goal (Russell and Norvig 2003, p. 101).

More problematic than its time complexity is A*’s memory usage. In the worst case, it must also remember an exponential number of nodes. Several variants of A* have been developed to cope with this, including iterative deepening A* (IDA*), memory-bounded A* (MA*) and simplified memory bounded A* (SMA*) and recursive best-first search (RBFS).

References

* cite journal
first = Rina
last = Dechter
coauthors = Judea Pearl
title = [http://portal.acm.org/citation.cfm?id=3830&coll=portal&dl=ACM Generalized best-first search strategies and the optimality of A*]
journal = Journal of the ACM
volume = 32
issue = 3
pages = pp. 505–536
year = 1985
doi = 10.1145/3828.3830

* cite journal
first = P. E.
last = Hart
coauthors = Nilsson, N. J.; Raphael, B.
title = A Formal Basis for the Heuristic Determination of Minimum Cost Paths
journal = IEEE Transactions on Systems Science and Cybernetics SSC4
issue = 2
pages = pp. 100–107
year = 1968

* cite journal
first = P. E.
last = Hart
coauthors = Nilsson, N. J.; Raphael, B.
title = Correction to "A Formal Basis for the Heuristic Determination of Minimum Cost Paths"
journal = SIGART Newsletter
volume = 37
pages = pp. 28–29
year = 1972

* cite book
first = N. J.
last = Nilsson
title = Principles of Artificial Intelligence
publisher = Tioga Publishing Company
location = Palo Alto, California
year = 1980
id = ISBN 0935382011

* cite book
first = Judea
last = Pearl
title = Heuristics: Intelligent Search Strategies for Computer Problem Solving
year = 1984
id = ISBN 0-201-05594-5

* cite book
first = S. J.
last = Russell
coauthors = Norvig, P.
title =
year = 2003
pages = pp. 97-104
id = ISBN 0-13-790395-2

* Tarik Attar's [http://www.tarikattar.com/napier/osmastermap/ Implementation and visualisation of the A* algorithm] - Implementation in PHP and visualisation using Google Map API
* Justin Heyes-Jones' [http://www.geocities.com/jheyesjones/astar.html A* algorithm tutorial]
* Herbert Glarner's [http://herbert.gandraxa.com/herbert/pfa.asp Interactive Single Step Simulation in VB 6.0] , implemented as a DLL, including a GUI allowing simulation in user-defined grids.
* [http://www.policyalmanac.org/games/aStarTutorial.htm Another A* Pathfinding for Beginners] (note: incorrectly states that A* always needs a "closed set")
* Amit's [http://theory.stanford.edu/~amitp/GameProgramming/ Thoughts on Path-Finding and A*]
* Sven Koenig's [http://idm-lab.org/applet.html Demonstration of Lifelong Planning A* and A*]
* Cuneyt Mertayak's [http://www.ceng.metu.edu.tr/~cuneyt/astar.tar.gz A Generic C++ A* Library]
* Tony Stentz's [http://www.frc.ri.cmu.edu/~axs/ Papers on D* (Dynamic A*) Path-Finding]
* Remko Tronçon and Joost Vennekens's [http://el-tramo.be/software/jsearchdemo/ JSearch demo] : demonstrates various search algorithms, including A*.
* [http://lostsouls.org/grimoire_astar A* search algorithm module] in LPC
* [http://www.jamespoag.com/AStarPathfinder.html A* search algorithm module] in object-oriented C++ with Demo Written for the PopCap Games Framework
* [http://sourceforge.net/projects/argorha/ Open Source A*] in polygon soup ( 3D world ).
* Variation on A* called [http://www.cs.ualberta.ca/~mmueller/ps/hpastar.pdf Near Optimal Hierarchical Path-Finding] and associated [http://www.cs.ualberta.ca/~bulitko/F06/presentations/2006-09-29-ML.pdf presentation] .

Wikimedia Foundation. 2010.

Look at other dictionaries:

• Search algorithm — In computer science, a search algorithm, broadly speaking, is an algorithm that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions. Most of the algorithms studied by computer… …   Wikipedia

• search algorithm — paieškos algoritmas statusas T sritis automatika atitikmenys: angl. search algorithm vok. Suchalgorithmus, m rus. алгоритм поиска, m pranc. algorithme de recherche, m …   Automatikos terminų žodynas

• Binary search algorithm — for performing binary searches on Java arrays and Lists, respectively. They must be arrays of primitives, or the arrays or Lists must be of a type that implements the Comparable interface, or you must specify a custom Comparator object. Microsoft …   Wikipedia

• B* search algorithm — In computer science, B* (pronounced B star ) is a best first, graph search algorithm that finds the least cost path from a given initial node to one goal node (out of one or more possible goals). First published by Hans Berliner in 1979, it is… …   Wikipedia

• D* search algorithm — In computer science and robotics, the D* algorithm (pronounced D star ) is a dynamic version of the backward variant of Dijkstra s algorithm and A* search algorithm. Also known as Stentz s Algorithm, it was first developed by Anthony Stentz in… …   Wikipedia

• Deducting Search Algorithm — The Deducting Search Algorithm (DSA), also known as the Holmes Engine after the author Sir Arthur Conan Doyle’s famous sleuth, is a mathematical formula that interprets the choices made by the seeker whilst looking for a particular piece of… …   Wikipedia

• Rabin-Karp string search algorithm — The Rabin Karp algorithm is a string searching algorithm created by Michael O. Rabin and Richard M. Karp in 1987 that uses hashing to find a substring in a text. It is used for multiple pattern matching rather than single pattern matching. For… …   Wikipedia

• Boyer–Moore string search algorithm — The Boyer–Moore string search algorithm is a particularly efficient string searching algorithm, and it has been the standard benchmark for the practical string search literature. [Hume and Sunday (1991) [Fast String Searching] SOFTWARE PRACTICE… …   Wikipedia

• algorithm — UK US /ˈælgərɪðəm/ noun [C] IT ► a set of mathematical instructions that must be followed in a fixed order, and that, especially if given to a computer, will help to calculate an answer to a mathematical problem: a computer/mathematical/search… …   Financial and business terms

• Search optimization — may refer to:*Search algorithm *Search engine *Search engine optimization …   Wikipedia