Graph Search Algorithms and Applications

10 min readMar 4, 2021

A closer look at BFS, DFS, Dijkstra’s Algorithm, Best FS, and A* Search

As we go about our daily lives, we often see graph search algorithms at work. How does a robot navigate around a maze? How does Google Maps determine the shortest route to your destination? How do companies like LinkedIn or Facebook compute the degree of connections in social graphs? Without graph search algorithms, all these things we take for granted would not exist.

But what are the driving ideas behind these programs? While often looked upon as complicated, the basic ideas behind graph search algorithms like DFS, BFS, Dijkstra’s algorithm, Best FS, and A* search are actually quite simple.

Before we start, we need to introduce some graph terminology.

A graph is either undirected (edges go both directions) or directed (edges go in the specified direction only). This is useful in maps when we are determining how to get from one destination to another with the cheapest airfare; in this case, we would use a directed graph where the nodes are cities and the edges are the air routes. These edges can be weighted (given a value), which represents the cost of traversing a particular edge; in our example, the weights are the costs of air fares between cities.

DFS vs. BFS

The two most elementary graph search algorithms are depth first search (DFS) and breadth first search (BFS). DFS traverses a graph by going through an entire branch before backtracking to the nearest node whose neighbors have not been fully explored. BFS traverses a graph by fully exploring one level at a time.

Depth First Search

Suppose three people want to explore all the tunnels in a cave. Person 1 only brings pebbles, marking which paths he has been to before. However, he will eventually get surrounded by pebbles and will not be able to trace the way out of the cave (dying of starvation, exhaustion, etc). Person 2 only brings a rope, tying one end to the entrance of the cave and holding the other end while exploring the cave. He will be able to explore some tunnels, but he will not be able to explore everything as he is unable to remember which paths he has already explored. However, he will be able to get back out by following the rope. Person 3 brings both the pebbles and the rope; he succeeds in fully exploring the cave, as he can mark where he has been before, and can also retrace his steps to the last intersection he has not fully explored, eventually getting back to the entrance of the cave using the rope.

The rope functions as the stack data structure, which follows last in, first out (LIFO) order where the most recently added items are retrieved first, as in a stack of plates. This keeps track of which paths have not been fully explored. The pebbles represent the visited list that contains the nodes that have been explored so that the program does not re-explore them.

In the picture below, blue represents nodes that are put into the stack for later exploration and green represents nodes that have been fully explored.

With this analogy, let us now put the idea in more concrete terms. Below is an implementation of DFS in Java. The program assumes that the input is an adjacency list mapping numbered nodes to their neighbors. In this implementation, you will need to set up the stack and the visited list. First, add the start node to the stack and the visited list. While the stack is not empty (meaning the graph has not been fully explored), remove the first node from the stack and put all its neighbors into the stack for later exploration. Explore the node that was just removed and put it in the visited list. Repeat with each node in the stack until it is empty. When it is done, you will get a list of the nodes you explored in topological order.

An application of this is a robot navigating a maze. The robot will go as far as possible until it hits the exit or a dead end. When it hits a dead end, it backtracks to the previous junction to continue its exploration of unvisited paths. Once it hits the exit, the task is done and the program presents a path from the entrance to the exit.

Breadth First Search

On the other hand, BFS uses the data structure queue, which follows first in, first out (FIFO) order. A common analogy is people in line at a cashier: the first person in line gets checked out first by the cashier and exits the store first.

You can think of BFS like contact tracing people. Let us say someone is notified he has a disease and needs to figure out who else he might have spread it to. The first set of people (first level) he might notify are individuals who he has seen in the past week. These people would then notify other individuals that they have seen in that time, and so on. In this scenario, instead of exploring a path as far as possible before backtracking, we explore it level by level.

In the picture below, blue represents nodes that are put into the queue for later exploration and green represents nodes that have been fully explored.

Below is an implementation of BFS in Java. The program assumes that the input is an adjacency list mapping numbered nodes to their neighbors. Like DFS, you also need to keep track of a list of visited nodes to make sure you do not redundantly visit nodes you have already explored. First, add the start node to the queue and the visited list. While the queue is not empty (meaning the graph has not been fully explored), remove the queue’s first node and put its neighbors into the queue for later exploration. Explore the node that was just removed and put it in the visited list. Repeat with each node in the queue until it is empty. When it is done, you will get a list of the nodes in the order they were explored (based on their distance from the start node in the graph).

A notable application of this is used in social media to compute the degree of connections of a member in a social graph. Here, friends of a person are categorized as first degree, friends of friends are categorized as second degree, and so on (see picture below). The computed degree of connections formed by BFS is then used for friend recommendations.

BFS can be modified to find the shortest path from the start node to any other node in the graph. However, its solution might not be the optimized path since it does not take into consideration the cost of each edge. An example of this is air travel where perhaps it is cheaper to take a connecting flight (represented by two edges) rather than a direct flight (represented by one edge). To solve that, we need to represent our graph as a weighted graph. That brings us to the next segment of our discussion.

Dijkstra’s Algorithm

Dijkstra’s algorithm works on a weighted graph (all weights must be positive). Each node keeps track of its shortest distance from the start node, which is initially set to infinity. Nodes are explored in ascending order based on their distances from the start node. For each node, we explore each of its neighbors and determine if the neighbor can improve its shortest path by traveling through the current node; if it can, we replace the value with this shorter path distance.

In the picture below, yellow is the current node being explored. Blue represents nodes that are the current node’s (yellow) neighbors that are being evaluated and green represents nodes that have been fully explored. Red edges extend out of the current node to neighbors that are being evaluated (blue).

Below is an implementation of Dijkstra’s algorithm. The program assumes that the input is an adjacency list mapping numbered nodes to their neighbors with their respective edge costs. In this implementation, everything except the start node is initialized to infinity (in Java, Integer.MAX_VALUE) for its distance; the start node has a distance of 0 (as it has a distance of 0 to itself!). You will need to initialize a priority queue with all the nodes. A priority queue is a queue that organizes its contents in ascending order based on their values. Then, while this priority queue is not empty, you remove the first node, which is the node with the shortest distance from the start that has not been fully explored. For each of its neighbors, determine if the current node’s distance plus its edge length to its neighbor is less than the neighbor’s stored path distance: if it is, update the value with this shorter path distance. When the priority queue is empty, the program returns a list of all the nodes and their distances from the specified start node.

Best First Search

In best first search, the node selected for the next exploration is based on some distance function, called g(n), from the start node. DFS, BFS, and Dijkstra’s algorithm are all special cases of best first search. The next node to explore is chosen with the maximum g(n) for DFS, and minimum g(n) for BFS and Dijkstra’s algorithm.

An application of best first search is web crawling. Web crawling is used to index webpages for search engines. In web crawling, every website is a node and all of the hyperlinks on a website are its neighbors. Upon exploring a webpage, each embedded link is assigned a relevance score (for example, by page rank) and then further webpages are explored in descending order of priority.

Below is an implementation of best first search. The main difference of best first search from BFS and DFS is that it uses a priority queue data structure in place of the stack in DFS and queue in BFS. The priority queue data structure returns the item with the best score among its stored items.

An implementation problem is that the search space is too large to be fully explored due to insufficient memory space. To solve this, we place a maximum limit on the number of nodes stored in the heap, maxHeapSize. Before adding a node to the heap, we need to ensure that the heap size is smaller than maxHeapSize. If the limit is exceeded, we need to remove some low g(n)-scored nodes from the other end of the priority queue to make room. The MinMaxPriorityQueue (from the Google guava library) is used here as we need to remove items from both ends of the priority queue.

A* Search

Best first search only optimizes the cost of traversing from the start node to the current node, i.e. g(n). In addition to g(n), A* also considers the cost from the current node to the destination, h(n), optimizing g(n) + h(n).

The code to calculate the current best path cost for both Dijkstra’s algorithm and A* is shown below.

For Dijkstra’s algorithm, the candidateScore variable only accounts for comparing the current best score to a new score from the path that passes through the current node being explored. It does not care if this new path strays away from the destination. For A*, candidateScore adds in the cost of the Manhattan distance of the current node to the end goal so that candidateScore also accounts for the estimated cost of the distance from the current node to the goal.

Both Dijkstra’s algorithm and A* will eventually find the same best path through the graph. However, because A* adds the heuristic h(n), it is able to find the best path faster.

Dijkstra’s Algorithm + A* Applications

The performance improvement of A* over Dijkstra’s algorithm is best illustrated by the navigational application of finding the faster route in a city like Google Maps. In the diagram below, both Dijkstra’s algorithm and A* correctly return the shortest path. Since Dijkstra’s algorithm does not consider the cost from the current node to the destination, there is a significantly larger number of nodes explored (marked by D), many of which go off course and are not helpful in finding the shortest path. However, A*, with its additional heuristic h(n), is able to explore more promising paths first, thereby resulting in fewer potential nodes explored (marked by *). As a result, A* is much more efficient than Dijkstra’s algorithm.

Summary

There are many more real world applications of these algorithms, and it is blatantly evident that graph search algorithms are everywhere. From navigating mazes, to friend recommendations, and even Google Maps, hopefully you now not only see the importance of these graph search algorithms and their applications, but also realize their potential in our technological future and the possibilities they have to further advance other industries.

If you would like to access any of the code mentioned in the article, check out my Github repository here:

https://github.com/valkwek/Graph-Search-Algorithms