A graph is yet another data structure that you can use to store information. Unlike trees, which have a strict hierarchical structure, graphs are more flexible.
Consequences of graphs being more flexible are that (unlike trees) they can can have loops and parts may be disconnected.
Here is a simple graph that stores letters:
This example graph is a directed graph. This just means that each edge in the graph is unidirectional, i.e., it goes from one vertex to another. For example, there is an edge from D to B, but there is in no edge representing the reverse relationship (from B to D).
Also, all the vertices aren't connected in this example graph. I.e., there are connections between A, B, D and E, but there is no way to get to vertex C from any of those vertices. Thus, A, B, D and E form their own component. A second component is made up of C and F.
For example, the above graph could be viewed as a map of what cities are connected by train routes. Viewing it this way, each vertex represents a particular city and each edge represents whether there is a train route from one city to another. We can imagine that the edges are unidirectional since trains are only allowed to go in one direction on the tracks (a new government rule so trains don't crash into each other any more).
Since the graph has different components, there are some cities that are not connected by train routes. For example, there is no way to get from city F to city D. However, just because a city is in a certain component, doesn't mean we can get to it from another city in the component. For example, we cannot get to city D starting from city B.
Finally, the edges in our graph represent a very simple relationship, i.e., one city is connected to another. However, just as we can store information at vertices (e.g., the city name), we can also store information at each edge. For example, we might want to store the distance between cities at edges OR the time the trip takes OR the cost of the ticket OR all of those pieces of information.
To keep our example simple, however, we'll just store information at the vertices, and not at edges.
As mentioned, graphs are pretty generic data structures in that they can be used to represent lots of things. Thus, exactly what operations we'll want for a graph will depend on what we want to do with it.
Suppose we want the following operations:
AddVertex:
Adds a new vertex to the graph.
For example, suppose there is a new city we want to add to our map of
train routes. AddVertex(graph, G)
AddEdge:
Adds a new directed edge to the graph.
For example, adding the city was not enough, we also need to say
how the rail lines connect it to other cities. Thus, we might do
an AddEdge(graph, C, G)
(an edge from C to G).
IsReachable:
Reports whether we can get there from here.
For example, we might want to know whether we can get to city E from city A:
would report a true value. IsReachable(graph, E, A)
Again, we might want to know whether we can get to city D from city E:
would report a false value. IsReachable(graph, D, E)
There are many other operations we might want, but these will suffice for our example.
Since we are providing a new data structure with several operations, we'll want to organize it into a module.
As usual, we'll use ADTs/CDTs to hide the implementation details of our
data structure. Here is how the 2 files that make up the module, i.e.,
the interface (graph.h
) and the implementation file
(graph.c
) will look:
graph.h graph.c ------- ------- #include "graph.h" type-of-element other-types-needed abstract-type-of-graph concrete-type-of-graph function function prototypes definitions
graph.h
in
graph.c
since we always include the header for a module
in the implementation part of the module.
Again, the interface for the graph will need to have a abstract type for the graph (for people to define graph variables) and the type of an element (for functional prototypes)...
The implementation is hidden from the user and will hold the types we need to implement the internals of the graph. What other types are needed will depend on our particular implementation choices.
Now, filling in the types in the interface is easy.
We need the type of an element. To keep it simple, we can just store a letter for each city. (You can imagine storing full city names and lots of other things about each city.)
So, the type-of-an-element is just:
typedef char graphElementT;
The abstract-type-of-graph is always the same, a pointer to the CDT:
The other types we need, those for the internal representation of the graph, will depend on what we actually want to do with the graph...typedef struct graphCDT *graphADT;
Instead of making a haphazard decision about what implementation to use for the graph, we should give it some thought.
The types of graphs we need (directed, with possibly more than
one disconnected component) and what we want to do with them
(the operations AddVertex
, AddEdge
and
IsReachable
) will greatly affect how we should implement
the graph. In other words, if we make poor choices, it may be
difficult or impossible to represent some graphs and it may be
difficult or impossible to implement some operations.
The 2 things that our graph implementation will have to represent are vertices and edges.
------------------------------- | A | B | C | D | E | F | ... | ------------------------------- 0 1 2 3 4 5 ...
Likewise, the edges can be stored using something called an adjacency matrix. Essentially, it represents which vertices are adjacent, or rather, which pairs of vertices have an edge. For example, the adjacency matrix for our original graph would be:
Adjacency matrix: A B C D E F +----------------- A| 0 1 0 1 0 0 B| 0 0 0 0 0 0 C| 0 0 0 0 0 1 D| 0 1 0 0 1 0 E| 0 1 0 0 0 0 F| 0 0 1 0 0 0
Each position represents whether one vertex is connected to another
(value 1
= true) or not (value 0
=
false). Note that it encodes the direction of the edges. For
example, since there is a 1
in row A, column
B, there is an edge from A to B. Since there is
no edge in the reverse direction (from B to A), row
B, column A has a 0
.
Of course, this adjacency matrix could be represented by a 2-dimensional array.
The drawback to this approach lies in that we want to add vertices. Adding vertices would require either making the 2 arrays (vertex and adjacency array) some large maximum size OR reallocating new arrays and copying the contents from the old to the new.
As we've seen before, data structures that need to grow like this are sometimes better implemented with linked representations.
First, we need to store the element (the information at each vertex), so it's easy to put that in some kind of node...
a vertex node's data ------- | A | -------
We also need to represent the vertices that A is connected to (i.e., the edges from A). Since we don't know how many edges that will be, we can use a linked list to store a list of edges, as in:
a vertex node ------- | A | |-----| --------- --------- | --+---> | B | --+---> | D | 0 | ------- --------- --------- \ / edge nodes
2 points should be made:
------- | A | |-----| --------- --------- | --+---> | | | --+---> | D | 0 | ------- --+------ --------- | +--------+ | v ------- | B | |-----| <-- vertex node for B | 0 | -------
That's what we'll do, however, since that will end up making our drawings very complicated, we'll stick to drawing:
------- | A | |-----| --------- --------- | --+---> | B | --+---> | D | 0 | ------- --------- ---------
So, now for each vertex we have something like:
(Remember that each edge node's data is a pointer to a vertex node.)------- | A | |-----| --------- --------- | --+---> | B | --+---> | D | 0 | ------- --------- --------- ------- | B | |-----| | 0 | ------- ... ------- | F | |-----| --------- | --+---> | C | 0 | ------- ---------
Now, even though we can get from one particular vertex to its adjacent vertices, we still need some way to get to one vertex to begin with. Thus, we need a pointer to some initial vertex, as in:
| v ------- | A | |-----| --------- --------- | --+---> | B | --+---> | D | 0 | ------- --------- ---------
Are we done? No, unfortunately! Since the graph can have more than one component (i.e., disconnected part) AND because we are not guaranteed to be able to get to all other vertices from an arbitrary vertex in a component (e.g., from B we can't get to A), we need some other way to be able to access all vertices.
One solution is to join them into a linked list. Here's our current version of the representation:
| v ------- | A | |-----| --------- --------- | --+---> | B | --+---> | D | 0 | |-----| --------- --------- | | | ---+--- | v ------- | B | |-----| | 0 | |-----| | | | ---+--- | v ------- | C | |-----| --------- | --+---> | F | 0 | |-----| --------- | | | ---+--- | v ------- | D | |-----| --------- --------- | --+---> | E | --+---> | B | 0 | |-----| --------- --------- | | | ---+--- | v ------- | E | |-----| --------- | --+---> | B | 0 | |-----| --------- | | | ---+--- | v ------- | F | |-----| --------- | --+---> | C | 0 | |-----| --------- | 0 | -------
Another operation we want is to determine if one vertex is reachable from another. That will entail going from vertex to vertex (along edges), seeing if we eventually reach the desired vertex.
However, since our graphs can have loops, we need to make sure we don't enter an infinite cycle. Also, since there may be more than one way to get to a single vertex, we want to make sure we don't waste effort by exploring a vertex more than once. To solve both problems, we just need to make sure that we don't go to a vertex we've been to before. The easiest way to keep track of this is to add a field to each vertex that keeps track of whether it has been visited or not, as in:
------- | A | |-----| |vis? | |-----| | --+---> ... |-----| | | | ---+--- | v ...
Now we are done with the representation.
With our choice of graph implementation made, we are ready to define
the types needed in the implementation file (graph.c
).
For each vertex, we need to store an element, its visited flag, its list of edges, and a link to the next vertex. Thus, we use the following type to store a vertex:
typedef struct vertexTag { graphElementT element; int visited; struct edgeTag *edges; struct vertexTag *next; } vertexT;
Note how we use the type "struct edgeTag"
to refer to
the edge nodes, which haven't been defined just yet.
For each edge, we need a link to the vertex it connects to and a link to the next edge. Thus, we use the following type to store an edge:
typedef struct edgeTag { struct vertexTag *connectsTo; struct edgeTag *next; } edgeT;
"vertexT"
for a vertex since that
typedef
is defined above us.
Finally, we need to define what information is needed to keep track of the whole graph:
which is just a pointer to the initial vertex.typedef struct graphCDT { vertexT *vertices; } graphCDT;
graph.h graph.c ------- ------ #include "graph.h" typedef char graphElementT; typedef struct vertexTag { graphElementT element; int visited; struct edgeTag *edges; struct vertexTag *next; } vertexT; typedef struct edgeTag { struct vertexTag *connectsTo; struct edgeTag *next; } edgeT; typedef struct graphCDT typedef struct graphCDT { *graphADT; vertexT *vertices; } graphCDT;
Now that we've finished the types, let's implement one of the graph operations. The graph functions we'll need are:
We'll just implement GraphIsReachable()
.
Let's suppose we want to determine if one vertex is reachable from another...
Here are the steps for such an algorithm:
1. Set all the visited flags to false. 2. Go to the start vertex. 3. If vertex you are at has been visited, don't pursue this route again. Go back to where you came from, returning false. 4. If it is the destination we are looking for, you've found it, so return true. 5. Set its visited flag to true. 6. Go through each of the edges. Go to the vertex an edge connects to (new starting location). Go to step 3. 7. When exhausted all edges and not found it, it is not reachable from here, go back to where you came from, returning false.
Note that when we return false, it doesn't necessarily mean the destination is not reachable from the source, just from the subgraph we are working on.
Based on the algorithm, to see if E is reachable from A,
we would perform the following steps:
Step 1. Set all visited flags to false. Step 2. Go to A. Step 3. We haven't visited A yet. Step 4. A is not our destination. Step 5. Mark A as visited. Step 6. Go through A's edges. Go to B (i.e., suppose the edge to B is listed first) and jump to Step 3. Step 3. We haven't visited B yet. Step 4. B is not our destination. Step 5. Mark B as visited. Step 6. Go through B's edges. Step 7. It has no edges, so go back to A. Go to D and jump to Step 3. Step 3. We haven't visited D yet. Step 4. D is not our destination. Step 5. Mark D as visited. Step 6. Go through D's edges. Go to E (i.e., suppose the edge to E is listed first) and jump to Step 3. Step 3. We haven't visited E yet. Step 4. E is our destination. IT IS REACHABLE!
Answer: Yes! We'd do the children first, which makes it depth-first!
Note that Step 1 and 2 are done once, but we may repeat Steps 3 through 7 over and over. Also, we need to go back to where we came from if we've already visited a vertex or we exhaust its edges before getting to the destination.
Since this part of the algorithm is repetitive and involves backtracking, what technique can we use to write it?
Answer: Recursion!
The part of IsReachable we will perform recursively will be
done by a helper function, RecIsReachable()
. This
function will be initially called with the start vertex and
will recurse on other vertices. The only type that
will allow us to refer to both the start and any other vertex is
vertexT *
an ADT | ------------ +---> | vertices | a CDT | | | -----+------ | v ------- | A | |-----| --------- --------- | --+---> | | | --+---> | | | 0 | |-----| --+------ --+------ | | | | | ---+--- v v | (to vertex B) (to vertex D) v
In other words, ADT/CDT is not a candidate since it only refers to one
vertex, and not necessarily the one we want to start from. Thus,
RecIsReachable()
will have to take a vertexT
*
Now, before we discuss the implementation of the helper function, let's
consider the function GraphIsReachable()
, which users of a
graph will call. This function will need to take a graph, the
destination, and the source. Since the user does not know how the
graph is represented, it refers to the source and destination by the
elements located at the source and destination vertices,
giving a prototype of:
int GraphIsReachable(graphADT graph, graphElementT dest, graphElementT source);
The int
return value will contain a true or
false value depending on whether the destination is reachable
from the source.
Marking all the vertices as not visited requires traversing
all the vertices. Also, finding the vertex with the source
value may require searching all the vertices. So, it's easy
for GraphIsReachable()
to do those things at the same time
(and before it calls the recursive helper function to do the searching).
Here is an implementation of GraphIsReachable()
:
int GraphIsReachable(graphADT graph, graphElementT dest, graphElementT source) { vertexT *vertP; vertexT *startP = NULL; /* * Go through each vertex, mark them * as "not visited" and also record * where the start vertex is. */ for (vertP = graph->vertices; vertP != NULL; vertP = vertP->next) { vertP->visited = 0; if (vertP->element == source) startP = vertP; } /* Make sure the starting point exists. */ if (startP == NULL) return 0; /* Now see if we can get there from here. */ return RecIsReachable(dest, startP); }
Notice that this even works when the desired source isn't in the graph, including the case when the graph is empty.
Now, the work of the repetitive steps can be done recursively, e.g.:
static int RecIsReachable(graphElementT dest, vertexT *startP) { edgeT *edgeP; /* Have we been here already? */ if (startP->visited) return 0; /* * Is this the destination? If so, * we've reached it! */ if (startP->element == dest) return 1; /* Don't come here again. */ startP->visited = 1; /* * See if we can get there from each * of the vertices we connect to. * If we can get there from at least * one of them, it is reachable. */ for (edgeP = startP->edges; edgeP != NULL; edgeP = edgeP->next) { if (RecIsReachable(dest, edgeP->connectsTo)) return 1; } /* * Couldn't get there from any of our * neighbors, so it is unreachable from * here. */ return 0; }
Note that we could have passed the destination in via a vertex pointer
too, if like the start vertex, we had looked for it when looping
through each vertex in GraphIsReachable()
.