# Graph - Linked Implementation

1. Abstract idea of a graph:

A graph is yet another data structure that you can use to store information. Unlike trees, which have a strict hierarchical structure, graphs are more flexible.

Consequences of graphs being more flexible are that (unlike trees) they can can have loops and parts may be disconnected.

Here is a simple graph that stores letters:

#### Graph Vocabulary

The letters are held in what are called the vertices of the graph. Vertices can be connected to other vertices. A connection between 2 vertices is called an edge.

This example graph is a directed graph. This just means that each edge in the graph is unidirectional, i.e., it goes from one vertex to another. For example, there is an edge from D to B, but there is in no edge representing the reverse relationship (from B to D).

Also, all the vertices aren't connected in this example graph. I.e., there are connections between A, B, D and E, but there is no way to get to vertex C from any of those vertices. Thus, A, B, D and E form their own component. A second component is made up of C and F.

#### Uses

Why would we want to store information in a graph? Well, some types of information are naturally represented in a graph.

For example, the above graph could be viewed as a map of what cities are connected by train routes. Viewing it this way, each vertex represents a particular city and each edge represents whether there is a train route from one city to another. We can imagine that the edges are unidirectional since trains are only allowed to go in one direction on the tracks (a new government rule so trains don't crash into each other any more).

Since the graph has different components, there are some cities that are not connected by train routes. For example, there is no way to get from city F to city D. However, just because a city is in a certain component, doesn't mean we can get to it from another city in the component. For example, we cannot get to city D starting from city B.

Finally, the edges in our graph represent a very simple relationship, i.e., one city is connected to another. However, just as we can store information at vertices (e.g., the city name), we can also store information at each edge. For example, we might want to store the distance between cities at edges OR the time the trip takes OR the cost of the ticket OR all of those pieces of information.

To keep our example simple, however, we'll just store information at the vertices, and not at edges.

2. Graph operations:

As mentioned, graphs are pretty generic data structures in that they can be used to represent lots of things. Thus, exactly what operations we'll want for a graph will depend on what we want to do with it.

Suppose we want the following operations:

• `AddVertex:`

Adds a new vertex to the graph.

For example, suppose there is a new city we want to add to our map of train routes. `AddVertex(graph, G)` would give:

• `AddEdge:`

Adds a new directed edge to the graph.

For example, adding the city was not enough, we also need to say how the rail lines connect it to other cities. Thus, we might do an `AddEdge(graph, C, G)`, giving:

(an edge from C to G).

• `IsReachable:`

Reports whether we can get there from here.

For example, we might want to know whether we can get to city E from city A:

`IsReachable(graph, E, A)` would report a true value.

Again, we might want to know whether we can get to city D from city E:

`IsReachable(graph, D, E)` would report a false value.

There are many other operations we might want, but these will suffice for our example.

3. Graph representation in C:

Since we are providing a new data structure with several operations, we'll want to organize it into a module.

As usual, we'll use ADTs/CDTs to hide the implementation details of our data structure. Here is how the 2 files that make up the module, i.e., the interface (`graph.h`) and the implementation file (`graph.c`) will look:

```graph.h				graph.c
-------				-------
#include "graph.h"

type-of-element			other-types-needed

abstract-type-of-graph		concrete-type-of-graph

function			function
prototypes			definitions
```

Note: We'll get the types from `graph.h` in `graph.c` since we always include the header for a module in the implementation part of the module.

Again, the interface for the graph will need to have a abstract type for the graph (for people to define graph variables) and the type of an element (for functional prototypes)...

The implementation is hidden from the user and will hold the types we need to implement the internals of the graph. What other types are needed will depend on our particular implementation choices.

Now, filling in the types in the interface is easy.

We need the type of an element. To keep it simple, we can just store a letter for each city. (You can imagine storing full city names and lots of other things about each city.)

So, the type-of-an-element is just:

```typedef char graphElementT;
```

The abstract-type-of-graph is always the same, a pointer to the CDT:

```typedef struct graphCDT *graphADT;
```
The other types we need, those for the internal representation of the graph, will depend on what we actually want to do with the graph...

4. Implementation of a graph:

Instead of making a haphazard decision about what implementation to use for the graph, we should give it some thought.

The types of graphs we need (directed, with possibly more than one disconnected component) and what we want to do with them (the operations `AddVertex`, `AddEdge` and `IsReachable`) will greatly affect how we should implement the graph. In other words, if we make poor choices, it may be difficult or impossible to represent some graphs and it may be difficult or impossible to implement some operations.

The 2 things that our graph implementation will have to represent are vertices and edges.

#### Array Implementation

The first choice we might consider is an array implementation. It's easy to store each vertex in an array:
```-------------------------------
| A | B | C | D | E | F | ... |
-------------------------------
0   1   2   3   4   5   ...
```

Likewise, the edges can be stored using something called an adjacency matrix. Essentially, it represents which vertices are adjacent, or rather, which pairs of vertices have an edge. For example, the adjacency matrix for our original graph would be:

```Adjacency matrix:

A  B  C  D  E  F
+-----------------
A| 0  1  0  1  0  0
B| 0  0  0  0  0  0
C| 0  0  0  0  0  1
D| 0  1  0  0  1  0
E| 0  1  0  0  0  0
F| 0  0  1  0  0  0
```

Each position represents whether one vertex is connected to another (value `1` = true) or not (value `0` = false). Note that it encodes the direction of the edges. For example, since there is a `1` in row A, column B, there is an edge from A to B. Since there is no edge in the reverse direction (from B to A), row B, column A has a `0`.

Of course, this adjacency matrix could be represented by a 2-dimensional array.

The drawback to this approach lies in that we want to add vertices. Adding vertices would require either making the 2 arrays (vertex and adjacency array) some large maximum size OR reallocating new arrays and copying the contents from the old to the new.

Aside: As another drawback, graphs with few edges would have a lot of zeroes in the adjacency matrix, thus wasting space.

As we've seen before, data structures that need to grow like this are sometimes better implemented with linked representations.

Another way to implement a graph is to used a linked-list-like representation.

First, we need to store the element (the information at each vertex), so it's easy to put that in some kind of node...

```a vertex node's data
-------
|  A  |
-------
```

We also need to represent the vertices that A is connected to (i.e., the edges from A). Since we don't know how many edges that will be, we can use a linked list to store a list of edges, as in:

```a vertex node
-------
|  A  |
|-----|     ---------     ---------
|   --+---> | B | --+---> | D | 0 |
-------     ---------     ---------
\       /
edge nodes
```

2 points should be made:

• For vertices, we need to store the element at that vertex. In contrast, for an edge, we just need to store a reference to the vertex it is connected to (we don't need all the information about that vertex). Thus, vertex nodes and edge nodes are different things.

• In the above edge that goes to B for example, since we don't want to replicate all the information stored about B in this edge node, that reference to B should really be a pointer to the vertex node with B, as in:
```-------
|  A  |
|-----|     ---------     ---------
|   --+---> | | | --+---> | D | 0 |
-------     --+------     ---------
|
+--------+
|
v
-------
|  B  |
|-----| <-- vertex node for B
|  0  |
-------
```

That's what we'll do, however, since that will end up making our drawings very complicated, we'll stick to drawing:

```-------
|  A  |
|-----|     ---------     ---------
|   --+---> | B | --+---> | D | 0 |
-------     ---------     ---------
```

So, now for each vertex we have something like:

```-------
|  A  |
|-----|     ---------     ---------
|   --+---> | B | --+---> | D | 0 |
-------     ---------     ---------

-------
|  B  |
|-----|
|  0  |
-------

...

-------
|  F  |
|-----|     ---------
|   --+---> | C | 0 |
-------     ---------
```
(Remember that each edge node's data is a pointer to a vertex node.)

Now, even though we can get from one particular vertex to its adjacent vertices, we still need some way to get to one vertex to begin with. Thus, we need a pointer to some initial vertex, as in:

```   |
v
-------
|  A  |
|-----|     ---------     ---------
|   --+---> | B | --+---> | D | 0 |
-------     ---------     ---------
```

Are we done? No, unfortunately! Since the graph can have more than one component (i.e., disconnected part) AND because we are not guaranteed to be able to get to all other vertices from an arbitrary vertex in a component (e.g., from B we can't get to A), we need some other way to be able to access all vertices.

One solution is to join them into a linked list. Here's our current version of the representation:

```   |
v
-------
|  A  |
|-----|     ---------     ---------
|   --+---> | B | --+---> | D | 0 |
|-----|     ---------     ---------
|  |  |
---+---
|
v
-------
|  B  |
|-----|
|  0  |
|-----|
|  |  |
---+---
|
v
-------
|  C  |
|-----|     ---------
|   --+---> | F | 0 |
|-----|     ---------
|  |  |
---+---
|
v
-------
|  D  |
|-----|     ---------     ---------
|   --+---> | E | --+---> | B | 0 |
|-----|     ---------     ---------
|  |  |
---+---
|
v
-------
|  E  |
|-----|     ---------
|   --+---> | B | 0 |
|-----|     ---------
|  |  |
---+---
|
v
-------
|  F  |
|-----|     ---------
|   --+---> | C | 0 |
|-----|     ---------
|  0  |
-------
```

#### Effects of Operations on Implementation

We have already seen how the operations we want to perform affect our implementation. For example, because we want to be able to add vertices easily, we rejected the array implementation.

Another operation we want is to determine if one vertex is reachable from another. That will entail going from vertex to vertex (along edges), seeing if we eventually reach the desired vertex.

However, since our graphs can have loops, we need to make sure we don't enter an infinite cycle. Also, since there may be more than one way to get to a single vertex, we want to make sure we don't waste effort by exploring a vertex more than once. To solve both problems, we just need to make sure that we don't go to a vertex we've been to before. The easiest way to keep track of this is to add a field to each vertex that keeps track of whether it has been visited or not, as in:

```-------
|  A  |
|-----|
|vis? |
|-----|
|   --+---> ...
|-----|
|  |  |
---+---
|
v
...
```

Now we are done with the representation.

5. Data types for implementation:

With our choice of graph implementation made, we are ready to define the types needed in the implementation file (`graph.c`).

For each vertex, we need to store an element, its visited flag, its list of edges, and a link to the next vertex. Thus, we use the following type to store a vertex:

```typedef struct vertexTag {
graphElementT element;
int visited;
struct edgeTag *edges;
struct vertexTag *next;
} vertexT;
```

Note how we use the type `"struct edgeTag"` to refer to the edge nodes, which haven't been defined just yet.

For each edge, we need a link to the vertex it connects to and a link to the next edge. Thus, we use the following type to store an edge:

```typedef struct edgeTag {
struct vertexTag *connectsTo;
struct edgeTag *next;
} edgeT;
```

Aside: We could have just used `"vertexT"` for a vertex since that `typedef` is defined above us.

Finally, we need to define what information is needed to keep track of the whole graph:

```typedef struct graphCDT {
vertexT *vertices;
} graphCDT;
```
which is just a pointer to the initial vertex.

So, the set of types we need for the graph module is organized as follows:
```graph.h				graph.c
-------				------
#include "graph.h"

typedef char graphElementT;	typedef struct vertexTag {
graphElementT element;
int visited;
struct edgeTag *edges;
struct vertexTag *next;
} vertexT;

typedef struct edgeTag {
struct vertexTag *connectsTo;
struct edgeTag *next;
} edgeT;

typedef struct graphCDT		typedef struct graphCDT {
} graphCDT;
```

6. Graph functions:

Now that we've finished the types, let's implement one of the graph operations. The graph functions we'll need are:

For general graph operations:
• GraphIsReachable()

Because we are programming in C (setup/cleanup):
• GraphCreate()
• GraphDestroy()

We'll just implement `GraphIsReachable()`.

7. IsReachable algorithm:

Let's suppose we want to determine if one vertex is reachable from another...

Here are the steps for such an algorithm:

```1. Set all the visited flags to false.
2. Go to the start vertex.
3. If vertex you are at has been visited,
don't pursue this route again.  Go back
to where you came from, returning false.
4. If it is the destination we are looking for,
you've found it, so return true.
5. Set its visited flag to true.
6. Go through each of the edges.

Go to the vertex an edge connects to
(new starting location).
Go to step 3.

7. When exhausted all edges and not found it,
it is not reachable from here, go back to
where you came from, returning false.
```

Note that when we return false, it doesn't necessarily mean the destination is not reachable from the source, just from the subgraph we are working on.

Based on the algorithm, to see if E is reachable from A,

we would perform the following steps:
```Step 1. Set all visited flags to false.
Step 2. Go to A.
Step 3. We haven't visited A yet.
Step 4. A is not our destination.
Step 5. Mark A as visited.
Step 6. Go through A's edges.
Go to B (i.e., suppose the edge to B
is listed first) and jump to Step 3.
Step 3. We haven't visited B yet.
Step 4. B is not our destination.
Step 5. Mark B as visited.
Step 6. Go through B's edges.
Step 7. It has no edges,
so go back to A.
Go to D and jump to Step 3.
Step 3. We haven't visited D yet.
Step 4. D is not our destination.
Step 5. Mark D as visited.
Step 6. Go through D's edges.
Go to E (i.e., suppose the edge to E
is listed first) and jump to Step 3.
Step 3. We haven't visited E yet.
Step 4. E is our destination.
IT IS REACHABLE!
```

Aside: Is this traversal of the graph depth-first or breadth-first? Well, suppose B had edges... Would we do the vertices B is connected to (its children) before doing D (B's sibling under A)?

Answer: Yes! We'd do the children first, which makes it depth-first!

Note that Step 1 and 2 are done once, but we may repeat Steps 3 through 7 over and over. Also, we need to go back to where we came from if we've already visited a vertex or we exhaust its edges before getting to the destination.

Since this part of the algorithm is repetitive and involves backtracking, what technique can we use to write it?

8. IsReachable function:

The part of IsReachable we will perform recursively will be done by a helper function, `RecIsReachable()`. This function will be initially called with the start vertex and will recurse on other vertices. The only type that will allow us to refer to both the start and any other vertex is `vertexT *`.

```an ADT
|     ------------
+---> | vertices | a CDT
|    |     |
-----+------
|
v
-------
|  A  |
|-----|     ---------     ---------
|   --+---> | | | --+---> | | | 0 |
|-----|     --+------     --+------
|  |  |       |             |
---+---       v             v
|       (to vertex B)  (to vertex D)
v
```

In other words, ADT/CDT is not a candidate since it only refers to one vertex, and not necessarily the one we want to start from. Thus, `RecIsReachable()` will have to take a ```vertexT *```.

Now, before we discuss the implementation of the helper function, let's consider the function `GraphIsReachable()`, which users of a graph will call. This function will need to take a graph, the destination, and the source. Since the user does not know how the graph is represented, it refers to the source and destination by the elements located at the source and destination vertices, giving a prototype of:

```int GraphIsReachable(graphADT graph,
graphElementT dest,
graphElementT source);
```

The `int` return value will contain a true or false value depending on whether the destination is reachable from the source.

Marking all the vertices as not visited requires traversing all the vertices. Also, finding the vertex with the source value may require searching all the vertices. So, it's easy for `GraphIsReachable()` to do those things at the same time (and before it calls the recursive helper function to do the searching).

Here is an implementation of `GraphIsReachable()`:

```int GraphIsReachable(graphADT graph,
graphElementT dest,
graphElementT source)
{
vertexT *vertP;
vertexT *startP = NULL;

/*
* Go through each vertex, mark them
* as "not visited" and also record
* where the start vertex is.
*/

for (vertP = graph->vertices;
vertP != NULL;
vertP = vertP->next) {
vertP->visited = 0;
if (vertP->element == source)
startP = vertP;
}

/* Make sure the starting point exists. */

if (startP == NULL)
return 0;

/* Now see if we can get there from here. */

return RecIsReachable(dest, startP);
}
```

Notice that this even works when the desired source isn't in the graph, including the case when the graph is empty.

Now, the work of the repetitive steps can be done recursively, e.g.:

```static int RecIsReachable(graphElementT dest,
vertexT *startP)
{
edgeT *edgeP;

/* Have we been here already? */

if (startP->visited)
return 0;

/*
* Is this the destination?  If so,
* we've reached it!
*/

if (startP->element == dest)
return 1;

/* Don't come here again. */

startP->visited = 1;

/*
* See if we can get there from each
* of the vertices we connect to.
* If we can get there from at least
* one of them, it is reachable.
*/

for (edgeP = startP->edges;
edgeP != NULL;
edgeP = edgeP->next) {
if (RecIsReachable(dest, edgeP->connectsTo))
return 1;
}

/*
* Couldn't get there from any of our
* neighbors, so it is unreachable from
* here.
*/

return 0;
}
```

Note that we could have passed the destination in via a vertex pointer too, if like the start vertex, we had looked for it when looping through each vertex in `GraphIsReachable()`.

BU CAS CS - Graph - Linked Implementation