Lecture 10                                                    10/13/1994

Mapping UNITY to Particular Architectures
-----------------------------------------

Refining a UNITY program from its highest-level description to a level
that resembles an underlying architecture involves applying a mapping.
Such a mapping can be defined for any particular architecture. In this
lecture, we go through this process informally using an example.

The All-Points Shortest-Path Problem
------------------------------------

Given a labeled directed graph G, find the shortest "distance" between
every pair of vertices. The following UNITY program does the trick!

   Program P1
   initially
      < || i,j :: d[i][j] = W[i][j] >
   assign
      < [] i,j,k :: d[i][j] = min(d[i][j], d[i][k] + d[k][j]) >
   end

Mapping to a sequential architecture
------------------------------------

In this architecture there is one processor and one memory. Only one
(non-quantified) statement can be executed at any one step. Thus, mapping
a UNITY program to a sequential computer requires finding an appropriate
"fair" control flow. We discuss two possibilities:

- Pick an i, j, k at random!

- (Floyd-Warshall) Loop through i, j, and k in a fixed order, with the
  intermediate vertex (x in P2) varying slowest, as follows: ==> O(N^3)

   Program P2
   initially
      x,u,v = 0,0,0
   [] i,j = 0,0
   [] d[i][j] = W[i][j] || i,j = i + (j+1) / N , (j+1) % N   if i < N
   assign
      d[u][v] = min(d[u][v], d[u][x] + d[x][v])
   || (x,u,v) = (x,u,v) + 1   if (x,u,v) != (N-1,N-1,N-1)
   end

  Here (x,u,v) + 1 is read as incrementing the triple as a three-digit
  base-N counter, with v the least significant digit.

Mapping to synchronous architecture
-----------------------------------

This architecture consists of a fixed number of processor/memory pairs.
Associated with each memory is the (sub)set of processors that can read
from or write to it. We will focus on the SIMD architecture with a
"read-only" schema. In other words, data can be read from anywhere but
written only by the local processor. All the processors operate using the
same clock.

To map the shortest path to this architecture (assuming N processors)
we have to:

(1) Allocate each statement in the program to a processor.
    For a SIMD architecture, the same statement is executed on all
    processors in lock step.
(2) Allocate each variable in the program to a memory.
    With N processors we may allocate d[i][0 .. N-1] to processor i.
(3) Specify a "fair" ordering of statement execution.

- Pick i, k at random... ==> O(N^2)

   Program P3
   initially
      <[] i :: <|| j :: d[i][j] = W[i][j] > >
   assign
      <[] i,k :: <|| j :: d[i][j] = min(d[i][j], d[i][k] + d[k][j]) > >
   end

- Sequence through i and k... ==> O(N^2)

   Program P4
   initially
      i,k = 0,0
   [] l = 0
   [] <|| j :: d[l][j] = W[l][j] > || l = l+1   if l < N-1
   assign
      <|| j :: d[i][j] = min(d[i][j], d[i][k] + d[k][j]) >
   || (k,i) = (k,i) + 1   if (k,i) != (N-1,N-1)
   end

To map the shortest path to this architecture (assuming N^2 processors)
we have to:

(1) As before.
(2) As before. With N^2 processors we allocate d[i][j] to processor ij.
(3) As before.

- Pick k at random... ==> O(N)

   Program P5
   initially
      <|| i,j :: d[i][j] = W[i][j] >
   assign
      <[] k :: <|| i,j :: d[i][j] = min(d[i][j], d[i][k] + d[k][j]) > >
   end

To map the shortest path to this architecture (assuming N^3 processors)
we have to:

(1) As before.
(2) As before. With N^3 processors we allocate d[i][j] to processor ij0.
(3) As before.

- Do a reduction over k using the extra factor of N processors!
  ==> O(log^2 N)
  (Each step is a min-reduction over k taking O(log N) time, and
  O(log N) such steps suffice.)

   Program P7
   initially
      <|| i,j :: d[i][j] = W[i][j] >
   assign
      <|| i,j :: d[i][j] = < min k :: d[i][k] + d[k][j] > >
   end

Another Example
---------------

Given a graph G=(V,E), compute a boolean vector r[1..|V|] such that
r[i] = 1 if vertex i is reachable from vertex 1 (the initial vertex).

   Program
   initially
      < || v : v is a Vertex :: r[v] = 0   if v != 1  ~
                                r[v] = 1   if v == 1 >
   assign
      < [] u, v : (u,v) is an edge :: r[v] = r[u] || r[v] >
   end

The above program is O(|V|).
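
A minimal Python sketch of the reachability program, assuming the graph
is given as a list of (u, v) edge pairs over vertices 1..n. The function
name `reachable` and this input format are illustrative and not part of
the lecture. UNITY's fair nondeterministic choice is replaced here by
round-robin sweeps over the edges until no r[v] changes, which is only
one of many fair schedules.

```python
def reachable(n, edges):
    """Return a list r where r[v] is True iff v is reachable from vertex 1.

    Vertices are numbered 1..n; index 0 of the result is unused.
    """
    r = [False] * (n + 1)
    r[1] = True                      # initially: only vertex 1 is marked

    changed = True
    while changed:                   # repeat sweeps until a fixed point
        changed = False
        for u, v in edges:           # one statement per edge, in turn
            new = r[u] or r[v]       # r[v] = r[u] || r[v]
            if new != r[v]:
                r[v] = new
                changed = True
    return r


if __name__ == "__main__":
    # Vertex 4 has an edge into 3 but is itself unreachable from 1.
    print(reachable(4, [(1, 2), (2, 3), (4, 3)])[1:])
    # [True, True, True, False]
```

The loop stops once a full sweep changes nothing, which corresponds to
the fixed point of the UNITY program.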
The reachability program can be modified to compute the minimum number
of edges that must be traversed to reach a vertex i from vertex 1, as
follows:

   Program
   initially
      < || v : v is a Vertex :: r[v],d[v] = 0,infinity   if v != 1  ~
                                r[v],d[v] = 1,0          if v == 1 >
   assign
      < [] u, v : (u,v) is an edge :: r[v] = (r[u] || r[v])
                                   || d[v] = min(d[v], d[u]+1) >
   end

In the assign section, the parenthesized || is boolean "or"; the other
|| is parallel composition.
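
The modified program can be sketched in Python in the same way, again
assuming an edge list over vertices 1..n. The name `hop_distances` is
hypothetical, and `math.inf` stands in for "infinity". Both right-hand
sides are evaluated before either variable is updated, to mimic the
parallel composition of the two assignments.

```python
import math


def hop_distances(n, edges):
    """Return (r, d): reachability flags and minimum edge counts from vertex 1.

    Vertices are numbered 1..n; index 0 of each list is unused.
    """
    r = [False] * (n + 1)
    d = [math.inf] * (n + 1)
    r[1], d[1] = True, 0             # initially: vertex 1 at distance 0

    changed = True
    while changed:                   # sweep the edges until nothing changes
        changed = False
        for u, v in edges:
            # Evaluate both right-hand sides first, then assign together.
            new_r = r[u] or r[v]
            new_d = min(d[v], d[u] + 1)
            if (new_r, new_d) != (r[v], d[v]):
                r[v], d[v] = new_r, new_d
                changed = True
    return r, d


if __name__ == "__main__":
    r, d = hop_distances(4, [(1, 2), (2, 3), (1, 3), (4, 3)])
    print(d[1:])                     # [0, 1, 1, inf]
```

Vertices that are never reached keep d[v] equal to infinity, matching
their initial value in the UNITY program.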