SIMD matrix multiplication on the MasPar
We all know that to compute the product of two matrices, we do:
C(i,j) = A(i,1)*B(1,j) + A(i,2)*B(2,j) + ... + A(i,N)*B(N,j)To do this on the MasPar, storing one element per processor, is tricky because we have to get the appropriate elements to the appropriate processors. Doing so using general communication might prove to be overly expensive.
Another way of doing matrix multiplication is using Pipelining. Let (+) and (*) denote element-wise addition and multiplication of two arrays. To explain this technique, consider the following 3x3 matrix multiplication of A and B to yield C:
/ \ / \ / \ | C11 C12 C13 | | A11 A12 A13 | | B11 B22 B33 | | C21 C22 C23 | = | A22 A23 A21 | (*) | B21 B32 B13 | (+) | C31 C32 C33 | | A33 A31 A32 | | B31 B12 B23 | \ / \ / \ / / \ / \ | A12 A13 A11 | | B21 B32 B13 | | A23 A21 A22 | (*) | B31 B12 B23 | (+) | A31 A32 A33 | | B11 B22 B33 | \ / \ / / \ / \ | A13 A11 A12 | | B31 B12 B23 | | A21 A22 A23 | (*) | B11 B22 B33 | | A32 A33 A31 | | B21 B32 B13 | \ / \ /In the above computation, the elements of the 3 by 3 arrays A and B were initially arranged so that successive circular shifts (shifting up for A and left for B) resulted in the elements of A and B be aligned at each processing element (PE) so that the cumulative sum for the matrix multiplication can be computed in C, which remains static with respect to the PEs. Assuming that A and B are square matrices that conform to the physical dimensions of the MasPar, answer the following questions:
Date of last update: November 22, 1994.