Below are a number of programming exercises.
Repeat the example of the 2x2 matrix multiplication on your own implementation.
Extend the implementation to (exclusively) support 3x3 matrices: 1) verify that it works correctly, 2) find the latency, and 3) determine the cost (#LUT, #FF, imem size).
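As a starting point for the 3x3 exercise, here is a minimal sketch in C. It assumes 32-bit integer elements and small test values. The names `soft_mul`, `matmul3` and `matmul3_selftest` are hypothetical, and the shift-and-add routine only stands in for whatever multiply routine a software-only build (no M extension) actually uses.

```c
#include <assert.h>
#include <stdint.h>

#define N 3  /* fixed size: this sketch handles 3x3 matrices only */

/* Shift-and-add multiply, a stand-in for multiplication without a
 * hardware multiplier. Works on the unsigned bit patterns, so the low
 * 32 bits of the result match a * b in two's complement.
 * Assumption: converting the result back to int32_t wraps, as it does
 * with GCC and Clang. */
static int32_t soft_mul(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t acc = 0;

    while (ub != 0) {
        if (ub & 1u) {
            acc += ua;      /* add the shifted multiplicand for each set bit */
        }
        ua <<= 1;
        ub >>= 1;
    }
    return (int32_t)acc;
}

/* c = a * b for 3x3 matrices; no overflow checks. */
void matmul3(const int32_t a[N][N], const int32_t b[N][N], int32_t c[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int32_t sum = 0;
            for (int k = 0; k < N; k++) {
                sum += soft_mul(a[i][k], b[k][j]);
            }
            c[i][j] = sum;
        }
    }
}

/* Step 1 of the exercise: compare against a product worked out by hand. */
void matmul3_selftest(void)
{
    static const int32_t a[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    static const int32_t b[N][N] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
    static const int32_t expected[N][N] = {
        {30, 24, 18}, {84, 69, 54}, {138, 114, 90}
    };
    int32_t c[N][N];

    matmul3(a, b, c);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            assert(c[i][j] == expected[i][j]);
        }
    }
}
```

On a bare-metal target without `assert` support, the check could instead return a mismatch count or write a status word to a known memory location. For the build with the M extension, `soft_mul(x, y)` would simply become `x * y`.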
| Metric | SW-only | with M extension | Relative |
|---|---|---|---|
| # CC¹ | 1887 | 285 | 15% |
| T (ns) | 25 | 36 | 144% |
| fclk (MHz) | 40 | 27.78 | 70% |
| Latency (µs) | 47.175 | 10.260 | 22% |
| imem size (bytes) | 2441 | 1908 | 78% |
| # LUT | 1553 | 3501 | 225% |
| # FF | 1156 | 1156 | 100% |
| Throughput (matrix mult/s) | 1/47175e-9 = 21k mm/s | 1/10260e-9 = 97k mm/s | 462% |
¹ This counts real clock cycles. The 33% duty cycle induced by the chip enable is taken into account, so the non-optimal usage of the processor is reflected in the outcome.
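The derived rows of the table follow from the cycle count and the clock period. The sketch below shows that arithmetic in C. The function names are hypothetical, and it assumes the multiplications run back to back, so throughput is the reciprocal of latency.

```c
#include <stdint.h>

/* Latency in ns = number of clock cycles x clock period in ns. */
uint64_t latency_ns(uint32_t cycles, uint32_t period_ns)
{
    return (uint64_t)cycles * period_ns;
}

/* Clock frequency in MHz for a period given in ns. */
double fclk_mhz(uint32_t period_ns)
{
    return 1000.0 / (double)period_ns;
}

/* Matrix multiplications per second, assuming one runs directly after
 * the other (assumption: no overlap and no idle time in between). */
double throughput_per_s(uint64_t lat_ns)
{
    return 1e9 / (double)lat_ns;
}

/* The "Relative" column: value as a percentage of the baseline. */
double relative_percent(double value, double baseline)
{
    return 100.0 * value / baseline;
}
```

For example, `latency_ns(1887, 25)` gives the 47175 ns of the SW-only column and `latency_ns(285, 36)` the 10260 ns of the M-extension column. Computed from these unrounded latencies, the throughput ratio is about 460%, slightly below the 462% obtained from the rounded 21k and 97k figures.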