Now that the design is taking some size, it is time to start looking at the cost. You might have touched upon this earlier in this program. The triangle that is shown here allows you to pick maximum 2 things to focus on.
Generate the results of the implementation at this point. How many resources does the design require? What memory size is required? These metrics have been covered in the previous chapter.
From the waveform above, it can be learned that a 2x2 matrix multiplication takes 0x1b6 clock cycles. From the implementation results it is clear that a clock cycle takes 25 ns.
A simple calculation learns that one 2x2 matrix multiplication has a latency of: 0x1b6 * 25 ns = 438 * 25 ns = 10'950 ns.