Marcel Flottmann
An Out-of-Order Vector Processing Unit for a Superscalar RISC-V Processor

Abstract
This study introduces a novel vector unit designed specifically for version 1.0 of the RISC-V Vector extension. Notably, it is the first architecture to incorporate full out-of-order execution, register renaming, and speculation. Most of the components have been implemented in VHDL. Performance is evaluated through cycle-accurate simulation in QuestaSim 2021.1, with test applications written in C++. The results indicate that, compared to a two-wide superscalar out-of-order core, the vector unit with two lanes significantly reduces the dynamic instruction count and speeds up execution by a factor of 2.2 to 11.9. Furthermore, the design is successfully implemented on an AMD/Xilinx Zynq UltraScale+ XCZU15EG, achieving a clock rate of 107.7 MHz and a power consumption of 0.682 W. On this FPGA the unit reaches a peak throughput of 0.43 INT64-GOPS. The design is also physically implemented as an ASIC using the predictive 7 nm ASAP PDK with OpenROAD. The ASIC-based vector unit occupies an area of 0.276 mm² and achieves a clock rate of 412.91 MHz, resulting in a peak throughput of 1.65 INT64-GOPS.
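Both peak-throughput figures follow from the reported clock rates if the unit completes a fixed number of INT64 operations per cycle. The value of four operations per cycle used below is an inference from the ratio of the reported numbers (0.43 GOPS / 107.7 MHz ≈ 4). The abstract does not state it, and it does not say how those operations are distributed across the two lanes.

$$
\text{Peak throughput} = N_{\text{ops/cycle}} \cdot f_{\text{clk}}, \qquad
4 \times 107.7\ \text{MHz} = 430.8\ \text{MOPS} \approx 0.43\ \text{INT64-GOPS}, \qquad
4 \times 412.91\ \text{MHz} = 1651.64\ \text{MOPS} \approx 1.65\ \text{INT64-GOPS}
$$

Under this reading, the roughly 3.8× higher ASIC throughput comes entirely from the higher clock rate. The per-cycle capability of the unit is the same on both platforms.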