Luka Vukusic
Optimization of Reconfigurable Hardware Accelerators for Embedded Machine Learning

Abstract
This thesis examines how neural networks can be accelerated efficiently on field-programmable gate arrays (FPGAs). Their reconfigurability makes FPGAs a promising platform for implementing neural network accelerators. The focus is on optimizing STANN, a C++ template library for FPGA-based neural network implementations, by drawing on techniques from state-of-the-art libraries such as hls4ml. The optimizations target dense layers, specifically matrix multiplication and the evaluation of activation functions. Approximating the activation function with lookup tables reduces its computation time sixfold. Optimizing the matrix multiplication reduces latency by one third for a network with ten hidden layers of 128 neurons each. A comparative analysis with leading acceleration libraries reveals further opportunities to optimize STANN, underscoring the remaining potential for refining FPGA-based neural network acceleration.
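To make the lookup-table idea concrete, the following is a minimal C++ sketch of the technique, not STANN's actual code: the activation function (here the sigmoid) is precomputed at N evenly spaced points over a fixed input range, so each evaluation reduces to an index computation and a single table read instead of a transcendental function call. The class name SigmoidLUT and all parameters are illustrative assumptions; an HLS implementation would typically use fixed-point types and map the table to block RAM.

#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical sketch of a lookup-table activation (here: sigmoid).
// This illustrates the technique only, not STANN's actual code;
// float is used instead of HLS fixed-point types for clarity.
template <std::size_t N>
class SigmoidLUT {
    static_assert(N >= 2, "table needs at least two entries");
public:
    // Precompute sigmoid at N evenly spaced points over [lo, hi].
    SigmoidLUT(float lo = -8.0f, float hi = 8.0f)
        : lo_(lo), step_((hi - lo) / (N - 1)) {
        for (std::size_t i = 0; i < N; ++i)
            table_[i] = 1.0f / (1.0f + std::exp(-(lo_ + i * step_)));
    }

    // Evaluate by clamping to the table range and reading the nearest
    // precomputed entry: one index computation plus one memory read,
    // instead of an exponential per activation.
    float operator()(float x) const {
        float idx = std::clamp((x - lo_) / step_ + 0.5f,
                               0.0f, static_cast<float>(N - 1));
        return table_[static_cast<std::size_t>(idx)];
    }

private:
    float lo_, step_;
    std::array<float, N> table_;
};

// Usage: a 1024-entry table; sigmoid(0.5f) returns roughly 0.622.
// SigmoidLUT<1024> sigmoid;
// float y = sigmoid(0.5f);

In such a scheme, the table size trades approximation accuracy against on-chip memory, which is the kind of design point the reported sixfold reduction in activation computation time depends on.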