The widespread adoption of AI-based techniques in applications across very different scenarios and device domains, ranging from the edge to the cloud, has led to the introduction of Deep Neural Network (DNN) hardware accelerators in the form of Domain-Specific Architectures (DSAs). The design and implementation of such hardware accelerators have been investigated in the recent literature, and, given the error-forgiving nature of DNNs, promising solutions from the Approximate Computing (AC) paradigm can be applied to improve their energy and performance figures.
Approximate multipliers are an excellent example: their inexact but energy-efficient design pays off mainly because of the vast number of multiply-and-accumulate (MAC) operations required by DNNs.
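For illustration only, the following minimal C sketch shows one common flavor of approximate multiplication, a truncation-based design that discards the low-order bits of each operand so that a narrower, cheaper multiplier can be used; it is a generic example and is not tied to the specific designs evaluated in this work.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative 8x8-bit approximate multiplier: truncate the k least-significant
 * bits of each operand before multiplying, then rescale the product.
 * This trades accuracy for a smaller (hence lower-energy) multiplier. */
static uint16_t approx_mul8(uint8_t a, uint8_t b, unsigned k)
{
    uint8_t a_t = a >> k;                         /* drop k LSBs of a */
    uint8_t b_t = b >> k;                         /* drop k LSBs of b */
    return (uint16_t)((a_t * b_t) << (2 * k));    /* rescale result   */
}

int main(void)
{
    uint8_t a = 183, b = 91;
    printf("exact:  %u\n", (unsigned)(a * b));          /* 16653 */
    printf("approx: %u\n", (unsigned)approx_mul8(a, b, 2)); /* 15840, error bounded by the truncated bits */
    return 0;
}
```

In a DNN accelerator, an error of this kind is incurred once per MAC; the forgiving nature of DNNs is what keeps the accumulated effect on inference accuracy small.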
This work presents and evaluates the application of AC techniques to the computation, communication, and memory subsystems that compose a hardware accelerator. It focuses on the performance vs. energy vs. accuracy trade-offs of the inference phase, which is of particular interest because inference is often performed on resource-constrained devices.