Presentation title
An Optimized Task-Based Programming Model for Embedded Many-core Computing Platforms
Authors
Giuseppe Tagliavini, Andrea Marongiu and Luca Benini
Institution(s)
University of Bologna
Presentation type
Technical presentation
Abstract
Multi- and many-core computing platforms are now widely adopted to accelerate compute-intensive workloads across different computing domains. However, these devices considerably complicate application development, even though software development is widely acknowledged to be a critical activity in platform design.
In this technical presentation we discuss the use of OpenMP tasking as a general-purpose programming model for executing diverse workloads on embedded many-core computing platforms. We introduce a set of runtime-level techniques to support fine-grained tasks on many-core accelerators. We also provide support for "untied" tasks, which are work units that, once suspended, can be resumed by any available thread, thus significantly increasing the potential for exploiting parallelism. On top of this extended runtime, we implement support for work-first scheduling (WFS) and associated cutoff policies.
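To make the terms above concrete, the following is a minimal sketch of a recursive pattern written with standard OpenMP untied tasks and a depth-based cutoff. It is an illustration only, not the runtime presented here: the function names `fib_task` and `fib_parallel` and the value of `CUTOFF_DEPTH` are assumptions, and the cutoff is expressed by the programmer through the standard `final` clause rather than by a runtime-level policy. Whether a newly created task is executed work-first is decided by the runtime and is not visible in the source code.

```c
#include <assert.h>

/* Hypothetical cutoff depth, chosen only for illustration: tasks created
   at this recursion depth or deeper are final, so the rest of the
   recursion below them runs inline instead of spawning deferred tasks. */
#define CUTOFF_DEPTH 8

/* Recursive Fibonacci, a typical recursive tasking pattern. */
static long fib_task(int n, int depth)
{
    long x, y;

    if (n < 2)
        return n;

    /* untied: if this task is suspended, any thread of the team may
       resume it. shared(x): the child writes its result into the
       parent's local variable. */
    #pragma omp task untied shared(x) final(depth >= CUTOFF_DEPTH)
    x = fib_task(n - 1, depth + 1);

    #pragma omp task untied shared(y) final(depth >= CUTOFF_DEPTH)
    y = fib_task(n - 2, depth + 1);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return x + y;
}

/* Entry point: one thread creates the root of the task tree and the
   other threads of the team pick up the generated tasks. */
long fib_parallel(int n)
{
    long result = 0;

    assert(n >= 0);

    #pragma omp parallel
    #pragma omp single
    result = fib_task(n, 0);

    return result;
}
```

Calling `fib_parallel(20)` from a program built with OpenMP enabled (for example `-fopenmp` with GCC) evaluates the recursion as a task tree. Without OpenMP the pragmas are ignored and the same code runs sequentially, which makes the sketch easy to check.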
Experimental results demonstrate the benefits of our runtime in three main respects: (i) our solution achieves the maximum speedup with an average task granularity of 7500 cycles, while previous approaches require about 100000 cycles to reach the same performance level; (ii) WFS enables significantly higher speedups (up to 60%) when untied tasks are used in recursive patterns; (iii) cutoff policies on top of the provided support for untied tasks achieve nearly ideal speedups for recursive patterns at around 5K cycles. These features enable the adoption of OpenMP tasking in embedded runtime environments, including state-of-the-art applications in the time-critical domain.