To enable efficient NN inference on the edge, embedded Neural Processing Units (NPUs) are becoming common in MCUs and SoCs: an example is STMicroelectronics' experimental NPU and its associated compilation toolchain. This mapping tool exposes various compile-time and design-time parameters that can be configured, such as the nodes' decomposition strategy or the number of the NPU's computing units. Finding the optimal strategy to efficiently map an NN onto the target hardware accelerator is a challenging task: given the vastness of this design space, automatic optimization techniques are needed. In this work, we integrated the STMicroelectronics NPU compilation toolchain with an automatic exploration framework (MOST), and we compared several Design Space Exploration (DSE) techniques, such as Simulated Annealing, Greedy Search, and Genetic Algorithms, to identify the most suitable methodology to drive the autotuning of the NPU compiler co-design parameters. We then applied these optimizations to select the best set of parameters to run the inference of the Tiny-Yolo DCNN on the ST NPU.
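To make the autotuning loop concrete, the following is a minimal sketch of one of the compared DSE techniques, Simulated Annealing, searching over a toy version of such a design space. The parameter names (`decomposition_strategy`, `num_compute_units`) and the `evaluate` cost function are illustrative placeholders, not the actual ST toolchain options or the MOST API; in a real flow, `evaluate` would invoke the NPU compiler with the candidate configuration and measure the resulting inference latency.

```python
import math
import random

# Hypothetical design space: parameter names and value ranges are
# illustrative, not the actual ST toolchain options.
DESIGN_SPACE = {
    "decomposition_strategy": ["by_row", "by_channel", "by_tile"],
    "num_compute_units": [1, 2, 4, 8],
}

def random_config():
    return {k: random.choice(v) for k, v in DESIGN_SPACE.items()}

def neighbor(config):
    # Perturb one randomly chosen parameter to obtain a nearby configuration.
    new = dict(config)
    key = random.choice(list(DESIGN_SPACE))
    new[key] = random.choice(DESIGN_SPACE[key])
    return new

def evaluate(config):
    # Placeholder cost: a real flow would compile the NN with `config`
    # and report the measured inference latency on the target NPU.
    return random.random()

def simulated_annealing(steps=200, t0=1.0, cooling=0.98):
    current = random_config()
    current_cost = evaluate(current)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = neighbor(current)
        cost = evaluate(candidate)
        # Always accept improvements; accept regressions with a
        # probability that shrinks as the temperature decreases.
        if cost < current_cost or random.random() < math.exp((current_cost - cost) / t):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate, cost
        t *= cooling
    return best, best_cost

if __name__ == "__main__":
    config, cost = simulated_annealing()
    print("best configuration:", config, "cost:", cost)
```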