High-performance embedded processors are frequently designed as arrays of small, in-order scalar cores, even when their workloads exhibit high degrees of data-level parallelism (DLP). We show that these multiple-instruction, multiple-data (MIMD) systems can be made more efficient by instead exploiting DLP directly with a modern vector architecture. In our study, we compare arrays of scalar cores to vector machines of comparable silicon area and power consumption. Because the vector machines provide greater performance across the board, in some cases with better programmability as well, we believe that embedded system designers should increasingly pursue vector architectures for machines at this scale.
Authors: Krste Asanovic, Daniel Dabbelt, Colin Schmidt, Eric Love, Howard Mao, Sagar Karandikar