Performance and Energy Evaluation of Memory Organizations in NoC-Based MPSoCs under Latency and Task Migration [chapter]

Gustavo Girão, Daniel Barcelos, Flávio Rech Wagner
2011 IFIP Advances in Information and Communication Technology  
This chapter presents a study on the performance and energy consumption arising from distinct memory organizations in an NoC-based MPSoC environment. The evaluation considers three sets of experiments. The first evaluates the performance and energy efficiency of four different memory organizations when a single application is executed. In the second set of experiments, a traffic generator injects synthetic traffic into the system, simulating the impact of the parallel execution of additional applications and increasing the latency of the NoC. Results show that, with low NoC latency, the distributed memory presents better results for applications that transfer small amounts of data. On the other hand, results suggest that shared and distributed shared memories present the best results for applications with high data transfer needs. In the second set of experiments, with higher NoC latency, for applications with low communication bandwidth requirements, a memory organization that is physically centralized and logically shared (called nDMA) is shown to have a smooth performance degradation when additional traffic rises up to 20% of the network capacity (22% degradation for an application demanding high communication, and 34% degradation for a low communication one). In contrast, a distributed memory model presents 2% degradation for an application with high communication requirements when traffic rises up to 20% of the network capacity, and reaches 19% degradation for low communication ones. Shared and distributed shared memory models are shown to have lower tolerance to high latencies. A third set of experiments evaluates the performance of the four memory organization models under task migration, when a new application is launched and its tasks must be distributed among several nodes. Results show that the shared memory and distributed shared memory models achieve better performance and energy savings than the distributed memory model in this situation. In addition, the nDMA memory model presents a smaller overhead than the shared memory models and tends to reduce the traffic in the migration process due to the concentration of all memory modules in a single node of the network.
Nowadays, embedded systems have become very complex.
This complexity has many causes, but the most evident one is the use of such devices for general purpose computing, leading to the execution of many different and complex applications. However, even with higher performance requirements, low power design is still a very desirable goal in portable devices [1]. To support processing requirements and also meet stringent constraints in terms of area and memory, as well as low energy consumption and low power dissipation, a solution using several cores in a single chip is widely adopted. This architecture is known as Multiprocessor System-on-Chip (MPSoC). This scenario usually implies a communication bandwidth between cores that demands a more efficient communication mechanism than a single bus [2]. With this concern in mind, the concept of Network-on-Chip (NoC) was created. In an MPSoC scenario, memory organization plays a key role, since it is not only a major performance bottleneck but also a significant component of energy consumption. In addition, memory organization is closely related to the communication model adopted in application development. For instance, when using a shared memory organization, the communication mechanism usually adopted is the memory itself and, therefore, the memory organization becomes even more important. Since NoCs are highly scalable communication structures, it is not hard to imagine a situation with dozens or hundreds of processing elements and memory nodes running a large number of applications concurrently. In this scenario, it is of great interest to evaluate the behavior of different memory organizations when the network latency increases due to the large number of components and applications in the system.
In addition, the memory model also affects system performance when a new application is dynamically launched and a task migration mechanism is applied to find a new task allocation that better meets system requirements, especially real-time and energy constraints. This chapter presents a study on the performance and energy consumption arising from distinct memory organizations in an NoC-based MPSoC environment. The evaluation considers three sets of experiments, running on a virtual platform. The first evaluates the performance and energy efficiency of four different memory organizations when a single application is executed. In the second set of experiments, a traffic generator injects synthetic traffic into the system, simulating the impact of the parallel execution of additional applications and increasing the latency of the NoC. The following memory organizations have been implemented in the virtual platform and evaluated in the experiments: (i) distributed memory, where processors have their local private memories; (ii) shared memory, with a single memory component in a dedicated node on the NoC that is accessed by all processors; (iii) distributed shared memory, composed of several physically distributed memory nodes that share the same address space; and, finally, (iv) a physically shared but logically distributed memory, whose communication model resembles a DMA communication protocol and is thus called nDMA.
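The four organizations listed above differ along two axes: where the memory physically sits and how the address space is exposed to the processors. The following Python sketch tabulates that classification as read from the list above. It is only an illustration: the names `Placement`, `AddressSpace`, `MemoryOrganization`, `ORGANIZATIONS`, and `communicates_through_memory` are hypothetical and do not come from the chapter or its virtual platform.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    """Physical location of the memory (illustrative labels)."""
    CENTRALIZED = "single memory node on the NoC"
    DISTRIBUTED = "several memory nodes or local memories"


class AddressSpace(Enum):
    """Logical view offered to the processors (illustrative labels)."""
    SHARED = "one address space visible to all processors"
    PRIVATE = "separate address space per processor"


@dataclass(frozen=True)
class MemoryOrganization:
    name: str
    placement: Placement
    address_space: AddressSpace


# Classification as read from items (i) to (iv) in the text above.
ORGANIZATIONS = (
    MemoryOrganization("distributed memory", Placement.DISTRIBUTED, AddressSpace.PRIVATE),
    MemoryOrganization("shared memory", Placement.CENTRALIZED, AddressSpace.SHARED),
    MemoryOrganization("distributed shared memory", Placement.DISTRIBUTED, AddressSpace.SHARED),
    MemoryOrganization("nDMA", Placement.CENTRALIZED, AddressSpace.PRIVATE),
)


def communicates_through_memory(org: MemoryOrganization) -> bool:
    """Assumption: tasks communicate through memory only when the address space is shared."""
    return org.address_space is AddressSpace.SHARED


if __name__ == "__main__":
    for org in ORGANIZATIONS:
        print(f"{org.name}: {org.placement.name.lower()}, {org.address_space.name.lower()}")
```

Under this reading, only the shared and distributed shared models use the memory itself as the communication mechanism, while distributed memory and nDMA rely on explicit transfers between private address spaces.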
doi:10.1007/978-3-642-23120-9_4 fatcat:ayblacdnsbcfdkofcbe7e2s5yy