2025 Volume 33 Pages 79-90
High-performance embedded systems, such as autonomous driving systems, require platforms that combine low power consumption with high processing performance. Multi-/many-core processors are attracting attention because they satisfy both requirements. This paper focuses on clustered many-core processors, represented by the Kalray MPPA. Clustered many-core processors group multiple cores into a cluster, with each cluster having private memory. By using separate memory for each core, communication conflicts between tasks can be reduced, and the worst-case response time can be improved. This paper describes how to apply federated scheduling, which allocates dedicated cores to high-load tasks, to clustered many-core processors. In a clustered many-core processor, each core uses separate memory, thereby eliminating memory contention between tasks. However, if a task communicates between two different clusters, the communication time increases. Studies on clustered many-core processors often assume that the local memory capacity is sufficient for all tasks. In practice, however, the local memory is small, so this paper also discusses the use of shared DDR memory. When cores are allocated across multiple clusters, the associated inter-cluster communication increases the execution time, which may change the number of cores required. Therefore, for such cases we introduce a mechanism that recalculates the required number of cores with the communication time included. We also propose a mechanism that reduces the number of cores required by analyzing which tasks access DDR when DDR is used because local memory capacity is insufficient. The evaluation results indicate that clustered many-core processors greatly reduce the number of cores required for tasks with a high communication burden. Which tasks require inter-cluster communication depends on the order in which dedicated cores are allocated to the tasks. 
In a clustered many-core processor, the scheduling outcome depends on the method used to allocate dedicated cores in each cluster. In addition, by utilizing DDR memory, scheduling remains possible even when a task exceeds the local memory capacity.
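
As an illustration of the core recalculation described above, the following minimal sketch uses the standard federated scheduling bound, in which a high-load task with total work C, critical path length L, and deadline D receives ceil((C - L) / (D - L)) dedicated cores. The way communication time enters the calculation here is an assumption made for illustration and is not taken from the paper: the hypothetical parameters `comm_work` and `comm_span` are simply added to the work and the critical path length when the first allocation does not fit in one cluster of `cluster_size` cores.

```python
import math


def federated_cores(work, span, deadline):
    """Dedicated cores for a high-load task: ceil((C - L) / (D - L))."""
    if span >= deadline:
        raise ValueError("critical path length must be shorter than the deadline")
    return max(1, math.ceil((work - span) / (deadline - span)))


def cores_with_communication(work, span, deadline,
                             cluster_size, comm_work, comm_span):
    """Recompute the core count when the task spills over one cluster.

    comm_work and comm_span are hypothetical overheads (assumptions of this
    sketch) that inter-cluster communication adds to the total work and to
    the critical path length.
    """
    cores = federated_cores(work, span, deadline)
    if cores <= cluster_size:
        # The task fits in a single cluster: no inter-cluster communication.
        return cores
    # The task spans clusters: include the communication time and recompute.
    return federated_cores(work + comm_work, span + comm_span, deadline)


if __name__ == "__main__":
    # Fits in one 16-core cluster, so the plain bound applies.
    print(cores_with_communication(100, 10, 40, 16, 20, 5))
    # Does not fit in a 2-core cluster, so communication is included.
    print(cores_with_communication(100, 10, 40, 2, 20, 5))
```

The second call shows the effect the abstract refers to: once inter-cluster communication is counted, the required number of cores can be larger than the initial estimate.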