Abstract
This study investigates a framework for automatic tuning that selects the optimum parallel programming model for finite-element computation. The target FEM code has been tested and optimized on the T2K Open Supercomputer (Tokyo) with up to 1,024 cores. A combination of "first-touch data placement", "sequential reordering of data", NUMA control with "localalloc", and optimization of inter-domain communications improved the performance of the hybrid parallel programming models. As a result, the final performance of the hybrid models is competitive with, or even better than, that of the Flat MPI programming model. The target code can be used as an engine of the framework for automatic tuning.