-
Article type: Cover
2010 Volume 20 Issue 4 Pages
Cover1-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Index
2010 Volume 20 Issue 4 Pages
Toc1-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Index
2010 Volume 20 Issue 4 Pages
Toc2-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Hiroshi Akiba
Article type: Article
2010 Volume 20 Issue 4 Pages
273-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Appendix
2010 Volume 20 Issue 4 Pages
274-278
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Daisuke Takahashi
Article type: Article
2010 Volume 20 Issue 4 Pages
279-286
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
This paper presents an automatic performance tuning method for parallel fast Fourier transforms (FFTs) on massively parallel platforms with multi-core processors. A blocking algorithm for parallel FFTs uses cache memory effectively. Since the optimal block size may depend on the problem size, we propose a method to determine the block size that minimizes the number of cache misses. In addition, parallel FFTs require intensive all-to-all communication, which affects their performance, so automatic tuning of all-to-all communication is also implemented. The performance results demonstrate that the proposed implementation of parallel FFTs with automatic performance tuning is effective in improving performance.
-
Takao Sakurai, Ken Naono, Mitsuyoshi Igai
Article type: Article
2010 Volume 20 Issue 4 Pages
287-296
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
Recently, the IDR(s) method has emerged as a high-performance iterative solver. However, the method occasionally outputs incorrect solutions due to a large restart frequency. To alleviate the problem, we propose a run-time auto-tuning IDR(s) method, named the "Auto-corrected IDR(s) method" (AC-IDR(s)). To avoid the incorrectness arising from the approximation in the original IDR(s), AC-IDR(s) predicts the occurrence of incorrectness using residual norm statistics and automatically replaces the approximation with a direct matrix-vector multiplication. Numerical experiments show that AC-IDR(s) avoids incorrect solutions in all cases.
-
Takahiro Katagiri, Takao Sakurai, Hisayasu Kuroda, Ken Naono, Kengo Na ...
Article type: Article
2010 Volume 20 Issue 4 Pages
297-309
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
Conventional numerical libraries with an auto-tuning (AT) facility offer no re-usability for their AT functions. To solve this re-usability problem, we have established Application Programming Interfaces (APIs) for AT. The library of the APIs is named OpenATLib. In addition, we have developed sparse iterative solvers using OpenATLib. The package is named Xabclib. We have also developed a new AT function, called "Numerical Policy", with which end-users specify the optimization policy for AT. In the sparse matrix-vector multiplication part, we propose and implement a new thread-level parallelization method, named the normalized non-zero elements method. The method balances the number of non-zero elements across rows, while the conventional method simply divides the rows of the matrix. Because of this simple row division, the conventional method suffers from load imbalance in some cases. In a performance evaluation using one node (16 cores) of the T2K Open Supercomputer (U. Tokyo), we confirmed that the accuracy required by end-users is maintained for the eigensolver and the linear equation solver, respectively. In addition, we obtained a 2.8x speedup for one matrix compared with the conventional method for sparse matrix-vector multiplication.
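The contrast the abstract draws between simple row division and balancing by non-zero elements can be illustrated with a minimal sketch. This is not the OpenATLib or Xabclib implementation; the function names and the use of a CSR row-pointer array (`row_ptr`) are assumptions made only for illustration.

```python
from bisect import bisect_left


def split_rows_evenly(n_rows, n_threads):
    """Conventional split: each thread gets about the same number of rows."""
    return [(n_rows * t) // n_threads for t in range(n_threads + 1)]


def split_rows_by_nnz(row_ptr, n_threads):
    """Hypothetical nnz-balanced split.

    row_ptr is a CSR row-pointer array of length n_rows + 1, so
    row_ptr[i] is the number of non-zeros stored before row i.
    Returns row boundaries such that each thread receives roughly
    the same number of non-zero elements.
    """
    n_rows = len(row_ptr) - 1
    total_nnz = row_ptr[-1]
    bounds = [0]
    for t in range(1, n_threads):
        target = (total_nnz * t) // n_threads
        # First row boundary whose cumulative nnz reaches the target.
        cut = bisect_left(row_ptr, target)
        # Keep boundaries monotone and inside the matrix.
        cut = min(max(cut, bounds[-1]), n_rows)
        bounds.append(cut)
    bounds.append(n_rows)
    return bounds


def nnz_per_part(row_ptr, bounds):
    """Number of non-zeros assigned to each thread for given boundaries."""
    return [row_ptr[bounds[i + 1]] - row_ptr[bounds[i]]
            for i in range(len(bounds) - 1)]


# Example: 8 rows, the first row is dense relative to the others.
row_nnz = [8, 1, 1, 1, 1, 1, 1, 2]
row_ptr = [0]
for n in row_nnz:
    row_ptr.append(row_ptr[-1] + n)

even = split_rows_evenly(len(row_nnz), 2)
balanced = split_rows_by_nnz(row_ptr, 2)
print(even, nnz_per_part(row_ptr, even))
print(balanced, nnz_per_part(row_ptr, balanced))
```

In this toy matrix the even row split gives one thread more than twice the work of the other, while the nnz-based split assigns the dense first row to one thread and the remaining rows to the second.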
-
Kengo Nakajima
Article type: Article
2010 Volume 20 Issue 4 Pages
310-320
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
In this study, a framework for automatic tuning that selects the optimum parallel programming model in finite-element computation has been investigated. The target FEM code has been tested and optimized on the T2K Open Supercomputer (Tokyo) with up to 1,024 cores. A combination of "first-touch data placement", "sequential reordering of data", NUMA control with "localalloc", and optimization of inter-domain communications improved the performance of hybrid parallel programming models, and the final performance of the hybrid models is competitive with, or even better than, that of the flat MPI programming model. The target code can be used as an engine of the framework for automatic tuning.
-
Ryuichi Shimizu
Article type: Article
2010 Volume 20 Issue 4 Pages
321-324
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Tetsutaro Kobayashi, Go Yamamoto
Article type: Article
2010 Volume 20 Issue 4 Pages
325-328
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Takumi Washio
Article type: Article
2010 Volume 20 Issue 4 Pages
329-337
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Genki Yagawa
Article type: Article
2010 Volume 20 Issue 4 Pages
338-341
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Hideyuki Suzuki, Kazuyuki Aihara
Article type: Article
2010 Volume 20 Issue 4 Pages
342-344
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Takahiro Katagiri, Satoshi Ohshima
Article type: Article
2010 Volume 20 Issue 4 Pages
345-346
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Ichiro Hagiwara
Article type: Article
2010 Volume 20 Issue 4 Pages
347-348
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Kazufumi Ozawa
Article type: Article
2010 Volume 20 Issue 4 Pages
349-350
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Toshikazu Takada, Masaharu Taniguchi
Article type: Article
2010 Volume 20 Issue 4 Pages
350-351
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Appendix
2010 Volume 20 Issue 4 Pages
352-353
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Appendix
2010 Volume 20 Issue 4 Pages
353-354
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Appendix
2010 Volume 20 Issue 4 Pages
354-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Index
2010 Volume 20 Issue 4 Pages
355-357
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Index
2010 Volume 20 Issue 4 Pages
358-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Appendix
2010 Volume 20 Issue 4 Pages
359-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Appendix
2010 Volume 20 Issue 4 Pages
360-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Appendix
2010 Volume 20 Issue 4 Pages
App1-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS
-
Article type: Cover
2010 Volume 20 Issue 4 Pages
Cover2-
Published: December 24, 2010
Released on J-STAGE: April 08, 2017
JOURNAL
FREE ACCESS