Advanced search results

Efficient Algorithms for Stream Compaction on GPUs

Darius Bakunas-Milanowski, Vernon Rego, Janche Sang, Yu Chansu

International Journal of Networking and Computing
2017, Volume 7, Issue 2, pp. 208-226
Published: 2017
Released: 2017/10/04

DOI https://doi.org/10.15803/ijnc.7.2_208

Journal, Open Access

Stream compaction, also known as stream filtering or selection, produces a smaller output array that contains the indices of only the wanted elements from the input array for further processing. With the tremendous amount of data elements to be filtered, the performance of selection is of great concern. Recently, modern Graphics Processing Units (GPUs) have been increasingly used to accelerate the execution of massively large, data-parallel applications. In this paper, we designed and implemented two new algorithms for stream compaction on GPUs. The first algorithm, which preserves the relative order of the input elements, uses a multi-level prefix-sum approach. The second algorithm, which is non-order-preserving, is based on the hybrid use of the prefix-sum and atomics approaches. We compared their performance with other parallel selection algorithms on the current generation of NVIDIA GPUs. The experimental results show that both algorithms run faster than Thrust, an open-source parallel algorithms library. Furthermore, the hybrid method performs the best among all existing selection algorithms on GPUs and can be two orders of magnitude faster than sequential selection on the CPU, especially when the data size is large.
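
The order-preserving, prefix-sum idea mentioned in this abstract can be sketched sequentially in Python. This is only a minimal illustration under stated assumptions, not the authors' multi-level GPU algorithm: the function name `compact_indices` and the predicate argument are hypothetical, and the three steps that a GPU would run in parallel are written here as plain loops.

```python
from itertools import accumulate


def compact_indices(data, predicate):
    """Return the indices of wanted elements, preserving input order.

    Hypothetical helper: a sequential sketch of prefix-sum compaction.
    """
    # Step 1: flag each element (1 = keep, 0 = discard).
    flags = [1 if predicate(x) else 0 for x in data]
    # Step 2: an exclusive prefix sum of the flags gives every kept
    # element its slot in the output array.
    inclusive = list(accumulate(flags))
    offsets = [s - f for s, f in zip(inclusive, flags)]
    # Step 3: scatter the index of each kept element to its slot.
    total = inclusive[-1] if inclusive else 0
    out = [None] * total
    for i, keep in enumerate(flags):
        if keep:
            out[offsets[i]] = i
    return out


if __name__ == "__main__":
    print(compact_indices([5, -1, 7, 0, 3], lambda x: x > 0))
```

Because every output slot is known from the prefix sum before anything is written, the scatter step has no write conflicts, which is what makes this formulation attractive for data-parallel hardware.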

Performance Evaluation of Parallel Sortings on the Supercomputer Fugaku

Tomoyuki Tokuue, Tomoaki Ishiyama

Journal of Information Processing
2023, Volume 31, pp. 452-458
Published: 2023
Released: 2023/08/15

DOI https://doi.org/10.2197/ipsjjip.31.452

Journal, Free Access

Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to increase. In this study, we have implemented two multi-threaded sorting algorithms based on samplesort and compared their performance on the supercomputer Fugaku. The first algorithm divides an input sequence into multiple blocks, sorts each block, and then selects pivots by sampling from each block at regular intervals. Each block is then partitioned using the pivots, and partitions in different blocks are merged into a single sorted sequence. The second algorithm differs from the first only in how pivots are selected: binary search is used to select pivots such that the number of elements in each partition is equal. We compare the performance of the two algorithms with different sequential sorting and multiway merging algorithms. We demonstrate that the second algorithm, with BlockQuicksort (a quicksort accelerated by reducing conditional branches) for sequential sorting and the selection tree for merging, shows consistently high speed and high parallel efficiency for various input data types and data sizes.
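
The first algorithm described in this abstract (sort blocks, sample at regular intervals, partition by pivots, merge partitions) can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the authors' implementation: the function name `samplesort`, the `num_blocks` parameter, and the use of Python's built-in `sorted` and `heapq.merge` in place of BlockQuicksort and a selection tree are all choices made for this sketch.

```python
import bisect
import heapq


def samplesort(seq, num_blocks=4):
    """Sequential sketch of samplesort with regular sampling (hypothetical)."""
    n = len(seq)
    if n == 0:
        return []
    num_blocks = max(1, min(num_blocks, n))
    size = -(-n // num_blocks)  # ceiling division
    # Divide the input into blocks and sort each block independently.
    blocks = [sorted(seq[i:i + size]) for i in range(0, n, size)]
    p = len(blocks)
    # Sample each sorted block at regular intervals.
    samples = []
    for block in blocks:
        step = max(1, len(block) // p)
        samples.extend(block[::step][:p])
    samples.sort()
    # Pick p - 1 pivots at regular intervals from the sorted samples.
    pivots = [samples[(k * len(samples)) // p] for k in range(1, p)]
    result = []
    for k in range(p):
        parts = []
        for block in blocks:
            # Binary search locates partition k inside each sorted block.
            lo = bisect.bisect_right(block, pivots[k - 1]) if k > 0 else 0
            hi = bisect.bisect_right(block, pivots[k]) if k < p - 1 else len(block)
            parts.append(block[lo:hi])
        # Multiway merge of the k-th partition from every block.
        result.extend(heapq.merge(*parts))
    return result


if __name__ == "__main__":
    data = [(i * 37) % 101 for i in range(200)]
    print(samplesort(data)[:10])
```

In a multi-threaded version, the per-block sorts and the per-partition merges are the two phases that could each be distributed across threads, since neither phase shares data between its work items.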

A Storage-Efficient Suffix Tree Construction Algorithm for Human Genome Sequences

Woong-Kee LOH, Heejune AHN

IEICE Transactions on Information and Systems
2011, Volume E94.D, Issue 12, pp. 2557-2560
Published: 2011/12/01
Released: 2011/12/01

DOI https://doi.org/10.1587/transinf.E94.D.2557

Journal, Free Access

The suffix tree is one of the most widely adopted indexes in genome sequence alignment. Although it supports very fast alignment, it has a couple of shortcomings, such as a very long construction time and a very large size. Loh et al. [7] proposed a suffix tree construction algorithm with dramatically improved performance; however, the size remains a challenging problem. We propose an algorithm that extends the one by Loh et al. to reduce the suffix tree size. In our experiments, our algorithm constructed a suffix tree of approximately 60% of the size in almost the same time.

RaPiDS: An Algorithm for Rapid Expression Profile Database Search

Paul B. Horton, Larisa Kiseleva, Wataru Fujibuchi

Genome Informatics
2006, Volume 17, Issue 2, pp. 67-76
Published: 2006
Released: 2011/07/11

DOI https://doi.org/10.11234/gi1990.17.2_67

Journal, Free Access

In this paper we present a fast algorithm and implementation for computing the Spearman rank correlation (SRC) between a query expression profile and each expression profile in a database of profiles. The algorithm is linear in the size of the profile database with a very small constant factor. It is designed to efficiently handle multiple profile platforms and missing values. We show that our specialized algorithm and C++ implementation can achieve an approximately 100-fold speed-up over a reasonable baseline implementation using Perl hash tables.
RaPiDS is designed for general similarity search rather than classification, but in order to assess the usefulness of SRC as a similarity measure, we investigate the usefulness of this program as a classifier for classifying normal human cell types based on gene expression. Specifically, we use the k nearest neighbor classifier with a t statistic derived from SRC as the similarity measure for profile pairs. We estimate the accuracy using a jackknife test on the microarray data with manually checked cell type annotation. Preliminary results suggest the measure is useful (64% accuracy on 1,685 profiles vs. the majority class classifier's 17.5%) for profiles measured under similar conditions (same laboratory and chip platform), but requires improvement when comparing profiles from different experimental series.
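
For readers unfamiliar with the measure, the Spearman rank correlation used here is the Pearson correlation of the ranks of the two profiles. The sketch below shows a plain linear scan of a small database; it is not the RaPiDS algorithm, and it assumes complete profiles of equal length with no missing values. The names `average_ranks`, `spearman`, and `search` are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def search(query, database):
    """Score every profile against the query; best match first."""
    scored = [(spearman(query, profile), name) for name, profile in database.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    db = {"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 3.0, 2.0, 1.0]}
    print(search([0.1, 0.5, 0.9, 2.0], db))
```

One reason such a search can be fast is that the ranks of each database profile do not depend on the query, so they could be computed once and stored.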

Medical-Dental Collaboration and Interprofessional Training for Maxillofacial Diseases at Tohoku University Hospital

Shigeto Koyama, Kuniyuki Izumita, Naoko Sato, Ryo Tagaino, Takanori Hatakeyama, Naru Shiraishi, Nobuhiro Yoda, Kaoru Igarashi, Tetsu Takahashi, Keiichi Sasaki

The Tohoku Journal of Experimental Medicine
2022, Volume 256, Issue 3, pp. 225-234
Published: 2022
Released: 2022/03/19

DOI https://doi.org/10.1620/tjem.256.225

Journal, Open Access, HTML

The Tohoku University Hospital has been a clinical and research facility for all the related departments of Tohoku University. Medical-dental and interprofessional collaboration has resulted in special treatment teams, made up of members of departments such as the center for head and neck cancer, the center for dysphagia, and the cleft lip and palate center. Those treatment teams held conferences, case study meetings, reading sessions, and in-hospital seminars. The purpose of this study was to evaluate the outcomes of medical-dental and interprofessional collaboration at Tohoku University Hospital and of the training program for hospital dentists in higher medical institutions. The attainment targets are the acquisition of basic medical skills and knowledge under the guidance of supervising doctors. As a result, the hospital dentists could acquire their own specialized knowledge and skills certified by each academic society. Smooth team treatment has been achieved, and the number of cases discussed by the cancer boards and the center for dysphagia has increased year by year due to the efficiency of their clinical pathways. On the dental care side as well, the wearing rates of maxillofacial prosthetic devices such as maxillofacial prostheses and palatal augmentation prostheses (PAP) have improved, which has contributed to improving patients' stomatognathic function. Tohoku University Hospital has been practicing collaboration between medical and dental professionals, and it has produced mutual benefits. Our interprofessional training system based on medical-dental collaboration could develop professionals who have acquired cross-disciplinary knowledge and skills from experienced doctors.

Methods for fluid dynamics simulations of human fetal cardiac chambers based on patient-specific 4D ultrasound scans

Hadi WIPUTRA, Guat Ling LIM, Dawn Ah Kiow CHIA, Citra Nurfarah Zaini MATTAR, Arijit BISWAS, Choon Hwai YAP

Journal of Biomechanical Science and Engineering
2016, Volume 11, Issue 2, Article 15-00608
Published: 2016
Released: 2016/06/24
[Advance publication] Released: 2016/03/22

DOI https://doi.org/10.1299/jbse.15-00608

Journal, Free Access

Previous studies provided evidence that the mechanical forces of blood flow in embryos and fetuses may play a role in causing congenital cardiovascular. It is thus important to understand the fluid mechanical forces in human fetuses. In the current study, we present a new technique for performing computational fluid dynamics of the cardiac chambers, based on patient-specific clinical ultrasound scans of human fetuses. Ultrasound images were acquired using the Spatio-Temporal Image Correlation (STIC) mode. The images were segmented for the right ventricle blood space at various time points. A mathematical model of ventricular wall motion was developed and used to define mesh motion for computational fluid dynamics simulation of fluid within the ventricle. The ventricular mesh models created by the mathematical model were shown to agree satisfactorily with the ventricular geometries segmented from ultrasound images. Fluid dynamics simulations successfully provided details of spatial gradients of pressures, ventricular wall shear stresses, and vorticity dynamics in the ventricle. Results showed that the right ventricle diastolic flows featured two prominent vortex rings, which were sustained until systole, when part of the vorticity structures were ejected through the pulmonary outflow tract. Diastolic wall shear stress was in the range of 0.4-1.2 Pa, while systolic shear stress was elevated near the outflow tract, at 1.5-3.9 Pa. In conclusion, we have established methodologies for performing patient-specific simulations of the fluid mechanics in the heart chambers of human fetuses, based on clinical ultrasound scans, and demonstrated their feasibility on the right ventricle of a 20-week human fetus.

Comparison of fracture resistance between cast, CAD/CAM milling, and direct metal laser sintering metal post systems

Journal of Prosthodontic Research
2016, Volume 60, Issue 1, pp. 23-28
Published: 2016/01/25
Released: 2016/03/31

DOI https://doi.org/10.1016/j.jpor.2015.08.001

Journal, Open Access

Purpose: The purpose of this study was to compare the fracture resistance of Co-Cr post-cores fabricated with 3 different techniques: traditional casting (TC), computer-aided design and manufacturing (CAD/CAM) milling (CCM) and direct metal laser sintering (DMLS). Methods: Forty intact human mandibular premolars were endodontically treated. The roots were then randomly divided into four groups according to the post systems: the control group was only filled with gutta percha. Co-Cr metal posts were fabricated with TC, CCM and DMLS in the other three groups. The posts were luted with a resin cement and subjected to a compression test at a crosshead speed of 1 mm/min. The statistical analysis of the data was performed using one-way analysis of variance (ANOVA) and multiple comparison post hoc Tukey tests (α = .05). The samples were examined under a stereomicroscope with ×20 magnification for the evaluation of the fracture types. Results: The mean fracture loads were 432.69 N for control, 608.89 N for TC, 689.40 N for DMLS and 959.26 N for CCM. One-way ANOVA revealed a significant difference between the groups (p < 0.01). In the post hoc Tukey test, there were significant differences between groups except DMLS and TC. Conclusion: While Co-Cr posts fabricated by TC and DMLS systems performed similarly in terms of fracture resistance, posts fabricated by CCM techniques showed higher fracture resistance values. Significance: Co-Cr metal posts fabricated by CCM and DMLS could be an alternative to TC processing in daily clinical application.

Efficient Accuracy Evaluation Methods for Lattice-Based Fuzzy Extractors and Fuzzy Signatures

Wataru NAKAMURA, Kenta TAKAHASHI

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Article ID: 2025CIP0008
Published: 2025
[Advance publication] Released: 2025/09/01

DOI https://doi.org/10.1587/transfun.2025CIP0008

Journal, Free Access, Advance Publication

Fuzzy Extractors (FEs) and Fuzzy Signatures (FSs) are promising primitives for realizing online biometric authentication with Biometric Template Protection (BTP). To realize better authentication accuracy, lattice-based FEs and FSs have been studied. To apply them to biometric authentication, one has to set the lattice scale to an appropriate value because it affects accuracy, just as the threshold does in ordinary matching schemes. To find such an appropriate scale, one has to evaluate accuracy at various scales using a dataset. A simple method might be to change the scale to various values and repeat the matching for all pairs, but it is inefficient. The matching process includes solving the Closest Vector Problem (CVP), so when we evaluate accuracy for k scales using a dataset of P pairs, the simple method has to solve CVP kP times. In this paper, we propose a method to obtain accuracy for almost all scales without solving CVP. The proposed method computes the distance induced by the Minkowski functional of the Voronoi region, which we call the lattice distance, for each pair only once. Furthermore, in the case of a triangular lattice, we give a Θ(n)-time algorithm for computing the lattice distance. Experimental analysis for the triangular lattice shows that accuracy for almost all scales can be obtained by the proposed method in a shorter time than the time required for obtaining accuracy for one scale by the simple method. We also show that accuracy at the remaining scales can be obtained by additionally solving CVP only once for each pair.

Accelerating a Lloyd-Type k-Means Clustering Algorithm with Summable Lower Bounds in a Lower-Dimensional Space

Kazuo AOYAMA, Kazumi SAITO, Tetsuo IKEDA

IEICE Transactions on Information and Systems
2018, Volume E101.D, Issue 11, pp. 2773-2783
Published: 2018/11/01
Released: 2018/11/01

DOI https://doi.org/10.1587/transinf.2017EDP7392

Journal, Free Access

This paper presents an efficient acceleration algorithm for Lloyd-type k-means clustering, which is suitable for large-scale and high-dimensional data sets with potentially numerous classes. The algorithm employs a novel projection-based filter (PRJ) to avoid unnecessary distance calculations, resulting in high-speed performance while producing the same results as the standard Lloyd's algorithm. The PRJ exploits a summable lower bound on a squared distance defined in a lower-dimensional space to which data points are projected. The summable lower bound can be tightened dynamically by incrementally adding components in the lower-dimensional space within each iteration, whereas the existing lower bounds used in other acceleration algorithms work only once as a fixed filter. Experimental results on large-scale and high-dimensional real image data sets demonstrate that the proposed algorithm works at high speed and with low memory consumption when large k values are given, compared with the state-of-the-art algorithms.
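
The notion of a lower bound that tightens as components are added can be illustrated with a much simpler, well-known technique: partial distance search. A partial sum of squared coordinate differences never exceeds the full squared distance, so a centroid can be rejected as soon as the partial sum reaches the best distance found so far. The sketch below is not the PRJ filter from the paper; it works on raw coordinates rather than a projected space, and the name `assign_with_pruning` is hypothetical.

```python
def assign_with_pruning(points, centroids):
    """Lloyd assignment step with early rejection of centroids.

    Returns (labels, pruned), where pruned counts the centroid
    evaluations that were abandoned before all coordinates were used.
    """
    labels = []
    pruned = 0
    for x in points:
        best_j, best_d = -1, float("inf")
        for j, c in enumerate(centroids):
            partial = 0.0
            rejected = False
            for xi, ci in zip(x, c):
                # Each added component can only increase the bound.
                partial += (xi - ci) ** 2
                if partial >= best_d:
                    rejected = True
                    break
            if rejected:
                pruned += 1
            else:
                # All components were summed: partial is the exact distance.
                best_j, best_d = j, partial
        labels.append(best_j)
    return labels, pruned


if __name__ == "__main__":
    pts = [(0.0, 0.0), (10.0, 10.0), (1.0, 0.0)]
    cts = [(0.0, 0.0), (10.0, 10.0)]
    print(assign_with_pruning(pts, cts))
```

Because a centroid is rejected only when its bound already reaches the current best exact distance, the labels match those of an exhaustive nearest-centroid search that keeps the first minimum.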

Automatic Mesh Generation Method Based on the Intelligent Local Approach: 1st Report, Proposal of the Method and Quadrilateral Mesh Generation

和田義孝, 吉村忍, 矢川元基

日本機械学会論文集 A編
1998, Volume 64, Issue 622, pp. 1556-1563
Published: 1998/06/25
Released: 2008/02/21

DOI https://doi.org/10.1299/kikaia.64.1556

Journal, Free Access

This paper describes a new mesh generation algorithm, the Intelligent Local Approach (ILA), which simultaneously controls the size and aspect ratio of quadrilateral and hexahedral elements. Here we define the following three fields: a distribution of element size, a distribution of aspect ratio, and a distribution of priority of element creation. The ILA also ensures mesh quality. In the ILA, elements are created sequentially, considering local information on geometrical constraints and the user's demand on the quality of the element to be created. To deal efficiently with various complicated geometrical constraints in a local region, a fuzzy knowledge processing technique is utilized. The fundamental performance of the ILA is examined in detail by generating several quadrilateral meshes.

3D Printing Factors Important for the Fabrication of Polyvinylalcohol Filament-Based Tablets

Tatsuaki Tagami, Kaori Fukushige, Emi Ogawa, Naomi Hayashi, Tetsuya Ozeki

Biological and Pharmaceutical Bulletin
2017, Volume 40, Issue 3, pp. 357-364
Published: 2017/03/01
Released: 2017/03/01

DOI https://doi.org/10.1248/bpb.b16-00878

Journal, Free Access, HTML

Three-dimensional (3D) printers have been applied in many fields, including engineering and the medical sciences. In the pharmaceutical field, approval of the first 3D-printed tablet by the U.S. Food and Drug Administration in 2015 has attracted interest in the manufacture of tablets and drugs by 3D printing techniques as a means of delivering tailor-made drugs in the future. In the current study, polyvinylalcohol (PVA)-based tablets were prepared using a fused-deposition-modeling-type 3D printer, and the effect of 3D printing conditions on tablet production was investigated. Curcumin, a model drug/fluorescent marker, was loaded into PVA filament. We found that several printing parameters, such as the rate of extruding PVA (flow rate), can affect the formability of the resulting PVA tablets. The 3D-printing temperature is controlled by heating the print nozzle and was shown to affect the color of the tablets and their curcumin content. PVA-based infilled tablets with different densities were prepared by changing the fill density as a printing parameter. Tablets with lower fill density floated in an aqueous solution, and their curcumin content tended to dissolve faster. These findings will be useful in developing drug-loaded PVA-based 3D objects and other polymer-based articles prepared using fused-deposition-modeling-type 3D printers.

Fast Transient Simulation of Large Scale RLC Networks Including Nonlinear Elements with SPICE Level Accuracy

Yuichi TANJI

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
2015, Volume E98.A, Issue 5, pp. 1067-1076
Published: 2015/05/01
Released: 2015/05/01

DOI https://doi.org/10.1587/transfun.E98.A.1067

Journal, Restricted Access

Fast simulation techniques for large-scale RLC networks with nonlinear devices are presented. Generally, when the nonlinear part of a circuit is much smaller than the linear part, a matrix or circuit partitioning approach is known to be efficient. In this paper, these partitioning techniques are applied to conventional transient analysis using implicit numerical integration and to the circuit-based finite-difference time-domain (FDTD) method, and their efficiency and accuracy are evaluated by developing a prototype simulator. It is confirmed that the matrix and circuit partitioning approaches do not degrade the accuracy of the transient simulations, which remains comparable to that of SPICE, and that the circuit partitioning approach is superior to the matrix one in efficiency. Moreover, it is demonstrated that the circuit-based FDTD method can be combined with the matrix or circuit partitioning approach more efficiently than transient analysis using implicit numerical integration can.

Fundamental Research and Development for Neutron Instrumentation

曽山和彦, 清水裕彦, 大友季哉

日本結晶学会誌
2008, Volume 50, Issue 1, pp. 66-71
Published: 2008/02/29
Released: 2010/09/30

DOI https://doi.org/10.5940/jcrsj.50.66

Journal, Free Access

Advanced neutron instrumentation is key to accelerating progress in high-intensity pulsed neutron research at J-PARC, SNS and ISIS-TS2. The current status of the development of scintillating and gaseous neutron detectors, the application of neutron reflective and magnetic optics, and software development for large-volume data acquisition at J-PARC are described.

Secure Caching Scheme by Using Blockchain for Information-Centric Network-Based Wireless Sensor Networks

Shintaro Mori

Journal of Signal Processing
2018, Volume 22, Issue 3, pp. 97-108
Published: 2018/05/25
Released: 2018/05/25

DOI https://doi.org/10.2299/jsp.22.97

Journal, Free Access

This paper addresses a secure caching scheme for information-centric network (ICN)-based wireless sensor networks (WSNs). To realize this scheme, we utilize both public-key cryptography and blockchain techniques, which enable data to be gathered safely and the decentralized, cross-verified sensing data to be copied and stored. In addition, we propose a protocol design for introducing the proposed structure into an ICN-based WSN system. Furthermore, we formulate statistical models and demonstrate numerical results by performing computer simulations and hardware-based experiments.

Application of Online Automated Segmentation and Evaluation Method in Anomaly Detection at Rail Profile Based on Pattern Matching and Complex Networks

Lingling Tong, Zhimin Lv, Jing Guo

ISIJ International
2024, Volume 64, Issue 10, pp. 1528-1537
Published: 2024/08/15
Released: 2024/08/15
[Advance publication] Released: 2024/06/27

DOI https://doi.org/10.2355/isijinternational.ISIJINT-2024-003

Journal, Open Access, HTML

In steel rail production, complex deformations can induce non-uniform changes in cross-sectional profiles along the rail’s length, resulting in unevenness and safety implications. It is essential to perform dimensional testing to ascertain compliance with standard requirements. Currently, profile inspection results are manually evaluated, posing efficiency challenges and a lack of standardized criteria. To address this challenge, this paper proposes an online automatic steel rail segmentation and evaluation method (online-ASE) based on pattern matching and complex networks to enable automatic rail profile assessment. This method initially utilizes offline high-dimensional time series data for conducting Toeplitz Inverse Covariance-based Clustering (TICC) training and constructs a standard quality characterization pattern library through distinct inverse covariance structures between abnormal and normal high-dimensional quality characterization indicators of steel rails. When applied online, the Viterbi shortest path dynamic programming algorithm is utilized to match steel rail data with the pattern library, swiftly identifying anomalous rail segments. Additionally, the algorithm computes the contribution of steel rail quality parameters to the segmentation results using complex network betweenness centrality, thereby explaining the reasons for segment formation. These explanations provide a reference basis for subsequent steel rail repairs. Finally, the effectiveness of the proposed method is validated using real-world steel rail data from a specific steel factory in China.

GPU-Accelerated 3D Normal Distributions Transform

Anh Nguyen, Abraham Monrroy Cano, Masato Edahiro, Shinpei Kato

Journal of Robotics and Mechatronics
2023, Volume 35, Issue 2, pp. 445-459
Published: 2023/04/20
Released: 2023/04/20

DOI https://doi.org/10.20965/jrm.2023.p0445

Journal, Open Access

The three-dimensional (3D) normal distributions transform (NDT) is a popular scan registration method for 3D point cloud datasets. It has been widely used in sensor-based localization and mapping applications. However, the NDT cannot entirely utilize the computing power of modern many-core processors, such as graphics processing units (GPUs), because of the NDT’s linear nature. In this study, we investigated the use of NVIDIA’s GPUs and their programming platform called compute unified device architecture (CUDA) to accelerate the NDT algorithm. We proposed a design and implementation of our GPU-accelerated 3D NDT (GPU NDT). Our methods can achieve a speedup rate of up to 34 times, compared with the NDT implemented in the point cloud library (PCL).

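A Case of Maxillofacial Prosthesis Using a Three-Dimensional Additive Manufacturing Model Made by the Inkjet Method After Resection of a Maxillary Malignant Tumor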

今井裕一郎, 上田順宏, 畠中利英, 柳生貴裕, 中村泰士, 山川延宏, 青木久美子, 山中康嗣, 桐田忠昭

日本口腔腫瘍学会誌
2012, Volume 24, Issue 4, pp. 155-163
Published: 2012
Released: 2013/02/26

DOI https://doi.org/10.5843/jsot.24.155

Journal, Free Access

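For postoperative defects following resection of maxillary malignant tumors, obturator prostheses are still widely indicated even though reconstructive surgery has advanced. Obturator prostheses come in hollow and open-top types, but they have drawbacks: infiltrating fluid accumulates and allows bacteria to grow, and speech suffers from sound distortion and resonance.
In recent years, stereolithography systems, which apply CAD/CAM technology to three-dimensional image data such as CT to produce solid models with the same shape as the actual anatomy, have come into clinical use. However, producing the models has required considerable cost and time.
In this case, when fabricating an obturator prosthesis after resection of a maxillary malignant tumor, we used a three-dimensional additive manufacturing model made by the inkjet method in place of conventional stereolithography. By extracting and modeling the soft tissue, this method makes it possible to model the entire defect cavity after tumor resection, which cannot be captured by an intraoral impression, and it appears to be an effective way to fabricate an air-inflatable obturator prosthesis with a good marginal seal that makes effective use of undercuts.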

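Performance Evaluation of the Multiple-Precision Conjugate Orthogonal Conjugate Gradient Method for Complex Symmetric Linear Equations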

桝井晃基, 荻野正雄

日本計算工学会論文集
2018, Volume 2018, Article 20180007
Published: 2018/03/29
Released: 2018/03/29

DOI https://doi.org/10.11421/jsces.2018.20180007

Journal, Free Access

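When solving the complex symmetric linear systems that arise in time-harmonic eddy current analysis and high-frequency electromagnetic field analysis with the edge-element finite element method, iterative methods such as the conjugate orthogonal conjugate gradient (COCG) method are known to converge poorly. Convergence deteriorates further as the analysis target becomes larger and more complex. Efficient electromagnetic field analysis therefore requires iterative methods that are effective for large complex symmetric linear systems. For real symmetric linear systems, improved convergence of iterative methods through multiple-precision arithmetic has been reported. Double-double arithmetic, which uses two double-precision variables, has been studied especially widely, and optimizations for recent computer architectures have also been reported. However, there are few studies on complex-valued problems, and it is not obvious that the approach is also effective for electromagnetic field analysis.
With the aim of developing a high-performance iterative solver for complex symmetric linear systems, this study implemented double-double complex arithmetic and mixed-precision arithmetic combining double-double and double precision, evaluated the basic computational performance of the developed system and of existing multiple-precision arithmetic schemes, and then developed a solver for complex symmetric linear systems based on the COCG method with multiple-precision arithmetic, including mixed-precision arithmetic.
The double-double arithmetic used in the developed system is based on error-free computation using the algorithms of Knuth and Dekker. The QD library is a commonly used implementation of double-double arithmetic; for real numbers it implements mixed-precision operations between double and double-double precision, with overloaded operators. For complex numbers that use the complex class template, however, the operator overloads are insufficient, which makes efficient coding difficult. We therefore implemented mixed-precision arithmetic between double-precision and double-double complex numbers, proposed and implemented a mixed-precision COCG method based on it, and evaluated its performance.
First, as a basic performance evaluation of high-precision arithmetic, we measured the time taken by dot-product operations. As high-precision methods, in addition to the developed system, we evaluated and compared double-double and pseudo-octuple-precision arithmetic from the QD library and quadruple-precision arithmetic from libquadmath in C. In the environment used, relative to double precision the computation time was about 6.9 times longer for the developed system's double-double arithmetic, about 8.1 times for the QD library's double-double arithmetic, about 47 times for libquadmath's quadruple precision, and about 76 times for the QD library's pseudo-octuple precision. Next, to evaluate the COCG method, we used a matrix from a simplified hyperthermia model as the target electromagnetic field analysis problem. Applying the developed system to this problem reduced the number of COCG iterations to about two-thirds of that with double-precision arithmetic while limiting the computation time to about 1.5 times. Using the QD library's pseudo-octuple precision improved convergence further, but the computation time grew to roughly 30 to 70 times that of double precision, so double-double precision can be considered sufficient. In addition, in the selection of preconditioning parameters, using high-precision arithmetic, including mixed-precision arithmetic, reduced the influence of the parameters, so the method can be considered highly robust.
In this work, with the aim of accelerating the solution of large complex symmetric linear systems, we implemented double-double and mixed-precision complex arithmetic, investigated its basic computational speed, and then proposed and implemented a mixed-precision COCG method in which the coefficient matrix and the preconditioning matrix are kept in double precision. As a result, the convergence of the iterative method improved relative to double-precision arithmetic, and a large increase in computation time was avoided. Furthermore, compared with double-precision arithmetic, the developed system is a more parameter-free implementation. These results demonstrate the effectiveness of the system.
In future work, we will further verify the effectiveness of the proposed scheme by applying it to larger and more complex problems. We also aim to achieve further speedups by exploiting high-performance computing techniques such as Intel AVX instructions. In addition, we will evaluate the effectiveness of mixed-precision arithmetic for other iterative methods and preconditioners.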
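
The error-free building blocks attributed to Knuth and Dekker in this abstract can be sketched in a few lines. The following is a simplified, real-valued illustration in Python (whose `float` is an IEEE 754 double), not the authors' complex-valued C++ system: the function names are hypothetical, `dd_add` uses a simplified renormalization, and overflow handling in `split` is omitted.

```python
def two_sum(a, b):
    """Knuth's error-free addition: a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Dekker's split of a double into high and low halves."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Dekker's error-free product: a * b == p + e exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Add two double-double numbers, each stored as a (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    hi = s + e  # simplified renormalization step
    lo = e - (hi - s)
    return hi, lo


def dd_dot(xs, ys):
    """Dot product of double vectors, accumulated in double-double."""
    acc = (0.0, 0.0)
    for a, b in zip(xs, ys):
        acc = dd_add(acc, two_prod(a, b))
    return acc


if __name__ == "__main__":
    print(dd_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]))
```

In this example the small term survives the cancellation of the two large terms, whereas the same accumulation in plain double precision, `(1e16 + 1.0) - 1e16`, evaluates to `0.0`.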

A C++ Parallel Skeleton Library with an Optimization Mechanism

明石良樹, 松崎公紀, 岩崎英哉, 筧一彦, 胡振江

コンピュータソフトウェア
2005, Volume 22, Issue 3, pp. 3_214-3_221
Published: 2005
Released: 2005/08/31

DOI https://doi.org/10.11309/jssst.22.3_214

Journal, Free Access

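To ease the difficulty of writing parallel programs, "skeletal parallel programming" has been devised, in which frequently used patterns of parallel processing are implemented in advance and programs are built by combining those patterns. By combining functions called parallel skeletons, programmers can describe parallel programs easily. However, programs written with parallel skeletons do not necessarily run efficiently. In this study, we implemented in C++ a parallel skeleton library based on a computation model called BMF. On top of it, we applied a technique called fusion transformation, which fuses multiple function calls into a single function call, and realized a mechanism that optimizes programs written with parallel skeletons. This paper proposes the implementation of the parallel skeleton library and reports on the effect of the optimization.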
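
The idea of fusing skeleton calls can be illustrated with the classic map-then-reduce case. The sketch below is sequential Python rather than the paper's parallel C++ library, and every name in it (`map_skel`, `reduce_skel`, `map_reduce_fused`) is hypothetical; it only shows why a fused call avoids the intermediate list and the second traversal.

```python
from functools import reduce


def map_skel(f, xs):
    """Map skeleton: apply f to every element."""
    return [f(x) for x in xs]


def reduce_skel(op, xs, unit):
    """Reduce skeleton: fold xs with an associative operator."""
    return reduce(op, xs, unit)


def sum_of_squares_unfused(xs):
    """Two skeleton calls: builds an intermediate list of squares."""
    return reduce_skel(lambda a, b: a + b, map_skel(lambda x: x * x, xs), 0)


def map_reduce_fused(f, op, xs, unit):
    """Fused form of reduce(op) after map(f): one pass, no temporary list."""
    acc = unit
    for x in xs:
        acc = op(acc, f(x))
    return acc


def sum_of_squares_fused(xs):
    """The same computation expressed as a single fused call."""
    return map_reduce_fused(lambda x: x * x, lambda a, b: a + b, xs, 0)


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(sum_of_squares_unfused(data), sum_of_squares_fused(data))
```

An optimizing library would apply rewrites of this kind automatically, so the programmer can keep writing the compositional form.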

Logic and Constraint Programming and Concurrent Computation

上田和紀

コンピュータソフトウェア
2008, Volume 25, Issue 3, pp. 3_49-3_54
Published: 2008/07/25
Released: 2008/09/30

DOI https://doi.org/10.11309/jssst.25.3_49

Journal, Free Access
