Advanced search results
Showing results for the following conditions:
Query search: "SIMD"
Showing results 1-20 of 980
  • 鈴木 貢, 藤波 順久
    コンピュータ ソフトウェア
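    2008, Vol. 25, No. 1, pp. 1_65-1_81
    Published: 2008
    Released: 2008/03/31
    Journal, free access
    The SIMD extension instruction sets for media processing that most recently announced microprocessors now provide can be regarded as a special case of vector instruction sets. However, they have features and constraints that conventional vector instructions lack, so simply reusing the optimization techniques developed for vector instructions cannot bring out their full potential. In the COINS project, we divided optimization for SIMD extension instruction sets into two stages: optimizations centered on vectorization that can be performed at the source-code level, and optimizations that generate appropriate SIMD instructions for programs that have undergone such transformations. Our research focused on the latter, which we call "SIMD parallelization". This paper reports on SIMD parallelization in COINS.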
  • Leon Rotstayn, Rhys Francis, David Abramson, Martin Dix
    気象集誌. 第2輯
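    1993, Vol. 71, No. 2, pp. 297-304
    Published: 1993/04/25
    Released: 2009/02/12
    Journal, free access
    One candidate for the teraflops machine (a computer capable of at least 10^12 floating-point operations per second) needed for next-generation general circulation models (GCMs) is a massively parallel computer based on the SIMD (Single Instruction stream, Multiple Data stream) approach. The question in running a model under SIMD is whether the parameterizations of the model's physical processes can be computed efficiently on this kind of machine. Assuming that each processor of the parallel machine is assigned to one grid point of the model, we analyzed the physical processes of a GCM to assess whether they can be computed efficiently under SIMD. This paper compares the results for various physics routines with those that would be obtained under MIMD (Multiple Instruction stream, Multiple Data stream) execution. The inefficiency of SIMD relative to MIMD was found to be only 15% to 20%. This is a satisfactory result for SIMD and means that SIMD computers should not be dismissed for climate modeling or numerical weather prediction.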
  • Masanari Iida, Naoto Niki
    Journal of the Japanese Society of Computational Statistics
    2014, Vol. 27, No. 1, pp. 81-93
    Published: 2014/12/20
    Released: 2015/02/05
    Journal, free access
    In each SIMD (Single Instruction, Multiple Data) group of a GPU (Graphics Processing Unit), called a 'warp', a fixed number of threads all execute the same instruction concurrently in each unit period of time. We consider a class of probabilistic algorithms designed for use on GPUs, including a wide variety of Monte Carlo methods, such that each thread contains a loop iterated a stochastically variable number of times, and the life cycle of a warp ends when the slowest thread completes its requested task. A run-time model is proposed to explain the distributions of execution time observed in SIMD parallel computations using algorithms of this class. Asymptotic properties of those distributions are also presented.
  • 小久保 達信, 長岡 伸一, 寺前 裕之, 長嶋 雲兵
    Journal of Computer Chemistry, Japan
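    2019, Vol. 18, No. 4, pp. 169-175
    Published: 2019
    Released: 2020/01/28
    Journal, free access, HTML
    The molecular dynamics program LAMMPS is implemented with highly efficient distributed parallel processing; even on the K computer it scales well across a large number of nodes and delivers high performance. Aiming to speed up LAMMPS further, we attempted SIMD (Single Instruction Multi Data)/vector conversion and OpenMP thread parallelization of the random number routine used by the Mod-FixLan feature. Random number generation is fundamentally a sequential algorithm, and we newly implemented speedups for the portions that are difficult to convert to SIMD and to parallelize with OpenMP. In particular, the improved implementation of the random number routine yielded an overall performance improvement of about 46%. LAMMPS still has considerable room for further speedup.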
  • *大田 昇平, 山下 義行
    電気関係学会九州支部連合大会講演論文集
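    2012, Vol. 2012, 11-1A-04
    Published: 2012/09/14
    Released: 2014/12/17
    Proceedings/abstracts, free access
    We implemented relativistic computer graphics using OpenCL on the Cell processor in the PlayStation3 and achieved real-time image generation. To speed up processing, we used double buffering to hide inter-core communication and introduced SIMD operations into the rendering process. This paper evaluates image generation time under various conditions and clarifies the characteristics of OpenCL on Cell.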
  • Takeshi Kumaki, Yuma Murakami, Shuhei Itaya, Kei Nakao, Takeshi Ogura, Takeshi Fujino
    Journal of Signal Processing
    2012, Vol. 16, No. 6, pp. 547-556
    Published: 2012/11/30
    Released: 2013/03/15
    Journal, free access
    This paper presents Max-plus algebra-based Morphological wavelet Transform (MMT) watermarking, a novel information-embedding method dedicated to mobile devices. Watermarking performance is a very important factor for the embedded processors used in mobile devices, because when information is embedded in digital content for mobile applications, real-time processing speed matters more than data security and image quality. The proposed MMT watermarking method can be defined by nonlinear operations (maximum or minimum search) and standard addition. Thus, while the proposed method uses only integer representation for all variables in the embedding operation, conventional watermarking methods have to compute complex numbers, trigonometric functions, and floating-point values to restrict distortion of the watermarked image. As a result, the proposed method requires drastically fewer clock cycles than conventional transform-based watermarking. The processing time of MMT watermarking is about 1/28, 1/2720, and 1/4364 of that of the Haar Wavelet Transform (HWT), Discrete Cosine Transform (DCT), and Fast Fourier Transform (FFT), respectively. Furthermore, the practical processing time of MMT watermarking on the evaluation board for the massive-parallel Single Instruction Multiple Data (SIMD) matrix processor is found to be about 96% shorter than that of the HWT watermarking method on the BeagleBoard-xM, which uses the ARM Cortex-A8. To compare the distortion of the watermarked and extracted images, the PSNR values of MMT, DCT, and HWT watermarking are calculated. The PSNR value of the proposed MMT watermarking is suitable for practical use with 5-bit embedded watermark information, which maintains image quality for both the watermarked and extracted images. Consequently, the proposed MMT watermarking algorithm can achieve an efficient real-time watermarking scheme for mobile device applications.
  • 渡辺 宙志
    日本物理学会講演概要集
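    2017, Vol. 72.1, 19pK-PS-51
    Published: 2017
    Released: 2018/04/19
    Proceedings/abstracts, free access
    We attempt to use AVX and AVX2 instructions to accelerate the force calculation for a particle system interacting through the Lennard-Jones potential, which is widely used in molecular dynamics.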
  • 岩田 奈央, 鏡 慎吾, 橋本 浩一
    ロボティクス・メカトロニクス講演会講演概要集
    2007, Vol. 2007, 2P1-D07
    Published: 2007/05/11
    Released: 2017/06/19
    Proceedings/abstracts, free access
    This paper describes a new reconfigurable processor architecture for high-frame-rate visual processing. The architecture employs a 2-D mesh processing element (PE) array in which the PEs can be configured to operate as SIMD arrays or operation-pipeline trees, depending on the image processing algorithm, so that maximum on-chip memory consumption is reduced. To achieve high on-chip memory utilization, the architecture has two features: the instruction register in each PE is mapped into its local memory space, and the ALU network and the local memory network can be configured independently. Experimental results show that the proposed architecture effectively utilizes both the SIMD and operation-pipeline modes.
  • Wei GAO, Lin HAN, Rongcai ZHAO, Yingying LI, Jian LIU
    IEICE Transactions on Information and Systems
    2017, Vol. E100.D, No. 1, pp. 91-106
    Published: 2017/01/01
    Released: 2017/01/01
    Journal, free access
    Single-instruction multiple-data (SIMD) extensions provide an energy-efficient platform for scaling the performance of media and scientific applications while still retaining post-programmability. However, the major challenge is to translate the parallel resources of the SIMD hardware into real application performance. Currently, compilers use all the slots in the vector register when they exploit the SIMD parallelism of programs, which can be called sufficient vectorization. Sufficient vectorization means that all the data in the vector register is valid. Because every slot the vector register provides must be used, the sufficient vectorization method gives up the chance to vectorize programs with low SIMD parallelism. In addition, the speedup obtained by full use of the vector register is sometimes not as great as that obtained by partial use. Specifically, as the vector registers provided by SIMD extensions become longer, the sufficient vectorization method cannot fully exploit the SIMD parallelism of programs. Therefore, an insufficient vectorization method is proposed, which refers to partial use of the vector register. First, the scenarios to which insufficient vectorization applies are analyzed. Second, methods of computing inter-iteration and intra-iteration SIMD parallelism for loops are put forward. Furthermore, based on the relationship between the parallelism and the vector factor, a method for choosing the vectorization method is established, in order to vectorize programs as well as possible. Finally, a code generation strategy for insufficient vectorization is presented. Benchmark results show that the insufficient vectorization method vectorized 107.5% more programs than the sufficient vectorization method and that the performance it achieved is 12.1% higher than that achieved by the sufficient vectorization method.
  • 柏浦 正広
    日本集中治療医学会雑誌
    2024, Vol. 31, No. 5, pp. 463-465
    Published: 2024/09/01
    Released: 2024/09/15
    Journal, free access
  • 赤嶺 有平, 遠藤 聡志, 山田 孝治
    人工知能学会論文誌
    2001, Vol. 16, No. 1, pp. 74-84
    Published: 2001
    Released: 2002/02/28
    Journal, free access
    Cellular Automata (CA) are ideally suited to parallel processing and have been characterized as easy to parallelize. Their simulation can therefore be expected to speed up with SIMD (Single Instruction stream, Multiple Data stream). Furthermore, low-cost CPUs now include SIMD capabilities such as MMX technology. MMX technology is a kind of SIMD that allows one instruction cycle to act on multiple pieces of data, and it may be one of the best-known SIMD technologies. In this paper, we propose a method of high-speed CA simulation using MMX technology without dedicated hardware. Simulation results show that our method is 10 times faster than scalar arithmetic.
  • Dr. Kathleen Tait MBPS SFHEA
    日本重症心身障害学会誌
    2020, Vol. 45, No. 1, pp. 55-64
    Published: 2020
    Released: 2022/08/03
    Journal, free access
    Children with severe intellectual and multiple disabilities often appear to be passive and unresponsive to environmental stimulation. A major priority for such children is to increase the amount of time that they are alert and actively engaged. This research study examined the extent to which five children with severe intellectual and multiple impairments aged 9 – 13 years were reported by school carers (support staff) to indicate engagement and the extent to which these reported indices of responsiveness varied in relation to differing levels of environmental stimulation. Each child was directly observed across three different environmental conditions that varied in terms of the amount and type of stimulation provided. Support staff rated the child’s potential engagement behaviours using the Inventory of Potential Communicative Acts (IPCA). Findings suggested that the children’s levels of alertness/engagement did seem to vary reliably and consistently in relation to the amount and type of environmental stimulation being provided. These results suggest that children who appear largely passive and unresponsive might show subtle signs of alertness/engagement in response to higher levels of environmental stimulation. The presence of these signs of alertness might signal times when the child is actively engaged and more likely to be responsive to instruction.
  • Yaohua WANG, Shuming CHEN, Hu CHEN, Jianghua WAN, Kai ZHANG, Sheng LIU
    IEICE Transactions on Information and Systems
    2013, Vol. E96.D, No. 2, pp. 365-369
    Published: 2013/02/01
    Released: 2013/02/01
    Journal, free access
    The efficiency of ubiquitous SIMD (Single Instruction Multiple Data) media processors is seriously limited by the bottleneck effect of the scalar kernels in media applications. To solve this problem, a dual-core framework, composed of a micro control unit and an instruction buffer, is proposed. This framework can dynamically decouple the scalar and vector pipelines of the original single-core SIMD architecture into two free-running cores. Thus, the bottleneck effect can be eliminated by effectively exploiting the parallelism between scalar and vector kernels. The dual-core framework achieves the best attributes of both single-core and dual-core SIMD architectures. Experimental results exhibit an average performance improvement of 33%, at an area overhead of 4.26%. Moreover, as the SIMD width increases, higher performance gain and lower cost can be expected.
  • 阿部 博史, 長嶋 雲兵
    Journal of Computer Chemistry, Japan
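    2009, Vol. 8, No. 1, pp. 51-58
    Published: 2009/03/15
    Released: 2009/03/15
    [Advance publication] Released: 2009/02/02
    Journal, free access
    We developed Molecula Numerica, a molecular dynamics simulator with real-time three-dimensional visualization. It targets high-performance personal computers as its platform and is multi-OS software, running on MacOS X, Windows, and other systems, that takes advantage of their computing and 3D display capabilities. In addition to the usual monatomic model, it provides a rigid molecule model and can handle both translational and rotational motion. Rotational motion is treated with a system of equations in quaternion form. The software is developed in C++. Atom/molecule classes suited to the rigid atom model and two-body interactions are implemented with a double-dispatch method based on C++ templates, and separating the atom/molecule classes from the definition of interactions reduces the cost of modifying the program when a new interaction model is added. As a computational example, we present a simulation of a salt crystal dissolving in water. Binaries of the software for MacOS X and Windows XP are provided free of charge.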
  • Takeshi KUMAKI, Tetsushi KOIDE, Hans Jürgen MATTAUSCH, Masaharu TAGAMI, Masakatsu ISHIZAKI
    IEICE Transactions on Information and Systems
    2011, Vol. E94.D, No. 9, pp. 1742-1754
    Published: 2011/09/01
    Released: 2011/09/01
    Journal, free access
    This paper presents a software-based parallel cryptographic solution with a massive-parallel memory-embedded SIMD matrix (MTX) for data-storage systems. MTX can have up to 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. Furthermore, a next-generation SIMD matrix called MX-2 has been developed by expanding the processing-element capability of MTX from 2-bit to 4-bit processing. These SIMD matrix architectures are verified to be a better alternative for processing repeated arithmetic and logical operations in multimedia applications with low power consumption. Moreover, we have proposed combining Content Addressable Memory (CAM) technology with the massive-parallel memory-embedded SIMD matrix architecture to enable fast pipelined table-lookup coding. Since both arithmetic logical operations and table-lookup coding execute extremely fast on these architectures, efficient execution of encryption and decryption algorithms can be realized. Evaluation results of the CAM-less and CAM-enhanced massive-parallel SIMD matrix processor for the example of the Advanced Encryption Standard (AES), a widely used cryptographic algorithm, show that a throughput of up to 2.19 Gbps becomes possible. This means that several standard data-storage transfer specifications, such as SD, CF (Compact Flash), USB (Universal Serial Bus), and SATA (Serial Advanced Technology Attachment), can be covered. Consequently, the massive-parallel SIMD matrix architecture is very suitable for private information protection in several data-storage media. A further advantage of the software-based solution is that the implemented cryptographic algorithm can be flexibly updated to a safer future algorithm. The massive-parallel memory-embedded SIMD matrix architecture (MTX and MX-2) is therefore a promising solution for integrated realization of real-time cryptographic algorithms with low power dissipation and small Si-area consumption.
  • Ha-young JEONG, Min-young CHO, Won HUR, Yong-surk LEE
    IEICE Transactions on Electronics
    2008, Vol. E91.C, No. 7, pp. 1171-1174
    Published: 2008/07/01
    Released: 2010/03/01
    Journal, restricted access
    In this letter, we propose a partial access mechanism to be used on a register file for a low-cost embedded multimedia processor architecture. In an embedded system, supporting SIMD operations is a burden because of the wide SIMD register file and its execution unit. A new architecture is therefore proposed to increase the performance of SIMD operations with minimal additional hardware overhead. To evaluate the performance and hardware overhead, this architecture is applied to an embedded multimedia processor and simulated with five DSP benchmarks. The simulation results indicate that the performance is increased by 38% and the total area is increased by 13.4%. The proposed partial access mechanism may be useful for low-cost embedded multimedia ASIPs.
  • 藤井 孟大, 浅井 光輝, 井元 佑介
    土木学会論文集A2(応用力学)
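    2020, Vol. 76, No. 2, pp. I_247-I_257
    Published: 2020
    Released: 2021/02/01
    Journal, free access
    The stabilized ISPH method is a particle method for incompressible flow in which a stabilization term that smooths the particle distribution is added to the pressure Poisson equation. In solid-liquid multiphase flow analysis, the influence of the stabilization term must be increased to correct the particle density error that arises near the solid-liquid interface. However, because the stabilization term has a nonphysical effect on the physical variables, errors in those variables destabilize the computation in solid-liquid multiphase flow analyses involving violent flow. In this study, we therefore distinguish the velocity used to update the physical variables (the physical velocity) from the velocity used to update the particle positions (the transport velocity), and propose a selective dual-velocity ISPH method in which the stabilization term affects only the transport velocity. To verify the accuracy of the proposed method, we performed a reproduction analysis of a dam-break multiphase flow and confirmed the validity and effectiveness of the method through comparison with experiments and the conventional method.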
  • Gilseok HONG, Seonghyeon KANG, Chang soo KIM, Jun-Ki MIN
    IEICE Transactions on Information and Systems
    2018, Vol. E101.D, No. 3, pp. 659-667
    Published: 2018/03/01
    Released: 2018/03/01
    Journal, free access
    In this paper, we study parallel join processing to improve the performance of the merge phase of sort-merge join by integrating all the parallelism provided by mainstream CPUs. Modern CPUs support SIMD instruction sets with wider SIMD registers, which allow multiple data items to be processed per instruction. Thus, we devise an efficient parallel join algorithm, called Parallel Merge Join with SIMD instructions (PMJS). In our proposed algorithm, we utilize data parallelism by exploiting SIMD instructions. We also improve performance by avoiding conditional branch instructions. Furthermore, to take advantage of multiple cores, our proposed algorithm is threaded for multi-thread environments. In our multi-thread algorithm, to distribute the workload evenly across threads, we devise an efficient workload balancing algorithm based on a kernel density estimator, which allows the workload of each thread to be estimated accurately.
  • Takeshi KUMAKI, Masakatsu ISHIZAKI, Tetsushi KOIDE, Hans Jürgen MATTAUSCH, Yasuto KURODA, Takayuki GYOHTEN, Hideyuki NODA, Katsumi DOSAKA, Kazutami ARIMOTO, Kazunori SAITO
    IEICE Transactions on Electronics
    2008, Vol. E91.C, No. 9, pp. 1409-1418
    Published: 2008/09/01
    Released: 2010/03/01
    Journal, restricted access
    This paper presents an architecture that integrates content addressable memory (CAM) with a massive-parallel memory-embedded SIMD matrix to construct a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way of processing the repeated arithmetic operations found in multimedia applications. The architecture proposed in this paper additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed architecture can realize efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the frequently used JPEG image-compression application show that the necessary number of clock cycles can be reduced by 86% in comparison with a conventional mobile DSP architecture. The measured performance in Mpixel/mm^2 is better by factors of 3.3 and 4.4 than that of a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
  • *姫野 龍太郎
    理論応用力学講演会 講演論文集
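    2005, Vol. 54, PD2-1
    Published: 2005
    Released: 2005/04/08
    Proceedings/abstracts, free access
    The computing performance of PCs has improved dramatically, and a single PC can now handle a considerable amount of computation. Will this situation continue? When the performance of a single PC is insufficient, connecting PCs into a cluster with a network switch makes it easy to obtain a severalfold speedup. Even so, owning such a cluster yourself brings various inconveniences. I would like to discuss these drawbacks and the future outlook.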