Abstract
We have developed a novel algorithm for large-scale Fock matrix generation on parallel architectures with small distributed local memories. We call it the "RT parallel algorithm". The outer loop runs over the index RT, which is determined by the combination of the contracted shell indices R and T. The indices of the inner S and U loops are determined by the RS cutoff and the TU cutoff, respectively. The RT parallel algorithm requires O(N) communication between the host computer and the parallel processors per O(N^2) integral calculations, and the indices of the communicated matrix elements are also determined by the cutoff operation. Therefore, the communication time can be hidden behind the calculation time. The memory required on each processor element does not depend on the number of basis functions, because the number of indices that survive the cutoff saturates for large molecules. The RT parallel algorithm is therefore scalable. It also accommodates the partial summation technique, which is indispensable for numerical accuracy, and it can adopt the alternative Fock matrix calculation method that uses the change in the density matrix. Consequently, the RT parallel algorithm makes large-scale Fock matrix calculation feasible with small distributed local memories.
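The loop structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: shells are reduced to single points on a line, the cutoff is a plain distance test, `toy_eri` is a hypothetical stand-in for the two-electron integral (RS|TU), and only the two-electron (Coulomb and exchange) part of the Fock matrix is accumulated, without exploiting integral symmetry. All function names are assumptions made for illustration.

```python
import math

def make_partners(x, cutoff):
    """For each shell R, list the shells S whose pair (R, S) survives the cutoff.
    Assumption: the cutoff is a simple distance test on 1-D shell centres."""
    n = len(x)
    return [[s for s in range(n) if abs(x[r] - x[s]) <= cutoff] for r in range(n)]

def toy_eri(x, r, s, t, u):
    """Hypothetical stand-in for the integral (RS|TU); not a real integral formula."""
    pair_rs = math.exp(-(x[r] - x[s]) ** 2)
    pair_tu = math.exp(-(x[t] - x[u]) ** 2)
    dist = abs(0.5 * (x[r] + x[s]) - 0.5 * (x[t] + x[u]))
    return pair_rs * pair_tu / (1.0 + dist)

def rt_task(x, r, t, s_list, u_list, d_tu, d_su):
    """Work done by one processor for a fixed (R, T) pair.

    It receives only small density sub-blocks:
      d_tu[j]    = D[T][u_list[j]]
      d_su[i][j] = D[s_list[i]][u_list[j]]
    and returns the Coulomb contributions to row R (one per S)
    plus the exchange contribution to element (R, T).
    """
    j_rs = [0.0] * len(s_list)
    k_rt = 0.0
    for i, s in enumerate(s_list):        # inner S loop, limited by the RS cutoff
        for j, u in enumerate(u_list):    # inner U loop, limited by the TU cutoff
            g = toy_eri(x, r, s, t, u)
            j_rs[i] += d_tu[j] * g        # Coulomb: G[R][S] += D[T][U] (RS|TU)
            k_rt += d_su[i][j] * g        # exchange: K[R][T] += D[S][U] (RS|TU)
    return j_rs, k_rt

def build_g_rt(x, D, cutoff):
    """Host side: loop over RT, ship sub-blocks, accumulate the returned pieces."""
    n = len(x)
    partners = make_partners(x, cutoff)
    G = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for t in range(n):                # each (R, T) pair is an independent task
            s_list, u_list = partners[r], partners[t]
            d_tu = [D[t][u] for u in u_list]
            d_su = [[D[s][u] for u in u_list] for s in s_list]
            j_rs, k_rt = rt_task(x, r, t, s_list, u_list, d_tu, d_su)
            for i, s in enumerate(s_list):
                G[r][s] += j_rs[i]
            G[r][t] -= 0.5 * k_rt
    return G
```

In this sketch the data sent to and returned from `rt_task` is sized by the partner lists of R and T rather than by the total number of shells, which is the property the abstract relies on for small local memories. In a real setting the `(r, t)` iterations would be distributed across processors instead of executed in a serial loop.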