2025 Volume 18 Pages 10-18
High-level synthesis (HLS) reduces the design time of domain-specific accelerators generated from loop nests. Naive use of HLS, however, usually yields accelerators with insufficient performance, so time-consuming manual optimization of the input program is often necessary. Scalar replacement is a promising automatic memory access optimization that removes redundant memory accesses. However, it cannot handle loops with multiple write accesses to the same array, which severely limits its applicability. In addition, it is difficult to apply scalar replacement automatically to memory accesses with non-constant reuse distances. In this paper, we propose a novel memory access optimization technique that overcomes these limitations. Experimental results show that the proposed method achieves a 2.14x performance gain on average, with a 5% decrease in total gate count, on benchmark programs that state-of-the-art memory optimization techniques cannot optimize.
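As a point of reference, classic scalar replacement on a loop with a single write access and a constant reuse distance of one iteration can be sketched in C as follows. The function names and the loop itself are illustrative assumptions, not taken from the paper, and the sketch shows only the baseline optimization, not the proposed technique.

```c
#include <stddef.h>

/* Hypothetical input loop: each iteration reads a[i - 1], which the
 * previous iteration has just written, so that read is redundant. */
void prefix_naive(int *a, const int *b, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i - 1] + b[i];   /* one read and one write of a[] per iteration */
    }
}

/* Same loop after scalar replacement: the reused value is carried in a
 * scalar (a register in the synthesized hardware), so a[] is only written
 * inside the loop. */
void prefix_scalar_replaced(int *a, const int *b, size_t n)
{
    if (n == 0) {
        return;                   /* nothing to do, and a[0] must not be read */
    }
    int carried = a[0];           /* initial load, hoisted out of the loop */
    for (size_t i = 1; i < n; i++) {
        carried = carried + b[i]; /* reuse the scalar instead of re-reading a[i - 1] */
        a[i] = carried;           /* the write to a[] is kept */
    }
}
```

Both functions compute the same result; the second removes the read of a[] from the loop body, which matters in HLS because the number of memory ports limits how many accesses can be scheduled per cycle. The reuse distance here is fixed at one iteration and there is only one write to the array, which is the easy case the abstract contrasts with multiple writes and non-constant reuse distances.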