2026 Volume E109.B Issue 1 Pages 38-49
In large-scale RDMA networks, packet drops caused by congestion or failures are inevitable and frequent. Because the original Go-Back-N (GBN) loss recovery implemented on RDMA NICs degrades performance noticeably even at low packet loss rates, selective retransmission on RDMA NICs has proven effective. However, existing efforts have focused primarily on the efficiency of selective retransmission in single-path transmission and have neglected the challenges of multi-path scenarios, such as the ambiguity between packet losses and out-of-order arrivals and the lack of informed path selection for retransmitted packets. These oversights may lead to long retransmission times or even retransmission failures, potentially decreasing throughput. To enable fast and reliable loss recovery in multi-path RDMA transport, we propose OrderRE, a redundant retransmission scheme based on the out-of-order degree. In OrderRE, the receiver uses the out-of-order degree of arriving packets to classify each path into one of two states (fast or slow) and to distinguish packet losses from out-of-order arrivals. Further, using a multicast-based mechanism implemented on programmable switches, retransmitted packets are duplicated and redirected to a group of fast paths identified by the receiver, achieving path-restricted redundant retransmission without maintaining path state on RDMA NICs. Finally, we integrate OrderRE into MP-RDMA, an existing state-of-the-art multi-path RDMA protocol, and conduct various evaluations. Compared with naive selective retransmission, OrderRE improves retransmission speed by 30% on average and reduces retransmission failures by more than 90%. Consequently, at a packet loss rate of 1%, OrderRE achieves approximately 2× the goodput of the naive scheme under the minimum bitmap size needed to sustain goodput.
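The receiver-side idea described above can be illustrated with a minimal Python sketch. It is not the OrderRE algorithm itself: the abstract does not define the out-of-order degree, so the sketch assumes it is the distance between an arriving packet's sequence number and the highest sequence number seen so far. The names `Receiver`, `slow_threshold`, `loss_threshold`, and `retransmit_plan` are hypothetical, the threshold values are arbitrary, and the switch-level multicast is modeled only as a list of fast path ids onto which a retransmission would be duplicated.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Toy receiver; every definition and threshold here is an assumption."""
    slow_threshold: int = 4   # hypothetical: degree above this marks a path slow
    loss_threshold: int = 8   # hypothetical: gaps lagging more than this count as lost
    highest_psn: int = -1     # highest packet sequence number seen so far
    received: set = field(default_factory=set)
    path_state: dict = field(default_factory=dict)  # path id -> "fast" or "slow"

    def on_packet(self, psn: int, path: int) -> None:
        # Assumed out-of-order degree: how far this packet lags behind
        # the highest sequence number that has already arrived.
        degree = max(0, self.highest_psn - psn)
        self.path_state[path] = "slow" if degree > self.slow_threshold else "fast"
        self.received.add(psn)
        self.highest_psn = max(self.highest_psn, psn)

    def suspected_losses(self) -> list:
        # A missing PSN is treated as lost only once it lags the highest PSN
        # by more than loss_threshold; smaller gaps may just be reordering.
        cutoff = self.highest_psn - self.loss_threshold
        return [p for p in range(cutoff) if p not in self.received]

    def fast_paths(self) -> list:
        return sorted(p for p, s in self.path_state.items() if s == "fast")


def retransmit_plan(rx: Receiver) -> dict:
    # Redundant, path-restricted retransmission: each suspected loss is
    # duplicated onto every path currently classified as fast.
    return {psn: rx.fast_paths() for psn in rx.suspected_losses()}
```

As a usage example, feeding the receiver a trace in which paths 0 and 1 deliver in order while path 2 delivers late, and one packet never arrives, yields a plan that retransmits only the old gap and only over paths 0 and 1. Newer gaps are left alone until they age past `loss_threshold`, which is how this sketch separates suspected loss from reordering.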