RGraph: Asynchronous graph processing based on asymmetry of remote direct memory access

Recently, the emergence of high-performance Remote Direct Memory Access (RDMA) technology brings opportunities for distributed graph processing. Due to the ability of zero-copy and kernel-bypassing, RDMA network achieves high throughput and low latency with low CPU utilization. For example, the latest RDMA network provides 200 Gbps throughput and 1 μs latency [14]. If we deploy distributed graph processing systems on RDMA network, the performance of graph processing can be further improved. However, simply replacing the underlying network of the distributed graph processing system with RDMA does not fully utilize its advantage.
This paragraph can be copied verbatim.

Gram: scaling graph computation to the trillions.
This paper optimizes the RPC-related operations over RDMA, which is worth borrowing from, mainly the coordinator part.

Asymmetry of RDMA

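Issuing an outbound RDMA operation costs far more than serving an inbound one. The reason is that the receiving side's work is handled entirely by the RDMA network interface card (RNIC) hardware, whereas the sending side needs both software and hardware to take part.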

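The IOPS of inbound RDMA is 5 times higher than that of outbound RDMA. So when using one-sided RDMA operations, it matters a lot to fully exploit this asymmetry.
(One-sided reads matter more than writes.)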

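The paper does distributed graph processing over RDMA: it uses one-sided RDMA communication, overlaps communication with computation, and executes the graph computation asynchronously.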

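In asynchronous graph processing systems, network communication between master vertices and mirror vertices incurs high overhead. High-performance RDMA can speed up data transfer, but adopting RDMA directly gives poor performance. Because communication between masters and mirrors follows a one-to-many pattern, the paper's key idea is to gather the masters onto a small number of k nodes in order to exploit the asymmetry of RDMA.
The k nodes only respond to RDMA operations issued by the remaining p − k nodes. In this case, the k nodes holding the master vertices spend very little CPU on RDMA transfers, so they can run the computation more efficiently.
This makes it quite important to split the graph into two parts by solving a minimum edge cover problem.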
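
To make the master-gathering idea concrete for myself, here is a toy sketch that counts who issues and who serves one-sided RDMA operations in a single master/mirror sync round. It is only a counting model and not the paper's implementation. The function name `rdma_load`, the dictionaries `master_of` and `mirrors_of`, and the 4-node example are all made up for illustration. It assumes every mirror issues exactly one one-sided operation against the node holding its master.

```python
from collections import Counter

def rdma_load(master_of, mirrors_of):
    """Count outbound (issued) and inbound (served) RDMA ops per node.

    master_of:  vertex -> node id that holds the master copy (hypothetical layout)
    mirrors_of: vertex -> set of node ids that hold mirror copies
    Assumption: each mirror issues one one-sided op to its master's node.
    """
    outbound = Counter()
    inbound = Counter()
    for vertex, master_node in master_of.items():
        for mirror_node in mirrors_of.get(vertex, ()):
            if mirror_node == master_node:
                continue  # a local copy needs no network operation
            outbound[mirror_node] += 1  # issuer pays the software + hardware cost
            inbound[master_node] += 1   # target is served by the RNIC alone
    return outbound, inbound

# Same mirrors in both layouts; only the master placement differs.
mirrors_of = {"a": {1, 2}, "b": {2, 3}, "c": {1, 3}}

# Layout 1: p = 4 nodes, all masters gathered on k = 1 node (node 0).
clustered_out, clustered_in = rdma_load({"a": 0, "b": 0, "c": 0}, mirrors_of)

# Layout 2: masters scattered over nodes 0, 1 and 2.
scattered_out, scattered_in = rdma_load({"a": 0, "b": 1, "c": 2}, mirrors_of)

print("clustered outbound:", dict(clustered_out), "inbound:", dict(clustered_in))
print("scattered outbound:", dict(scattered_out), "inbound:", dict(scattered_in))
```

In the gathered layout, node 0 only serves inbound operations and issues none, which matches the note above that the k master nodes only respond to operations from the other p − k nodes. In the scattered layout, nodes 1 and 2 both issue and serve operations, so they pay the more expensive outbound cost as well.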
Section 4.2 covers how the graph's edges are partitioned.

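My impression is that the paper is mainly about how to partition the graph so that the two kinds of vertices are separated and the asymmetry of RDMA can be exploited. The asynchronous part is barely discussed.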