• Media type: E-article
  • Title: XStore: Fast RDMA-Based Ordered Key-Value Store Using Remote Learned Cache
  • Contributors: Wei, Xingda; Chen, Rong; Chen, Haibo; Zang, Binyu
  • Published: Association for Computing Machinery (ACM), 2021
  • Published in: ACM Transactions on Storage
  • Language: English
  • DOI: 10.1145/3468520
  • ISSN: 1553-3077; 1553-3093
  • Origin:
  • Notes:
  • Description: RDMA (Remote Direct Memory Access) has gained considerable interest in network-attached in-memory key-value stores. However, traversing the remote tree-based index in ordered key-value stores with RDMA becomes a critical obstacle, causing an order-of-magnitude slowdown and limited scalability due to multiple round trips. Using an index cache in the conventional way (caching partial data and traversing it locally) usually has limited effect because of unavoidable capacity misses, massive random accesses, and costly cache invalidations.
    We argue that the machine learning (ML) model is a perfect cache structure for the tree-based index, termed learned cache. Based on it, we design and implement XStore, an RDMA-based ordered key-value store with a new hybrid architecture that retains a tree-based index at the server to handle dynamic workloads (e.g., inserts) and leverages a learned cache at the client to handle static workloads (e.g., gets and scans). The key idea is to decouple ML model retraining from index updating by maintaining a layer of indirection from logical to actual positions of key-value pairs. It allows a stale learned cache to continue predicting a correct position for a lookup key. XStore ensures correctness using a validation mechanism with a fallback path and further uses speculative execution to minimize the cost of cache misses. Evaluations with YCSB benchmarks and production workloads show that a single XStore server can achieve over 80 million read-only requests per second. This number outperforms state-of-the-art RDMA-based ordered key-value stores (namely, DrTM-Tree, Cell, and eRPC+Masstree) by 3.7× to 5.9×. For workloads with inserts, XStore still provides a 2.7× to 3.5× throughput speedup, achieving 53M reqs/s. The learned cache can also reduce client-side memory usage and further provides an efficient memory-performance tradeoff, e.g., saving 99% memory at the cost of 20% peak throughput.
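
The description's idea of a client-side learned cache with validation and a fallback path can be illustrated with a minimal Python sketch. This is not XStore's implementation: the names `LearnedCache`, `Server`, `read_range`, `get_via_index`, and `get` are hypothetical, a single least-squares line stands in for the ML model, a sorted list stands in for the server's tree-based index, and the indirection layer and speculative execution mentioned in the description are omitted.

```python
import bisect


class LearnedCache:
    """Hypothetical client-side model: a least-squares line from key to
    position in the sorted key array, plus the largest training error."""

    def __init__(self, keys):
        # Assumes a non-empty, sorted list of numeric keys.
        n = len(keys)
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        self.max_err = max(abs(self._raw(k) - i) for i, k in enumerate(keys))

    def _raw(self, key):
        return round(self.slope * key + self.intercept)

    def predict_range(self, key):
        """Positions that must contain `key` if it was in the training set."""
        p = self._raw(key)
        return p - self.max_err, p + self.max_err


class Server:
    """Stand-in for the server: sorted (key, value) pairs with unique keys."""

    def __init__(self, pairs):
        self.pairs = sorted(pairs)

    def read_range(self, lo, hi):
        # Stands in for one one-sided RDMA read of a contiguous range.
        lo, hi = max(lo, 0), min(hi, len(self.pairs) - 1)
        return self.pairs[lo:hi + 1] if lo <= hi else []

    def get_via_index(self, key):
        # Stands in for a server-side traversal of the tree-based index.
        i = bisect.bisect_left(self.pairs, (key,))
        if i < len(self.pairs) and self.pairs[i][0] == key:
            return self.pairs[i][1]
        return None

    def insert(self, key, value):
        bisect.insort(self.pairs, (key, value))


def get(server, cache, key):
    """Predict a range, fetch it, validate by key comparison, else fall back."""
    lo, hi = cache.predict_range(key)
    for k, v in server.read_range(lo, hi):
        if k == key:  # validation: the fetched entry really is the lookup key
            return v, "cache"
    return server.get_via_index(key), "fallback"
```

In this sketch, a key that was present when the model was trained is always found inside the predicted range as long as the server data is unchanged. After inserts shift positions, the stale model may miss and the lookup takes the fallback path; avoiding that cost is what the description attributes to XStore's indirection layer, which this sketch does not model.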