# NyxCache: Flexible and Efficient Multi-Tenant Persistent Memory Caching

Kan Wu, Kaiwei Tu, Yuvraj Patel, Rathijit Sen, Kwanghyun Park, Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau



#### **In-Memory Key-Value Caches are Crucial**






# A Cache Server is Usually Multi-Tenant

- Consolidated instances
- Contention -> regulation required
- Example sharing policies:
  - resource limit based on price tier
  - QoS
  - proportional sharing, ...

# **Persistent Memory for In-Memory KV Caches**

#### **Persistent Memory (PMEM)**

- Intel Optane DC PMM (byte-addressable, on the memory bus, performance comparable to DRAM)



#### **Appealing building blocks for in-mem KV caches**

- Large capacity -> high hit rate
- Low cost per byte -> cheaper, easier to scale
- Energy efficiency -> lower operational cost
- ...









Lessons

- We must regulate PMEM access; small PMEM traffic can have a big effect
- We need new PMEM sharing mechanisms; existing DRAM/storage mechanisms can be ineffective due to PMEM's unique characteristics
  - Example: memory bandwidth limiting for "limiting impact to others"
  - Setup: Cache A and B (B limit: 1GB/s PMEM traffic)
  - Memory bandwidth limiting is ineffective because of PMEM's 256B internal access granularity
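
To see why a bandwidth cap can miss the real load, here is a minimal sketch of a worst-case model in which every access is rounded up to whole 256B internal blocks. The function names and the rounding model are assumptions for illustration only; real devices may combine adjacent small accesses, so measured effects can be smaller than this model suggests.

```python
PMEM_BLOCK = 256  # bytes; the internal access granularity named on the slide


def media_bytes(access_size: int, block: int = PMEM_BLOCK) -> int:
    """Bytes touched inside the device for one aligned access (assumed model)."""
    blocks = -(-access_size // block)  # ceiling division
    return blocks * block


def media_traffic_gbps(app_gbps: float, access_size: int) -> float:
    """Device-level traffic implied by an application-level rate (assumed model)."""
    return app_gbps * media_bytes(access_size) / access_size


# Cache B is capped at 1 GB/s as seen by the bandwidth limiter.
for size in (64, 128, 256):
    print(size, media_traffic_gbps(1.0, size))
```

Under this model, the same 1GB/s cap corresponds to very different device-level work depending on access size, which is why a limit expressed in application bandwidth may not bound B's impact on A.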



# **Goal:** Design New PMEM Sharing Mechanisms










#### Quality of Service (QoS)

- Latency-critical clients have a latency guarantee
- Best-effort clients






















## Contributions

#### **Re-evaluate Key Mechanisms**

- Analyze problems with existing mechanisms on PMEM

**NyxCache:** a flexible access regulation framework for any sharing goal

- Design new software mechanisms for PMEM sharing
- Revise four policy implementations based on new mechanisms


This talk: **Interference Analysis** and **QoS Policy**

# Use Case: Quality-of-Service Policy




- Latency-critical clients (with a tail latency guarantee) + best-effort clients
- Question: whom should we throttle? Use interference analysis to find the most interfering client -> quick rescue and high utilization
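
The throttling decision can be sketched as a small control step. This is an illustrative sketch only, assuming a per-client interference score is already available; `BestEffortClient`, `qos_step`, the `interference` field, and the fixed `step` size are hypothetical and do not come from NyxCache.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BestEffortClient:
    name: str
    interference: float    # assumed score: larger means it hurts the LC client more
    throttle: float = 0.0  # 0.0 = unthrottled, 1.0 = fully paused


def qos_step(p99_us: float, target_us: float,
             clients: List[BestEffortClient], step: float = 0.1) -> Optional[str]:
    """Adjust one best-effort client; return its name, or None if nothing changed."""
    if not clients:
        return None
    if p99_us > target_us:
        # Guarantee violated: slow down the most interfering client (quick rescue).
        culprit = max(clients, key=lambda c: c.interference)
        culprit.throttle = min(1.0, culprit.throttle + step)
        return culprit.name
    # Guarantee met: give bandwidth back to the most throttled client (utilization).
    throttled = [c for c in clients if c.throttle > 0.0]
    if not throttled:
        return None
    relaxed = max(throttled, key=lambda c: c.throttle)
    relaxed.throttle = max(0.0, relaxed.throttle - step)
    return relaxed.name
```

Throttling only the top interferer is what ties "quick rescue" to "high utilization": the other best-effort clients keep running at full speed.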



**DRAM method:** use clients' BW as indicator; higher BW -> more interference

**Problems:** PMEM bandwidth is not a good indicator of interference

- Problem 1: writes interfere more than reads
- Problem 2: at the same BW, small accesses (<256B) interfere more than large accesses
  - e.g., 1GB/s of 64B writes causes 2x the interference of 1GB/s of 256B writes

We need new high-fidelity interference analysis for PMEM sharing

**Goal:** Answer who is interfering the most with a given client

- No special hardware -> software-only solution
- Minimal device assumptions -> treat devices as a black box

**Solution:** runtime, controlled micro-experiments

- **Setup**: cache A, B, C; who is interfering with A the most?
- Current state: A's performance is L
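
One way to picture such a runtime micro-experiment is sketched below, assuming each candidate is paused briefly in turn while A's latency is re-measured and compared with the current state L. The names `most_interfering`, `measure_latency`, `pause`, and `resume` are hypothetical, and the `FakeServer` numbers are made up for the demonstration; none of this is NyxCache's actual interface.

```python
from typing import Callable, Dict, Iterable


def most_interfering(candidates: Iterable[str],
                     measure_latency: Callable[[], float],
                     pause: Callable[[str], None],
                     resume: Callable[[str], None]) -> str:
    """Return the candidate whose brief pause improves the victim's latency most."""
    baseline = measure_latency()  # current state: victim performance L
    gains: Dict[str, float] = {}
    for name in candidates:
        pause(name)  # controlled experiment: change exactly one thing
        try:
            gains[name] = baseline - measure_latency()
        finally:
            resume(name)  # always restore the candidate
    if not gains:
        raise ValueError("no candidates to test")
    return max(gains, key=lambda n: gains[n])


class FakeServer:
    """Toy stand-in for a device treated as a black box (made-up latencies)."""

    def __init__(self, base_us: float, added_us: Dict[str, float]) -> None:
        self.base_us = base_us
        self.added_us = dict(added_us)
        self.paused = set()

    def latency(self) -> float:
        extra = sum(v for k, v in self.added_us.items() if k not in self.paused)
        return self.base_us + extra

    def pause(self, name: str) -> None:
        self.paused.add(name)

    def resume(self, name: str) -> None:
        self.paused.discard(name)


server = FakeServer(100.0, {"B": 20.0, "C": 60.0})
culprit = most_interfering(["B", "C"], server.latency, server.pause, server.resume)
print(culprit)  # prints "C"
```

The sketch needs no hardware counters and no model of the device: it only observes how A's latency reacts, which matches the software-only, black-box goals stated above.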





# **Evaluation: NyxCache – QoS**

What is the benefit of NyxCache's interference analysis mechanism?

- Setup: cache A, B, C
  - Cache A: latency-critical cache (fixed)
  - Cache B: read-dominant best-effort cache (fixed)
  - Cache C: write-dominant best-effort cache (dynamic)















# NyxCache Summary

PMEM sharing necessitates an evolving software/hardware stack. Our contributions:

- **Define** the important sharing mechanisms (the substrate)
- Analyze **problems** with existing mechanisms on **PMEM**
- **NyxCache** designs **new** software PMEM sharing **mechanisms**
- NyxCache revises policy implementations based on the new mechanisms



#### **Future Directions**

Hardware Redesigns and Hardware/Software Codesigns for PMEM Sharing

Contact: kanwu@cs.wisc.edu

Code: cs.wisc.edu/~kanwu