# CrossMapping: Harmonizing Memory Consistency in Cross-ISA Binary Translation

**Chen Gao**, Xiangwei Meng, Wei Li, Jinhui Lai, Yiran Zhang, and Fengyuan Ren



#### **Dynamic Binary Translation**



### **Dynamic Binary Translation**

• DBT technology can emulate guest binary programs on the host by translating codes at runtime.



### **Memory Model**

• The memory model describes the behavior of concurrent primitives on shared memory.

### **Memory Model**

• The memory model describes the behavior of concurrent primitives on shared memory.

| Memory<br>Access          | x86          | ARMv8        |
|---------------------------|--------------|--------------|
| Load $\rightarrow$ Load   | Ordered      | Out-of-order |
| Load $\rightarrow$ Store  | Ordered      | Out-of-order |
| Store $\rightarrow$ Load  | Out-of-order | Out-of-order |
| Store $\rightarrow$ Store | Ordered      | Out-of-order |

• Message passing test in x86

Initially X=0, Y=0 X = 1; a = Y;Y = 1; b = X;

a = 1, b = 0 Forbidden

• Message passing test in x86

Initially X=0, Y=0 X = 1; a = Y;Y = 1; b = X;

a = 1, b = 0 Forbidden



Message passing test in ARMv8

Initially X=0, Y=0X = 1;a = Y;Y = 1;b = X;

a = 1, b = 0 Observable

• Message passing test in x86

Initially X=0, Y=0 X = 1; a = Y;Y = 1; b = X;

a = 1, b = 0 Forbidden

Insert Barriers to Ensure Memory Ordering

Message passing test in ARMv8

Initially X=0, Y=0 X = 1; a = Y;fence st-st Y = 1; b = X;

a = 1, b = 0 Forbidden

• Message passing test in x86

Initially X=0, Y=0 X = 1; a = Y;Y = 1; b = X;

a = 1, b = 0 Forbidden



Message passing test in ARMv8

Initially X=0, Y=0 X = 1; a = Y;fence st-st fence Id-Id Y = 1; b = X;

a = 1, b = 0 Forbidden



Table: QEMU mapping schemes (x86 to ARMv8)

| x86    |               | TCG IR  |               | ARMv8         |
|--------|---------------|---------|---------------|---------------|
| Load   | $\rightarrow$ | Fmr; Id | $\rightarrow$ | DMB ld; LDR   |
| Store  | $\rightarrow$ | Fmw; st | $\rightarrow$ | DMB full; STR |
| RMW    | $\rightarrow$ | call    | $\rightarrow$ | BLR; RMW; RET |
| MFENCE | $\rightarrow$ | Fsc     | $\rightarrow$ | DMB full      |



Table: QEMU mapping schemes (x86 to ARMv8)

| x86    |               | TCG IR  |               | ARMv8         |
|--------|---------------|---------|---------------|---------------|
| Load   | $\rightarrow$ | Fmr; ld | $\rightarrow$ | DMB ld; LDR   |
| Store  | $\rightarrow$ | Fmw; st | $\rightarrow$ | DMB full; STR |
| RMW    | $\rightarrow$ | call    | $\rightarrow$ | BLR; RMW; RET |
| MFENCE | $\rightarrow$ | Fsc     | $\rightarrow$ | DMB full      |



Table: QEMU mapping schemes (x86 to ARMv8)

| x86    |               | TCG IR  |               | ARMv8         |
|--------|---------------|---------|---------------|---------------|
| Load   | $\rightarrow$ | Fmr; Id | $\rightarrow$ | DMB ld; LDR   |
| Store  | $\rightarrow$ | Fmw; st | $\rightarrow$ | DMB full; STR |
| RMW    | $\rightarrow$ | call    | $\rightarrow$ | BLR; RMW; RET |
| MFENCE | $\rightarrow$ | Fsc     | $\rightarrow$ | DMB full      |

#### • QEMU

#### Fetch-And-Add litmus test

Initially X=0, Y=0Initially X=0, Y=0X = 1;
$$a = Y;$$
DMB full;Y = 1; $if(a == 1)$  $X = 1;$ FAA(X,1) //LOCK XADDDMB full; $a = Y;$ Y = 1; $Y = 1;$  $FAA(X,1) //LDAXR-STLXR$  $a = 1, X = 1$  Forbidden $a = 1, X = 1$  Observable

x86 Pseudo-Assembly

ARMv8 Pseudo-Assembly

• Risotto<sup>[ASPLOS'23]</sup> and Lasagne<sup>[PLDI'22]</sup>

➤Correctly handle RMW instructions

 $\blacktriangleright$  Designed specifically for x86 to ARMv8

- ArMOR [ISCA'15]
  - ➢More efficient
  - ≻Not designed for Cross-ISA DBT systems.

### **Overview of CrossMapping**



• Definition of specification table

Specification table of ARMv8 memory orderings

| 2 <sup>nd</sup> Ins.<br>1 <sup>st</sup> Ins. Ord. | SA<br>Ins.   | DA<br>ld     | DA<br>st     | DA<br>ld-aq  | DA<br>ld-PC  | DA<br>ld-rl  |
|---------------------------------------------------|--------------|--------------|--------------|--------------|--------------|--------------|
| ld                                                | $\checkmark$ | √ dep        | √ dep        | _            | _            | $\checkmark$ |
| st                                                | $\checkmark$ |              |              |              |              | $\checkmark$ |
| ld-aq                                             | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| ld-PC                                             | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| st-rl                                             | $\checkmark$ | _            | _            | _            | _            | $\checkmark$ |





• Refinement

Specification table of DMB ld

Refine

Refined specification table for DMB ld

| 2 <sup>nd</sup> Ins.<br>1 <sup>st</sup> Ins. Ord. | SA<br>Ins.   | DA<br>ld     | DA<br>st     | DA<br>ld-aq  | DA<br>ld-PC  | DA<br>ld-rl  |
|---------------------------------------------------|--------------|--------------|--------------|--------------|--------------|--------------|
| ld                                                | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| st                                                | $\checkmark$ | _            | _            | _            | _            | $\checkmark$ |
| ld-aq                                             | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| ld-PC                                             | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ |
| st-rl                                             | $\checkmark$ | _            | _            | $\checkmark$ | _            | $\checkmark$ |

Comparison



• Union

|  |        | lns. 1       | Ins. 2       |    |        | lns. 1       | Ins. 2       |   |        | lns. 1       | Ins. 2       |
|--|--------|--------------|--------------|----|--------|--------------|--------------|---|--------|--------------|--------------|
|  | lns. 1 | $\checkmark$ | $\checkmark$ | IJ | lns. 1 | $\checkmark$ | $\checkmark$ | = | Ins. 1 | $\checkmark$ | $\checkmark$ |
|  | Ins. 2 | $\checkmark$ |              | 0  | Ins. 2 |              |              |   | lns. 2 | $\checkmark$ |              |

• Subtraction

|        | lns. 1       | Ins. 2       |            | lns. 1       | Ins. 2       |   |        | lns. 1       | Ins. 2 |
|--------|--------------|--------------|------------|--------------|--------------|---|--------|--------------|--------|
| lns. 1 | $\checkmark$ | $\checkmark$ | <br>lns. 1 | $\checkmark$ | $\checkmark$ | = | lns. 1 |              |        |
| Ins. 2 | $\checkmark$ |              | Ins. 2     |              |              |   | lns. 2 | $\checkmark$ |        |

• An example of FSM



#### • FSM Generator

#### ➢ Fences

➢Single-instruction RMWs

≻End of the basic block

| A  | lgorithm 1: FSM Generation Algorithm              |
|----|---------------------------------------------------|
| 1  | Function GetNextState(state, op):                 |
| 2  | requiredMo = guestMo-hostMo;                      |
| 3  | if IsFence(op) then                               |
| 4  | ordToEnforce = op;                                |
| 5  | fence = InsertIRFence(ordToEnforce);              |
| 6  | $l newOrd = \emptyset;$                           |
| 7  | else if IsSingleInstRMW(op) then                  |
| 8  | if GusetRMWActFullBarrier(op) then                |
| 9  | newOrd = $\emptyset$ ;                            |
| 10 | else                                              |
| 11 | ld, st = Split(op);                               |
| 12 | ldOrdToEnforce = KeepCol(state, ld);              |
| 13 | <pre>fence = InsertIRFence(ldordToEnforce);</pre> |
| 14 | $tmpState = (state - fence) \cup$                 |
|    | KeepRow(requiredMo, 1d);                          |
| 15 | newOrd = KeepRow(requiredMo, st);                 |
|    |                                                   |

#### • FSM Generator

#### ➢ Fences

#### ➢Single-instruction RMWs

≻End of the basic block

| 3      | <b>if</b> IsFence(op) <b>then</b>                                      |
|--------|------------------------------------------------------------------------|
| 4      | ordToEnforce = op;                                                     |
| 5      | <pre>fence = InsertIRFence(ordToEnforce);</pre>                        |
| 6      | newOrd = $\emptyset$ ;                                                 |
| 7<br>8 | else if IsSingleInstRMW(op) then<br>if GusetRMWActFullBarrier(op) then |
| 9      | low Ord = 0;                                                           |
| 10     | else                                                                   |
| 11     | ld, st = Split(op);                                                    |
| 12     | ldOrdToEnforce = KeepCol(state, ld);                                   |
| 13     | <pre>fence = InsertIRFence(ldordToEnforce);</pre>                      |
| 14     | tmpState = (state-fence) $\cup$                                        |
|        | KeepRow(requiredMo, ld);                                               |
| 15     | newOrd = KeepRow(requiredMo, st);                                      |
| 16     | else if IsEndOfBasicBlock(op) and                                      |
|        | isAfterRMW(state) then                                                 |
| 17     | <pre>fence = InsertIRFence(fullfence-state);</pre>                     |
| 18     | newOrd = $\emptyset$ ;                                                 |
| 19     | else                                                                   |
|        |                                                                        |

#### • FSM Generator

Fences

➢Single-instruction RMWs

➤End of the basic block

| 8  | if GusetRMWActFullBarrier(op) then                 |
|----|----------------------------------------------------|
| 9  | newOrd = $\emptyset$ ;                             |
| 10 | else                                               |
| 11 | ld, st = Split(op);                                |
| 12 | ldOrdToEnforce = KeepCol(state, ld);               |
| 13 | <pre>fence = InsertIRFence(ldordToEnforce);</pre>  |
| 14 | $tmpState = (state - fence) \cup$                  |
|    | KeepRow(requiredMo, ld);                           |
| 15 | newOrd = KeepRow(requiredMo, st);                  |
|    |                                                    |
| 16 | else if IsEndOfBasicBlock(op) and                  |
|    | isAfterRMW(state) then                             |
| 17 | <pre>fence = InsertIRFence(fullfence-state);</pre> |
| 18 | $  lewOrd = \emptyset; $                           |
| 19 | else                                               |
| 20 | ordToEnforce = KeepCol(state, op);                 |
| 21 | InsertIRFence(ordToEnforce);                       |
| 22 | newOrd = KeepRow(requiredMo, op);                  |
| 23 | novtStata - (stata fanca)     novyOrd:             |
| 23 | nextState = (state-fence) $\cup$ newOrd;           |

#### • FSM Generator

➢ Fences

➢Single-instruction RMWs

≻End of the basic block

| 8  | if GusetRMWActFullBarrier(op) then                |
|----|---------------------------------------------------|
| 9  | newOrd = $\emptyset$ ;                            |
| 10 | else                                              |
| 11 | ld, st = Split(op);                               |
| 12 | ldOrdToEnforce = KeepCol(state, ld);              |
| 13 | <pre>fence = InsertIRFence(ldordToEnforce);</pre> |
| 14 | $tmpState = (state - fence) \cup$                 |
|    | KeepRow(requiredMo, 1d);                          |
| 15 | newOrd = KeepRow(requiredMo, st);                 |
|    |                                                   |
| 16 | else if IsEndOfBasicBlock(op) and                 |
|    | isAfterRMW(state) <b>then</b>                     |
| 17 | fence = InsertIRFence(fullfence-state);           |
| 18 | newOrd = $\emptyset$ ;                            |
| 19 | else                                              |
| 20 | ordToEnforce = KeepCol(state, op);                |
| 21 | InsertIRFence(ordToEnforce);                      |
| 22 | newOrd = KeepRow(requiredMo, op);                 |
| 23 | nextState = (state-fence) $\cup$ newOrd;          |
| 24 | return nextState;                                 |

### **IR-to-Host Mapping**

• Fences Comparator

➢For each IR fence, the comparator identifies the weakest host fence to satisfy all the enforced orderings of IR fence.

### Case Study: x86 to ARMv8 Mapping

• FSM of mapping from x86 to TCG IR



• TCG IR to ARMv8 fence mapping scheme

| TCG IR          |               | ARMv8    |
|-----------------|---------------|----------|
| Frr/Frw         | $\rightarrow$ | DMB ld   |
| Fww             | $\rightarrow$ | DMB st   |
| Fwr/Fmw/Fwm/Fsc | $\rightarrow$ | DMB full |

#### **Evaluation**

• Strong-on-Weak Architecture Emulation



#### **Evaluation**

• Weak-on-Strong Architecture Emulation



Thanks!