

# Shesha: Multi-head Microarchitectural Leakage Discovery in new-generation Intel Processors

## Anirban Chakraborty<sup>1</sup>, Nimish Mishra<sup>2</sup> and Debdeep Mukhopadhyay<sup>2</sup>



<sup>1</sup>Max Planck Institute for Security and Privacy, Germany <sup>2</sup>Indian Institute of Technology Kharagpur, India





instruction *i* 





instruction *i* 

data



14 August 2024



instruction *i* 

data

instruction i + 1

time





















## Problem

Finding vulnerabilities in processors is a complex and timeconsuming process

## Problem

Finding vulnerabilities in processors is a complex and timeconsuming process

### Solution

Automated process to discover new vulnerabilities

#### **Types of Bad Speculation**



- Microcode Assist
- Machine Clear

#### **Particle Swarm Optimization**

- A variant of Evolutionary Algorithm used for searching for an optimal solution in the solution space
- Advantage: Only objective function is needed



- **Exploration**: random choices that encourage the PSO for a random walk across the search space in hopes of finding newer paths to global optimums.
- **Exploitation**: greedy choices, encouraging the PSO to focus on a direct path to currently known optimums.

https://en.wikipedia.org/wiki/Particle\_swarm\_optimization#/media/File:ParticleSwarmArrowsAnimation.gif





























HPC



#### **SESHA:** Novel Leakage Variants Discovered

| Operation                      | <b>Bad Speculation Type</b>                               | Affected Generations         | Attack Type     |
|--------------------------------|-----------------------------------------------------------|------------------------------|-----------------|
| SIMD-Vector intermixing        | Self Modifying Code<br>SSE-AVX Mix                        | 12G ✓ , 11G ×<br>4X ✓ , 3X × | Leak            |
| Single-Double precision mixing | Hardware Assist<br>Memory Ordering<br>Self Modifying Code | 12G ✓ , 11G ×<br>4X ✓ , 3X × | Leak            |
| Fused Multiply-Add             | Floating Point Assist                                     | 12G ✓ , 11G ✓<br>4X ✓ , 3X ✓ | Leak, Injection |
| SSE-AES intermixing            | Floating Point Assist<br>Self Modifying Code              | 12G ✓ , 11G ✓<br>4X ✓ , 3X ✓ | Leak, Injection |

12G: Intel 12 Gen; 11G: Intel 11 Gen; 4X: 4 Gen Xeon; 3X: 3 Gen Xeon

The classic FMA family of instructions can support operations like:

- 1. fused multiply-add
- 2. fused multiply-subtract
- 3. fused multiply add/subtract interleave
- 4. signed-reversed multiply on fused multiply-add and multiply-subtract.



- Fabian Boemer and Vinodh Gopal. Fused multiple multiplication and addition-subtraction instruction set, September 21 2023. US Patent App. 17/695,554
- Jesus Corbal, Robert Valentine, Roman S Dubtsov, Nikita A Shustrov, Mark J Charney, Dennis R Bradford, Milind B Girkar, Edward T Grochowski, Thomas D Fletcher, Warren E Ferguson, et al. Systems, apparatuses, and methods for chained fused multiply add, December 4 2018. US Patent 10,146,535



- Fabian Boemer and Vinodh Gopal. Fused multiple multiplication and addition-subtraction instruction set, September 21 2023. US Patent App. 17/695,554
- Jesus Corbal, Robert Valentine, Roman S Dubtsov, Nikita A Shustrov, Mark J Charney, Dennis R Bradford, Milind B Girkar, Edward T Grochowski, Thomas D Fletcher, Warren E Ferguson, et al. Systems, apparatuses, and methods for chained fused multiply add, December 4 2018. US Patent 10,146,535



- Fabian Boemer and Vinodh Gopal. Fused multiple multiplication and addition-subtraction instruction set, September 21 2023. US Patent App. 17/695,554
- Jesus Corbal, Robert Valentine, Roman S Dubtsov, Nikita A Shustrov, Mark J Charney, Dennis R Bradford, Milind B Girkar, Edward T Grochowski, Thomas D Fletcher, Warren E Ferguson, et al. Systems, apparatuses, and methods for chained fused multiply add, December 4 2018. US Patent 10,146,535





Leakage when victim executes VFMADD213PD zmm1, zmm2, zmmword PTR [rcx].

"Almost" Montgomery Multiplication

 Uses IFMA instructions to speed up modular exponentiation 1 .set i, 0 2 ; Initialize  $X_i$  in zmm0 3 ; Initialize  $A_{curr}$  in zmm1 4 ;  $X_i = VPMADD52LUQ X_i$ ,  $A_{curr}$ ,  $A_i$ 5 VPMADD52LUQ i(%rcx), %zmm0, %zmm1 6 .rept N ; iteration bounds (i+1, z) 7 vpxord %zmm3, %zmm3, %zmm3 ; T = 08 ; T = VPMADD52LUQ ZERO,  $A_{curr}$ ,  $A_i$ 9 VPMADD52LUQ i(%rcx), %zmm1, %zmm3 10 vpslld \$1, %zmm3, %zmm3 ; T = T << 111 vpaddd %zmm1, %zmm1, %zmm3 ;  $X_i = X_i + T$ 12 .set i, i+1 13 .endr

- Uses IFMA instructions to speed up modular exponentiation
- Initialize  $X_i = VPMADD52LUQ(X_i, A_{curr}, A_i)$

| 1  | .set i, O                                        |
|----|--------------------------------------------------|
| 2  | ; Initialize $X_i$ in zmm0                       |
| 3  | ; Initialize A <sub>curr</sub> in zmm1           |
| 4  | ; $X_i = VPMADD52LUQ X_i$ , $A_{curr}$ , $A_i$   |
| 5  | <pre>VPMADD52LUQ i(%rcx), %zmm0, %zmm1</pre>     |
| 6  | <pre>.rept N ; iteration bounds (i+1, z)</pre>   |
| 7  | vpxord %zmm3, %zmm3, %zmm3 ; $T = 0$             |
| 8  | ; $T = VPMADD52LUQ ZERO$ , $A_{curr}$ , $A_i$    |
| 9  | <pre>VPMADD52LUQ i(%rcx), %zmm1, %zmm3</pre>     |
| 10 | vpslld \$1, %zmm3, %zmm3 ; T = T << 1            |
| 11 | vpaddd %zmm1, %zmm1, %zmm3 ; $X_i$ = $X_i$ + $T$ |
| 12 | .set i, i+1                                      |
| 13 | .endr                                            |

- Uses IFMA instructions to speed up modular exponentiation
- Initialize  $X_i = VPMADD52LUQ(X_i, A_{curr}, A_i)$
- Operate in a loop
  - $\circ T = VPMADD52LUQ(0, A_{curr}, A_i)$

| 1  | .set i, O                                        |
|----|--------------------------------------------------|
| 2  | ; Initialize $X_i$ in zmm0                       |
| 3  | ; Initialize $A_{curr}$ in zmm1                  |
| 4  | ; $X_i = VPMADD52LUQ X_i$ , $A_{curr}$ , $A_i$   |
| 5  | <pre>VPMADD52LUQ i(%rcx), %zmm0, %zmm1</pre>     |
| 6  | .rept N ; iteration bounds (i+1, z)              |
| 7  | vpxord %zmm3, %zmm3, %zmm3 ; $T = 0$             |
| 8  | ; $T = VPMADD52LUQ ZERO, A_{curr}, A_i$          |
| 9  | <pre>VPMADD52LUQ i(%rcx), %zmm1, %zmm3</pre>     |
| 10 | vpslld \$1, %zmm3, %zmm3 ; T = T << 1            |
| 11 | vpaddd %zmm1, %zmm1, %zmm3 ; $X_i$ = $X_i$ + $T$ |
| 12 | .set i, i+1                                      |
| 13 | .endr                                            |

- Uses IFMA instructions to speed up modular exponentiation
- Initialize  $X_i = VPMADD52LUQ(X_i, A_{curr}, A_i)$
- Operate in a loop
  - $\circ T = VPMADD52LUQ(0, A_{curr}, A_i)$
- leaked value is the multiplicand  $A_i$ :  $i \in \{i + 1, i + 2, ..., z\}$

| 1  | .set i, O                                        |
|----|--------------------------------------------------|
|    |                                                  |
| 2  | ; Initialize $X_i$ in zmm0                       |
| 3  | ; Initialize $A_{curr}$ in zmm1                  |
| 4  | ; $X_i = VPMADD52LUQ X_i$ , $A_{curr}$ , $A_i$   |
| 5  | <pre>VPMADD52LUQ i(%rcx), %zmm0, %zmm1</pre>     |
| 6  | .rept N ; iteration bounds (i+1, z)              |
| 7  | vpxord %zmm3, %zmm3, %zmm3 ; $T = 0$             |
| 8  | ; $T = VPMADD52LUQ ZERO, A_{curr}, A_i$          |
| 9  | <pre>VPMADD52LUQ i(%rcx), %zmm1, %zmm3</pre>     |
| 10 | vpslld \$1, %zmm3, %zmm3 ; T = T << 1            |
| 11 | vpaddd %zmm1, %zmm1, %zmm3 ; $X_i$ = $X_i$ + $T$ |
| 12 | .set i, i+1                                      |
| 13 | .endr                                            |

- Uses IFMA instructions to speed up modular exponentiation
- Initialize  $X_i = VPMADD52LUQ(X_i, A_{curr}, A_i)$
- Operate in a loop
  - $\circ T = VPMADD52LUQ(0, A_{curr}, A_i)$
- leaked value is the multiplicand  $A_i$ :  $i \in \{i + 1, i + 2, ..., z\}$
- Similar leakage has been found in IFMA based implementations of Cumulative Supersingular Isogeny Diffie-Hellman (CSIDH) and Supersingular Isogeny Key Encapsulation (SIKE)

| 1  | .set i, O                                        |
|----|--------------------------------------------------|
| 2  | ; Initialize $X_i$ in zmm0                       |
| 3  | ; Initialize $A_{curr}$ in zmm1                  |
| 4  | ; $X_i = VPMADD52LUQ X_i$ , $A_{curr}$ , $A_i$   |
| 5  | <pre>VPMADD52LUQ i(%rcx), %zmm0, %zmm1</pre>     |
| 6  | .rept N ; iteration bounds (i+1, z)              |
| 7  | vpxord %zmm3, %zmm3, %zmm3 ; $T = 0$             |
| 8  | ; $T = VPMADD52LUQ ZERO, A_{curr}, A_i$          |
| 9  | <pre>VPMADD52LUQ i(%rcx), %zmm1, %zmm3</pre>     |
| 10 | vpslld \$1, %zmm3, %zmm3 ; T = T << 1            |
| 11 | vpaddd %zmm1, %zmm1, %zmm3 ; $X_i$ = $X_i$ + $T$ |
| 12 | .set i, i+1                                      |
| 13 | .endr                                            |

#### Conclusion

- Automation process to discover new vulnerabilities in x86 processors
- Evolutionary Algorithms that are efficient in search optimizations can be an efficient tool
- We establish the concept of equivalent classes that represent disjointedly fragmented sub-

spaces of bad speculation

- **Shesha** is a vulnerability detection tool based on PSO principles
- Data from FMA execution engine is speculatively forwarded to AVX execution engine
- Use of IFMA instructions in crypto applications make them vulnerable

