## Power and EM SCA Resilience in 65nm AES-256 Exploiting Clock-Slew Dependent Variability in CMOS Digital Circuits

Archisman Ghosh<sup>1</sup>, Md. Abdur Rahman<sup>1</sup>, Debayan Das<sup>1,2</sup>, Santosh Ghosh<sup>2</sup>, Shreyas Sen<sup>1</sup> <sup>1</sup>Purdue University, IN, USA, <sup>2</sup>Intel Corp. Portland, OR, USA,

Side channel analysis (SCA) is a low time-complexity technique of extracting secret information from a cryptographic IC, which calls for low-overhead generic resilience techniques. While architectural and logical countermeasures [1] are explored widely, recently generic circuit-level countermeasures (e.g. voltage regulators [2-4], power balancing [5] or through a switched capacitor current equalizer [6], using an on-chip machine learning model [7, 8], and signature attenuation [9, 10]) have gained prominence due to low overheads and being architecture agnostic. Typical digital cryptographic core has two controllable ports, i.e., supply and clock. Most of the circuitlevel/physical layer countermeasures have primarily utilized the power port to reduce the side-channel leakage signal-to-noise ratio (SNR). Related to the clocking port, well-studied system-level clock frequency randomization techniques have been deemed ineffective with post-processing. However, the impact of circuit-level changes in the clocking circuitry and its device-circuit-system level interactions with inherent properties of digital circuits and its impact on SCA leakage remains unexplored. Another key requirement for the countermeasure is to make it fully synthesizable for scalability across different technology nodes. This work, for the first time, exploits the inherent variability of CMOS digital circuits by providing a controlled slewed clock and demonstrates an extremely low-overhead technique for immunity against power and EM SCA, which can be easily combined with any of the supply port countermeasures for multiplicative effect on SCA resilience.

While CMOS digital circuits are abstracted logically in terms of their functionality guaranteed up to the fmax, their power consumption profile is a strong function of circuit-level changes in the clocking network. Controlled clock-slew results in the following effects: 1) register internal clock slew propagation leading to 2 important variability in digital circuits, namely 2) duty cycle distortion, and 3) slew-dependent latch delay variability (hence flip-flop (FF) toggle point variability). In presence of the controlled clock slew, 2 important process dependent factors, namely 4) location-dependent variability in Elmore delay and 5) intra-die process variation (device mismatch) also get amplified increasing 1-3 which increases the SCA security further. Additionally, system-level clock randomization is now more effective in presence of the slewed clock as the post-processing techniques become ineffective when the sharp edges are absent. The effect of the clock slew is gradually reduced as it passes through internal buffer stages of FF (Fig. 1, Effect 1), but not fully suppressed (bottom right). Slews S1, S2 present at the master and slave stages of FF determine its toggle points (Fig. 1, bottom), making the power profile a complex function of the input and slew, instead of just the input. Note that S1 and S2 are internal to register and hence do not drive clock of different parts of the designs ensuring functional correctness. Duty cycle gets distorted (Fig. 2, Effect 2) based on the rise time (t<sub>r</sub>), especially for high input slew (low slope) cases with less than full-scale clock swing, which in turn changes the time between different leakage points in the power supply, making it harder to attack. Providing slewed clock to FF makes latching time (Effect 3) a function of  $t_r$  as well as the input since  $t_{latch HL} \neq t_{latch LH}$ . Slewed clock affects the Elmore delay (Effect 4) which further increases cumulative slewness at clocks of different FFs, enhancing SCA security. Combinational delay does not vary with slew (Fig. 2, bottom left) ensuring no set-up violation in the design. Large slew (very small slope) could lead to hold time increase; however, clock-q delay increases with respect to slew as well ensuring significant positive slack (Fig. 2, bottom middle). Due to this, we observe no functional failure in the IC. Performance can be impacted only by <1% (up to  $C_L {=} 0.87 pF)$  to  ${<} 10\%$  ( $C_L {=} 1 pF)$  in terms of  $f_{\text{max}},$  if clock buffer is not able to drive the clock port and does not reach toggle point due to extra capacitance in this design. However, this is not fundamental to this SCA resilient technique. We can mitigate the performance degradation by overdesigning clock buffer or by increasing the driving capability of the same buffer at the cost of power overhead. This performance trade-off benefits in SCA resilience (<1% fmax degradation, but 1800x improvement in MTD). Fig. 2 (bottom right) shows ~10x increase in variance of power trace of clock-slewed AES

design (red) with respect to unprotected (green) near 14<sup>th</sup> cycle posedge (empirically highest leakage point) which confirms the basis of SCA resilience.





Die micrograph.

circuit as shown in Fig. 3. HD between 13th and 14th rounds on state register is used as the attack point. Trigger is used for the trace alignment. Slewed clocking buffer provides the tunability to load the clock with different capacitors ranging from 100fF to 5pF (Fig. 3). We utilize a LFSR (seeded by an external TRNG for high randomness) which randomly selects between capacitors for enhanced security (Fig. 3, top right). A tunable ring oscillator is designed for the clock randomization (Fig. 3, bottom left). The number of stages is determined by the LFSR output to provide clock frequency randomization. Coarse frequency change is performed using a 3stage frequency divider (Fig. 3, bottom left). Time domain waveform of the 14th cycle of AES256 for a set of 6 different plaintexts are shown for different configurations (Fig. 4, top). Clearly, power profile at time point t, (P(t)) is data-dependent (x(t)) which is the basis of SCA (Hamming Distance (HD) model-based SCA detects dP/dx) in absence of slewed clock. Due to clock slew, power consumption (P'(t) = f(x(t), t - D(s))) is dependent on slew (s)-related delay D(s), which is a non-linear function of the input data (x(t)) as shown in Fig 4. Significant SCA resilience could be achieved if  $dP'/dx \ll dP'/dD$ . Clock randomization (CR) is combined with the clock slew for enhanced security. Prior CR-based countermeasures with sharp clock were deemed ineffective as the traces can be aligned in timedomain using post-processing or can be bandpass filtered for frequency-domain attack. With slewed clock, 1) power profile gets smeared in time-domain creating Inter-Symbol Interference (ISI) and 2) determining randomized clock edge variations is hard to distinguish from power profile with slewed randomized clock, making time-realignment attacks extremely difficult. Leaky frequency components (~900MHz is most leaky as shown in measurements, Fig. 4, 5) could not be determined with 20M traces rendering frequency domain post-processing-based attacks on combined clock randomized slewed AES (CRSL-AES) extremely difficult. Entire circuit is completely synthesizable and fully placed and routed using commercial tools. Capacitors are implemented using DCAP cell from standard library.

Correlational power analysis (CPA) attack is explored both in case of regular clock, slewed clock, and for the combined countermeasure with clock randomization. AES with slewed clock (SL-AES) provides >100x enhanced SCA security (MTD = 1.2M) with respect to the unprotected core (MTD = 11K). CR-AES alone provides SCA security against CPA attack with an MTD > 270K, however, correct key is revealed within 70K traces utilizing a bandpass filter. Time and frequency domain CPA attack performed on CRSL-AES could not reveal the correct key byte with >20M traces (Fig. 5). Frequency domain CPA is performed throughout the entire frequency spectrum of the power traces which confirms that leaky component cannot be determined in entire spectrum with 20M traces. The cumulative effect (CPA MTD>20M (1,800x) for CRSL-AES) is even more than multiplicative effect of the individual techniques (CPA MTD for SL-AES: 1.2M (109x), CR-AES: 70K (6.4x)). Correlational EM Analysis (CEMA) attack fails to extract correct key even after 20M traces. There is no leakage observed using time-domain TVLA until 6M traces using fixed vs random |t|-test (Fig. 5). We observe that the power consumption is increased owing to the short-circuit current as both PMOS and NMOS common turn on time increases (Fig. 5), however this is <5%, the lowest reported amongst the countermeasures till date. Area overhead (11%) is one of the lowest compared to existing state-of-the-art countermeasures. Any capacitance beyond 870fF is not used to ensure high performance. A detailed comparison along with current overhead is presented in the table of Fig. 6. This design, for the first time, focuses on circuitlevel effects of the clock port and its interaction with inherent clockslew dependent variability of CMOS digital circuits that can be easily combined with existing and emerging power-port countermeasures for a multiplicative effect on SCA resilience. References:

[1] R. Kumar et al., ISSCC, 2022. [2] A. Singh et al., ISSCC, 2019. [3] Y. He et al, ISSCC, 2020. [4] M. Kar et al., ISSCC, 2017. [5] D. D. Hwang et al., JSSC, 2006. [6] C. Tokunaga, et al., ISSCC, 2009. [7] Q. Fang, et al., ISSCC, 2022. [8] W. Shan et al., JSSC, 2020. [9] D. Das, et al., ISSCC, 2020. [10] A. Ghosh, et al., ISSCC, 2021.



Authorized licensed use limited to: Purdue University. Downloaded on June 11,2023 at 18:36:35 UTC from IEEE Xplore. Restrictions apply.