# On the analysis and the mitigation of power supply noise and power distribution network impedance variation for scan-based delay testing techniques

C. Thibeault, Senior Member, IEEE and G. Gagnon, Member, IEEE

Abstract—In this paper, we analyze the impact of the power supply noise and the power distribution network (PDN) impedance variation on the timing margin in both modes for ICs with multiple clock domains. We investigate the so-called intermodulation products (IMPs). We show that IMPs are mainly induced by the dependent nature of the transistors. We also provide experimental results showing that scan-based delay testing can be optimistic with respect to the mission mode for maximum achievable nominal frequency prediction, even at lower clock frequencies. They also show that IMPs can induce timing margin fluctuations that can be larger than the ones induced by the voltage droop in test mode. Using an improved HSpice simulation model of a PDN validated by experimental results, we also quantify the timing margin variation due to power noise in test mode as a function of the clock frequency, including the so-called clock stretching phenomenon. Finally, we propose a robust test signal scheme for multiple clock domain chips. The simulation results reveal that this scheme is less sensitive to PDN impedance variation than the most popular existing test schemes, and that it provides timing margins closer to those obtained in mission mode.

Index Terms— Power supply noise, scan-based delay testing, intermodulation distortion, resonance, power issues

## I. INTRODUCTION

At-speed testing has become a mandatory part of the test suite for ICs fabricated with technologies below 180nm [1]. Nowadays, structural scan-based delay testing techniques are the preferred options to perform at-speed testing. They have gradually replaced functional testing, which exercises the circuit under test (CUT) as in its actual environment. This replacement has mainly been motivated by the expensive test pattern generation process of functional testing [2].

Notwithstanding the popularity of structural scan-based delay testing techniques, their use still elicits some concerns, with the main one related to the power supply noise (PSN) induced by these techniques [3, 5, 6]. From a test perspective, PSN is a concern because of its potential impact on delay testing, which originates from the differences between the scan-based (delay) test and the mission (functional) modes, and which can result in a chip operating frequency gap of up to 30% [2].

Manuscript submitted October 26, 2017. This work was supported in part by the Natural Science and Engineering Research Council of Canada and by Canadian Microelectronic Corporation.

Research has been devoted to quantifying the impact of PSN on delay testing, with most focusing on the power supply droop phenomenon. Over the past few years, certain observations have been made, and some have led to contradictory conclusions. Some papers (e.g., [1, 6, 7, 8]) have shown that, overall, the voltage droop has a negative impact on the maximum achievable clock frequency when applying structural scan-based delay testing techniques, even as the same droop has been shown to possibly induce clock stretching [7]. Pant et al. [9, 10] shed new light on the topic by showing that in some circumstances, structural scan-based delay testing techniques could lead to higher achievable clock frequencies than the mission mode. More specifically, they showed that this could happen at higher frequencies and at higher voltages. Their results confirmed the inherent difference between structural scan-based delay test and mission modes in terms of switching activity, due to the power droop, as well as the fact that functional tests do not necessarily represent a good reference.

Although [9, 10] provided a better understanding of the power supply droop during at-speed testing, further efforts are required to better characterize existing at-speed testing schemes with respect to frequency, power supply impedance and the presence of multiple clock domains. As our results indicate, some rules of thumb need to be revisited.

In this paper, we first present a multiple step characterization study. We start by going back to the basics to show that current pulses can be modeled as Dirac functions. This first step is important as it allows a better interpretation of some counterintuitive behavior in testing and mission modes. As a second step, we carry out a detailed analysis of the intermodulation product (IMP) phenomenon, which appears in the presence of multiple clock domains. The impact of IMPs during at-speed testing was first revealed in [11], by observing these IMPs directly on FPGA VDD pins and by providing HSpice simulation results based on a generic PDN simulation model, showing VDD and timing margin fluctuations caused by IMPs (these results and this model are discussed later). Here, we go one step further by identifying the IMP source, namely, the dependent nature of the current injection sources. We also experimentally show that IMPs can induce timing margin fluctuations that can be larger than the ones induced in test mode. As a third step, we propose an improved PDN simulation model, validated with experimental results, with which we quantify the timing margin variation due to power noise in test mode as a function of the clock frequency. Finally, we propose a new test signal scheme for multiple clock domain chips, which is more robust to PDN impedance variation than existing signal test schemes. As such, this work is a follow-up of [12], where a robust test signal scheme for a single clock IC was proposed. In [12], this signal scheme, called OCAS (One Clock Alternated Shift, shown in Section VIII), was compared in terms of robustness with respect to PDN impedance variation to other existing single clock schemes in the context of 3D ICs, where the impedance variation was induced by varying the number of dies in the 3D stack. HSpice simulations were performed for this comparison. The PDN HSpice models used in [12] were different from the ones proposed in [11] and here. In summary, the identification of the IMP source, the experimental demonstration of the IMP impact on the timing margins, the improved PDN simulation model and the proposed test signal scheme constitute the main contributions of this paper.

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

The authors are with the Electrical Engineering Department, École de technologie supérieure, Montreal, Quebec, Canada (e-mail: claude.thibeault@etsmtl.ca, ghyslain.gagnon@etsmtl.ca).

## II. RELATED WORK

In order to mitigate the power droop impact, some previous researchers proposed different timing schemes for the test signals [4, 8, 9, 13].

Arabi et al. [4] proposed a multiple-launch scheme applied along with a hierarchical scan strategy dividing the chip into DFT regions. Using this strategy, each DFT region is tested sequentially, while the others are kept in mission mode. In most cases, at least three LOC-type launch pulses are required to obtain switching activity closer to the mission mode. The authors did not take into consideration the potential interaction between the different clock domains. As shown in this paper, this interaction, which can take the form of intermodulation products, may significantly affect timing margins.

The SeBoS technique was originally proposed in [8] to avoid illegal states. It consists in inserting n slow clock cycles in mission mode, between the shifting in and the launch & capture phases. In [9], the SeBoS scheme was slightly modified by inserting a quiet phase just before the n slow clock cycles, in order to reduce the noise on the power network. Note that the SeBoS technique (original and modified version) can only be applied on a single clock domain.

The BurstMode technique [13] uses bursts of 5 high-speed (mission mode) clock pulses after the shifting in/out phase and a pause. The first four clock pulses are for shift (launch) cycles, while the last one corresponds to the capture cycle. The LOS scheme is applied. Functional clocks are used to preserve the exact skew relationship between synchronous clock domains and to facilitate simultaneous cross-domain testing. Different burst schemes are possible, and therefore, a calibration phase is mandatory to select the right burst. Five of these bursts were explicitly defined in [13]. BurstMode can be used with single and multiple clocks. As shown in this paper, BurstMode may suffer from PDN impedance variation.

In this work, we propose a new test scheme which allows reducing the impact of PSN and PDN impedance variation, such that results obtained with scan-based delay testing are more representative of those from mission mode. To the best of our knowledge, this is the first time that the PDN impedance variation has been taken into account in a multiple clock domain context.

## III. CURRENT INJECTION

In this section, we investigate the behavior of current pulses drawn by switching gates, using a simple RLC power distribution network model. We show that current pulses can be accurately modeled as Dirac functions.

To get a better understanding of the power noise caused by switching gates and validate a few assumptions, we analyze the current pulses drawn by these gates using HSpice simulations. Fig. 1 shows the circuit model (#1) used for our simulations. It first consists of a DC source and a simple RLC resonance circuit adjusted at 100MHz (with the inverter lines described below) and of a stimulus source feeding 21 parallel lines of buffers (of increasing size), modeling a clock tree. These buffers will inject current pulses in the PDN resonance circuit. Two types of line models were used:

• Model B1, with relative buffer sizes of 2, 24 and 288, and a 20pF capacitor;

• Model B2, with relative buffer sizes of 2, 6 and 74, and a 5pF capacitor, which represents about one quarter of model B1.

The buffer size refers to the relative buffer strength of a commercial 90nm CMOS technology standard cell library. In circuit #1, 10 B1 and 11 B2 models were used, which corresponds to the total number of lines of the validated model in Section VI.

Fig. 2 shows two curves of the VDD response over time: the darker (blue) one was obtained with HSpice simulations of a single 50-ps clock rising edge applied on node n0 (see Fig. 1), and the lighter (orange) one was derived from theoretical equations assuming that the current pulse is modeled as a Dirac function (see Appendix). This assumption is supported by observations of the current behavior (also from simulations), revealing the very short duration of the current peak. Except for the first 1.4ns, which basically corresponds to the switching period, there is a very good fit between the 2 curves.

Another way to verify how close the current pulse is to a Dirac function is by looking at the current frequency spectrum in the presence of a periodic clock source as a stimulus on node n0 (Fig. 1), leading to periodic current pulses. Periodic Dirac functions form a Dirac comb. The frequency spectrum of a Dirac comb is also a comb of impulses of amplitude equal to 1, with an impulse present at the harmonics of the current pulse frequency [14].
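The comb-to-comb property can be checked numerically. The following is a minimal sketch, not the authors' analysis: it builds a discrete impulse train (sample count and spacing are arbitrary illustrative choices) and evaluates a plain DFT to confirm that energy appears only at the harmonics of the pulse rate, all with the same magnitude.

```python
import cmath


def impulse_train(n_samples, period):
    """Discrete stand-in for a Dirac comb: one unit impulse every `period` samples."""
    return [1.0 if m % period == 0 else 0.0 for m in range(n_samples)]


def dft_magnitudes(x):
    """Magnitude of the direct (O(N^2)) discrete Fourier transform of x."""
    n = len(x)
    return [
        abs(sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)))
        for k in range(n)
    ]


N_SAMPLES = 64  # illustrative record length (assumption)
PERIOD = 8      # illustrative pulse spacing in samples (assumption)

spectrum = dft_magnitudes(impulse_train(N_SAMPLES, PERIOD))
harmonic_bins = [k for k, mag in enumerate(spectrum) if mag > 1e-9]
print("non-zero bins:", harmonic_bins)
print("magnitudes   :", [round(spectrum[k], 6) for k in harmonic_bins])
```

In this unnormalized DFT every harmonic has the same magnitude (the number of impulses in the record); dividing by that count gives the unit-amplitude comb mentioned in the text.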



Fig. 1. Simulation model, circuit #1.



Fig. 2. Simulation and theoretical results, VDD = f(t).

Fig. 3 shows the frequency spectrum of the current drawn by switching clock buffers obtained through an HSpice simulation of circuit #2 (Fig. 4), which is similar to circuit #1 (Fig. 1), except that the RLC resonance is adjusted to be 160MHz (to ease the frequency analysis), and that a clock source of 20MHz is used as a stimulus.



Fig. 3. Simulation results, FFT of *i(t)*.

Frequency components of significant amplitude can be seen every 20MHz. The amplitude of these 20MHz components is rather constant (within 9dB). These results led us to conclude that in a first-order approximation, current pulses can be modeled as Dirac functions and that the total response of the PDN is the succession and accumulation of the natural responses triggered by the current pulses.
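The superposition view can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the natural PDN response to one impulse is a damped sinusoid at a 160MHz resonance with a quality factor of 10; neither value is taken from the paper's models. The total response is then the sum of time-shifted copies of that natural response.

```python
import math

F_RES_HZ = 160e6                  # assumed PDN resonance frequency
Q_FACTOR = 10.0                   # assumed quality factor
ALPHA = math.pi * F_RES_HZ / Q_FACTOR  # damping rate, alpha = w0 / (2Q)


def natural_response(t):
    """Assumed response to a single Dirac current pulse applied at t = 0."""
    if t < 0.0:
        return 0.0
    return math.exp(-ALPHA * t) * math.sin(2.0 * math.pi * F_RES_HZ * t)


def pdn_response(t, pulse_freq_hz, n_pulses):
    """Total response: accumulation of the natural responses to each pulse."""
    period = 1.0 / pulse_freq_hz
    return sum(natural_response(t - k * period) for k in range(n_pulses))


def steady_state_peak(pulse_freq_hz, n_pulses=60, points=400):
    """Peak |response| over the period that follows the last pulse."""
    period = 1.0 / pulse_freq_hz
    t_last = (n_pulses - 1) * period
    return max(
        abs(pdn_response(t_last + i * period / points, pulse_freq_hz, n_pulses))
        for i in range(points)
    )


peak_on_resonance = steady_state_peak(160e6)
peak_off_resonance = steady_state_peak(140e6)
print(peak_on_resonance, peak_off_resonance)
```

Under these assumptions, pulses repeating at the resonance frequency add in phase and build up a larger fluctuation than pulses at 140MHz, whose natural responses partially cancel.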



Fig. 4. Simulation model, circuit #2.

This helps intuitively understand and better analyze the transient phase caused by any change affecting a train of pulses, for example, when a clock signal is (re)started or stopped. The consequences of this behavior on the test mode are twofold: 1) the launch and capture operations occur while successive natural responses to current pulses appear and superpose (first-order approximation) in the PDN; therefore, the emulated nominal frequency (defining the time between the 2 pulses) and the phase variations induced by the clock tree distribution and the combinational logic will determine the overall impact on the actual timing margin observed; and 2) as the natural PDN response is a function of the PDN impedance, any test scheme trying to mimic the mission mode, by changing the shift clock frequency or by letting parts of the device under test shift during the launch and capture operation on other parts, might be affected by PDN impedance variation.

The consequence of the same behavior on the mission mode is that the use of power saving strategies, such as clock gating, can lead to strong transient behaviors that must be taken into account during testing. This effect adds to the impact of IMPs, described next.

## IV. IMPACT OF IMPS

#### A. IMP source

Intermodulation distortion occurs in non-linear devices and produces unwanted additional signals called IMPs, resulting from the interaction of two (or more) signals. The following expression [15] gives the IMP frequencies when two signals are involved:

$$f_s = \pm M f_1 \pm N f_2 \tag{1}$$

where $f_s$ is the spurious (IMP) response frequency, M and N are positive integers ($\geq 1$), $f_1$ is the frequency of signal (tone) 1, and $f_2$ is the frequency of signal (tone) 2. The sum M + N represents the order of the IMP. As an example, Table I provides some IMP values with $f_1 = 160$ MHz and $f_2 = 120$ MHz. Note that when one tone is a multiple of the other, IMPs are indistinguishable from the (linear) harmonics, as they appear at the same frequencies. As CMOS transistors are non-linear devices, they may produce IMPs.

TABLE I. Some IMP ($f_s$) values, $f_1 = 160$ MHz and $f_2 = 120$ MHz

| M+N | Equation     | fs (MHz) |
|-----|--------------|----------|
| 2   | $f_1 + f_2$  | 280      |
| 2   | $f_1 - f_2$  | 40       |
| 3   | $2f_1 - f_2$ | 200      |
| 3   | $2f_2 - f_1$ | 80       |
| 3   | $2f_1 + f_2$ | 440      |
| 3   | $2f_2 + f_1$ | 400      |
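The entries of Table I follow directly from (1). As a quick check, the sketch below enumerates $|M f_1 \pm N f_2|$ for all positive integers M and N up to a chosen order; the function name and the `max_order` parameter are illustrative and not part of the paper.

```python
def imp_frequencies(f1, f2, max_order):
    """List (order, M, N, fs) with fs = |M*f1 +/- N*f2|, M, N >= 1, M + N <= max_order."""
    products = []
    for m in range(1, max_order):
        for n in range(1, max_order - m + 1):
            for sign in (+1, -1):
                fs = abs(m * f1 + sign * n * f2)
                products.append((m + n, m, n, fs))
    return sorted(products)


# Tone values taken from Table I, in MHz.
table = imp_frequencies(160, 120, max_order=3)
for order, m, n, fs in table:
    print(f"order {order}: M={m}, N={n}, fs={fs} MHz")
```

Taking the absolute value covers the sign combinations of (1) that would otherwise give negative frequencies.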

In our case, we are interested in the noise induced by switching gates on the VDD node. In that context, each gate can be seen as a current source whose amplitude depends on VDD. Here we want to show that IMPs can be induced by this relationship between the amplitude of these current sources and VDD.



Fig. 5. Circuit example #3.

Let us define the behavior of the two current sources in Fig. 5, depending on VDD, with the following 2 equations:

$$I_1(t) = v(t) \sin(\omega_1 t), \tag{2}$$

$$I_2(t) = v(t) \sin(\omega_2 t). \tag{3}$$

It can be shown that v(t) can be expressed as

$$v(t) = 1.2/(2 + R(\sin(\omega_1 t) + \sin(\omega_2 t))). \tag{4}$$

It can also be shown that v(t) can be approximated as

$$v(t) \approx DC + H_{11} + H_{12} + H_{21} + H_{22} + P_{f_2-f_1} + P_{f_2+f_1} \tag{5}$$

where:

- $DC = 0.6$ is the DC term,
- $H_{11} = -0.3R \sin(\omega_1 t)$ is the $f_1$ first harmonic,
- $H_{12} = -0.3R \sin(\omega_2 t)$ is the $f_2$ first harmonic,
- $H_{21} = -0.075R^2 \cos(2 \omega_1 t)$ is the $f_1$ second harmonic,
- $H_{22} = -0.075R^2 \cos(2 \omega_2 t)$ is the $f_2$ second harmonic,
- $P_{f_2-f_1} = +0.15R^2 \cos(\omega_2 t - \omega_1 t)$ is the $f_2 - f_1$ IMP, and
- $P_{f_2+f_1} = -0.15R^2 \cos(\omega_2 t + \omega_1 t)$ is the $f_2 + f_1$ IMP.
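The approximation can be compared numerically with the exact expression. The sketch below is only a sanity check under stated assumptions: the value R = 0.2 is hypothetical (this section does not give R), the tones are those of Table I, and the harmonic and IMP coefficients are the ones obtained from a second-order Taylor expansion of $1.2/(2 + R(\sin\omega_1 t + \sin\omega_2 t))$, with the DC term kept at 0.6.

```python
import math

F1_HZ = 160e6  # tone 1, from Table I
F2_HZ = 120e6  # tone 2, from Table I
R = 0.2        # hypothetical value, chosen only for illustration


def v_exact(t, r=R):
    """Closed-form expression 1.2 / (2 + R (sin w1 t + sin w2 t))."""
    s = math.sin(2 * math.pi * F1_HZ * t) + math.sin(2 * math.pi * F2_HZ * t)
    return 1.2 / (2.0 + r * s)


def v_second_order(t, r=R):
    """DC term, first and second harmonics, and the f2 -/+ f1 IMPs."""
    w1 = 2 * math.pi * F1_HZ
    w2 = 2 * math.pi * F2_HZ
    dc = 0.6
    h11 = -0.3 * r * math.sin(w1 * t)
    h12 = -0.3 * r * math.sin(w2 * t)
    h21 = -0.075 * r**2 * math.cos(2 * w1 * t)
    h22 = -0.075 * r**2 * math.cos(2 * w2 * t)
    p_diff = 0.15 * r**2 * math.cos((w2 - w1) * t)
    p_sum = -0.15 * r**2 * math.cos((w2 + w1) * t)
    return dc + h11 + h12 + h21 + h22 + p_diff + p_sum


# One 25 ns window covers a full period of the 40 MHz (f1 - f2) pattern.
times = [n * 25e-9 / 2000 for n in range(2000)]
max_error = max(abs(v_exact(t) - v_second_order(t)) for t in times)
print(f"max |exact - approx| = {max_error:.4f} V")
```

The residual combines the neglected third-order terms with the small constant offset $0.15R^2$ that the second-order expansion adds to the DC level.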

To confirm that the v(t) signal contains the anticipated IMPs, we simulated the Fig. 5 circuit with HSpice, leading to the results shown in Figs. 6 and 7.

In Fig. 6, we can see that the simulation results (solid yellow line) match the estimated ones from the last equation (black dashed line) well. The absolute value of the estimation error is less than 0.012V. Fig. 7 presents the results obtained when applying a Fast Fourier Transform (FFT) on the v(t) signal. One can see that all the IMPs listed in Table I appear in that graph, along with the harmonics of  $f_1$  and  $f_2$ .



Fig. 6. Circuit #3 simulation (sim) and theoretical (est) results.



Fig. 7. FFT results for v(t), circuit #3 simulation.

These results clearly confirm that the presence of current sources that are dependent on VDD can induce IMPs.

## V. EXPERIMENTAL RESULTS

In [11], we presented experimental results clearly showing that IMPs affected the power distribution network. Those results were taken by a digital scope directly probing one of the VDD core pins of a Xilinx Spartan3E-500 FPGA [16], part of a commercial board (Nexys2 [17]). Here, we go one step further by presenting new results from measurements taken from inside the same FPGA, which will: 1) show how the timing margin is affected by the IMPs in mission mode, 2) show how the timing margin varies in the test mode, as we vary the duration of the pause to let the circuit settle down between the end of the shift and the launch and capture phase, 3) allow a comparison of the mission and test modes, and 4) allow the validation of an improved HSpice simulation model of the power distribution network.

To carry out these measurements, we developed an experimental setup comprising two Nexys2 boards, where the Xilinx Spartan3E-500 FPGA of the first board is used to implement the tester and the Xilinx Spartan3E-500 FPGA of the other board is the circuit under test (CUT). Fig. 8 shows a simplified block diagram of the setup, with the main signals.

Both modules (tester, CUT) are fed with a clock signal (CLKIN= 50MHz), a reset signal (RST\_i) and a select mode signal (SEM\_mode\_i), which are locally generated on each board. In addition, the CUT module is fed with 2 other signals (LCE, RCE) generated on its board, as well as 3 other signals (clk\_test\_i, CE\_test\_i, cnt\_cycle\_i) coming from the tester.



Fig. 8. Simplified setup block diagram.

Fig. 9 shows a simplified block diagram of the CUT. The 5 main CUT blocks are: 1) Clock generation, which generates three clock signals used in mission mode from the main 50MHz clock; 2) Clock selection, which selects between the test clock signal (in test mode) and the three clock signals from the clock generation block (in mission mode); 3) Shift registers, which consist of 3 separate toggling shift registers of 1200 flip-flops (FFs) each; 4) Clock enable signal generator, which provides the CE signal controlling the pulse generator; and 5) the pulse generator, which generates the X\_o pulse signal, whose width will be used to estimate the timing margin of a delay line between 2 FFs.



Fig. 9. Simplified CUT block diagram.

Fig. 10 shows the pulse generator circuit, which is composed of a controllable toggle flip-flop launching transitions, a controllable delay line (a single AND gate) and a transition detector (a flip-flop and an XOR gate) sending pulses on the X\_o node. The two FFs of this circuit are connected to CLK\_RES3, which is always configured to be the fastest of the 3 CLK\_RES (Fig. 10). The line\_CE\_i signal allows the selection of CLK\_RES3 period on which the measurement is performed. For each measurement, a transition is launched, creating a pulse whose width, X, is measured outside the FPGA with a digital scope. The width of the X\_o signal is practically equal to the time,  $T_{2E}$ , between two edges, one on the A signal, the other on the B signal. It can be shown that  $T_{2E}$  can be expressed as the sum of the timing margin on the A node and some offset. We estimated that the error on X with respect to true timing margin was around 10% [18]. This error is in fact an offset that is practically eliminated when the difference between two X measurements is used as a comparison metric.



Fig. 10. Pulse generator circuit.

#### A. Results for the mission mode

Fig. 11 shows results obtained in mission mode, with 4 clock domains: the main clock (F0 = 50MHz, without toggling register) used to generate three functional clocks (CLK\_RES1 to CLK\_RES3) at F1= 30, F2= 40 and F3 = 160MHz (with toggling registers), respectively. These results represent (X\_o signal) pulse width X values taken over 32 consecutive 160MHz clock periods. Each X value appearing in this graph is an average value measured with our digital scope over a few seconds. Using 32 consecutive measurements allows us to cover a total sampling time of 200ns, which corresponds to two 10MHz clock periods, with 10MHz being the lowest expected IMP.
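The choice of 32 samples can be reproduced with a few lines of arithmetic. The sketch below assumes that, for clock frequencies expressed as integer MHz values, the lowest IMP that can appear is their greatest common divisor (every combination $\pm M f_i \pm N f_j$ is a multiple of it); the helper names are illustrative.

```python
from functools import reduce
from math import gcd


def lowest_imp_mhz(clocks_mhz):
    """Lower bound on the IMP frequency: the GCD of the integer clock values."""
    return reduce(gcd, clocks_mhz)


def window_ns(n_periods, clock_mhz):
    """Duration of n_periods consecutive clock periods, in ns."""
    return n_periods * 1e3 / clock_mhz


clocks = [50, 30, 40, 160]                 # F0, F1, F2, F3 in MHz
f_low = lowest_imp_mhz(clocks)             # lowest expected IMP, MHz
window = window_ns(32, 160)                # 32 periods of the 160 MHz clock
periods_covered = window * f_low / 1e3     # number of lowest-IMP periods observed
print(f_low, window, periods_covered)
```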



Fig. 11. Measured pulse width, X, in mission mode, main clock = 50MHz, functional clocks = 30, 40 and 160MHz.

It can be seen that the pattern of the first 16 samples is repeated over the last 16. The presence of the 10MHz frequency is also detectable by performing an FFT on these 32 samples. We can also observe how the pulse width (therefore, the timing margin) varies over the 32 samples. These results clearly reveal the importance of wisely selecting the right moment to estimate the timing margin to ensure that the worst case scenario in the presence of multiple clock domains is obtained. Let us define  $\Delta X max$  as X max - X min, where X max and X min respectively are the maximum and the minimum pulse width measured values. In Fig. 11, the resulting  $\Delta Xmax$  value is 380ps, which represents 6% of the 160MHz clock period. This means that estimating the timing margin at the worst moment (here, sample #9) would lead to an overestimation of 380ps. This is the highest value we have observed so far. Table II lists some measured  $\Delta Xmax$ values, in decreasing order, for this setup. Quite interestingly, the  $\Delta X max$  value obtained with the 30/40/160 clock combination is higher than that obtained with F1=F2=F3=160MHz, namely, 228ps. This result might be counterintuitive as our measurements suggest that the power distribution network resonates in the vicinity of 160MHz. As such, one may expect VDD fluctuations to be greater with the 160/160/160 clock combination than with the 30/40/160 one. While it may be clear that VDD fluctuations are necessary to induce timing margin variations, our results suggest that their amplitude might be misleading with respect to the timing margin significance, and that the phase also has a role to play. This particular point will be further analyzed using simulations in Section VI.

TABLE II. Some $\Delta Xmax$ measured values as a function of functional clock frequencies; main clock = 50MHz

| F1 (MHz) | F2 (MHz) | F3 (MHz) | ΔXmax (ns) |
|----------|----------|----------|------------|
| 30       | 40       | 160      | 0.380      |
| 40       | 60       | 160      | 0.358      |
| 30       | 40       | 80       | 0.298      |
| 70       | 80       | 160      | 0.278      |
| 160      | 160      | 160      | 0.228      |
| 130      | 140      | 160      | 0.214      |
| 100      | 130      | 160      | 0.212      |
| 120      | 140      | 160      | 0.207      |
| 140      | 150      | 160      | 0.178      |
| 150      | 150      | 150      | 0.054      |
| 140      | 140      | 140      | 0.048      |
| 170      | 170      | 170      | 0.038      |

Back to Table II, one can see that  $\Delta Xmax$  values are much lower (below 100ps) when using 140, 150 or 170 instead of 160MHz. On the other hand, replacing (F3=) 160 by 80MHz still leads to a significant margin fluctuation ( $\Delta Xmax = 298$ ps, F1=30MHz, F2=40MHz), which can be partly explained by the frequency behavior presented in Section III (Fig. 3).
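To put these fluctuations in perspective, $\Delta Xmax$ can be expressed as a fraction of the fastest clock period. The sketch below uses a few rows of Table II; the helper names are illustrative, and the 6% figure quoted earlier for the 30/40/160 combination falls out of the same computation.

```python
def clock_period_ps(clock_mhz):
    """Clock period in ps for a frequency given in MHz."""
    return 1e6 / clock_mhz


def delta_x_max(widths_ps):
    """Spread between the largest and smallest measured pulse widths."""
    return max(widths_ps) - min(widths_ps)


def fraction_of_period(delta_ps, clock_mhz):
    """Timing margin fluctuation relative to the clock period."""
    return delta_ps / clock_period_ps(clock_mhz)


# (F1, F2, F3) in MHz and the corresponding Delta-Xmax in ps, from Table II.
TABLE_II_ROWS = [
    ((30, 40, 160), 380),
    ((30, 40, 80), 298),
    ((160, 160, 160), 228),
    ((170, 170, 170), 38),
]

for (f1, f2, f3), delta_ps in TABLE_II_ROWS:
    share = fraction_of_period(delta_ps, f3)
    print(f"{f1}/{f2}/{f3} MHz: {delta_ps} ps = {100 * share:.2f}% of the F3 period")
```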



Fig. 12. Measured pulse width, X, in test mode, test (launch & capture) clock = 80MHz, shift clock = 20 (S20), 40 (S40) and 80MHz (S80).

#### B. Results for the test mode

Fig. 12 shows results obtained in test mode for the pulse width X as a function of the moment the launch & capture (L&C) operations are performed. This experiment emulated a scan-based at-speed test for a functional clock of 80MHz. The moment the L&C are executed is expressed as a number of clock periods after the end of the shift phase, where a number of clock periods equal to 1 means that the L&C appear right after the last shift.

There are 3 curves in Fig. 12, for 3 different shift clock frequencies: 20, 40 and 80MHz. In that particular example, the shift clock frequency does not have a significant impact on the pulse width values, except for the case when the L&C appear right after the last shift. It can be seen that the 3 curves rapidly settle down after being disturbed by the last shift clock pulses. This was the expected behavior, as it was described in [4].

#### C. Comparison of mission and test modes

As mentioned earlier, PSN is a concern for structural scan-based delay testing. As such, it is important to compare results from mission and test modes in order to ensure that the latter is representative of the former.



Fig. 13. Measured pulse width, X, in both mission (Func) and test (Test) modes, test (launch & capture) clock = 80MHz, shift clock = 20MHz; mission mode: main clock = 50MHz, functional clocks = 30, 40 and 80MHz

Fig. 13 provides results that allow a comparison of the test and mission modes. Again here, the results are expressed as pulse width X values. For the test mode, we reproduced the Fig. 12 curve with a shift clock frequency of 20MHz, while for the mission mode, we redid the experiment described in Section V-A (see Fig. 11), but with an 80MHz functional clock instead of the 160MHz one, to allow a fair comparison. The $\Delta Xmax$ value for the mission mode curve is 298ps. The results of this experiment highlight an interesting case where the timing margin of the test mode is larger than that of the mission mode by about 450ps when we compare the minimum margin value of the mission mode with that of the test mode, assuming that the L&C would be set to appear 10 or more clock periods after the last shift clock pulses. This is a clear indication that at-speed scan-based testing can also be optimistic at lower speeds, unlike what was suggested in [9, 10], namely, that this behavior only occurs at higher frequencies. In Section VII-A, it will be shown with simulations that this particular clock frequency of 80MHz corresponds to a special case where combining the response of the launch and capture pulses results in a combinational delay between FFs, which is reduced while the time between consecutive clock rising edges is stretched.

## VI. AN IMPROVED AND VALIDATED HSPICE SIMULATION MODEL

In this section, we present a new HSpice simulation model, which was derived from the model we proposed in [11]. The objectives of this new model are to better fit the measurements, to overcome some limitations of our experimental setup, and to allow a deeper and more realistic exploration of the overall behavior with respect to frequency. Both models (the previous one from [11] and the new one) are composed of two parts: the PDN and the active circuits. Both parts were improved, and the improvements are described in the next two subsections.

#### A. PDN model

In [11], we proposed an HSpice power distribution network model of an FPGA (Xilinx Spartan3E) on a commercial (Nexys2) board, using the decoupling capacitor models found in [19, 20]. The power distribution system was modeled as a combination of 4 resonance circuits. The first corresponds to the one found on the Nexys2 board, while the three others are modeled as simple RLC circuits [21], with resonance frequencies (5MHz, 150MHz, 50GHz) based on the range values found in [22].

The improved PDN model appears in Fig. 14.



Fig. 14. Improved PDN model

Our proposed PDN model now includes the effect of decoupling capacitors (dcap\_vdd) on the VDD node. Also, two regular capacitors were replaced by decoupling capacitors (dcap\_10uF, dcap\_100uF). Finally, the values of a few elements (4 resistors and 1 capacitor) were adjusted.

#### B. Active circuit model

The active circuit model proposed here was developed to provide results in the time and frequency domains close to the measurements taken from the Nexys2 board. In contrast, the active circuit model in [11] was a purely theoretical one, and was not meant to match these measurements. As in [11], the active circuits of the proposed model are connected to the VDD node. As mentioned in Section III, the resulting active circuit model is composed of a total of 21 buffer lines (10 B1 and 11 B2, see Figs. 1 and 4). As there were 4 different clock domains in our experiments on the Nexys2 board, we had to vary the number of buffer lines in each domain to find the combination offering the best fit between simulation results and measurements. The best combination we found uses 10.5 buffer lines (10 B1 and 2 B2) for the main clock domain (50MHz) and 0.75 lines (3 B2) each for the 3 others.
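The line counts can be cross-checked with simple bookkeeping. The sketch below assumes that each B2 model counts as one quarter of a B1 line (Section III describes B2 as about one quarter of B1); the domain labels and helper name are illustrative.

```python
B2_WEIGHT = 0.25  # assumption: one B2 model is worth a quarter of a B1 line

# Number of B1 and B2 models per clock domain, as given for the best fit.
DOMAINS = {
    "main (50MHz)": {"B1": 10, "B2": 2},
    "resonance 1": {"B1": 0, "B2": 3},
    "resonance 2": {"B1": 0, "B2": 3},
    "resonance 3": {"B1": 0, "B2": 3},
}


def equivalent_lines(counts):
    """Equivalent number of B1-sized buffer lines for one clock domain."""
    return counts["B1"] + B2_WEIGHT * counts["B2"]


total_b1 = sum(c["B1"] for c in DOMAINS.values())
total_b2 = sum(c["B2"] for c in DOMAINS.values())
total_instances = total_b1 + total_b2

for name, counts in DOMAINS.items():
    print(f"{name}: {equivalent_lines(counts)} equivalent lines")
print(f"models: {total_b1} B1 + {total_b2} B2 = {total_instances}")
```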

#### C. Complete model validation

To validate the complete model (PDN + active circuits), a delay line model [11] was used (Fig. 15). All the simulation models were based on a 90nm CMOS technology, which is the same technology node as the Spartan3E.



Fig. 15. Delay line model

The validation process aimed to minimize the differences between the measurements and the simulation results. The parameter used for that purpose was $\Delta Mmax$. Using the delay line (Fig. 15), the timing margin *M* is estimated during the simulations such that node n3 represents the clock input of a first FF and n102 the data input of the next FF. This estimation assumes that the time taken by the FF output signal to be valid after the clock edge, as well as the combinational path to reach the second FF, are modeled by the delay line. It is also assumed that the clock edge arrives at both the first and second FFs at the same time (meaning there is no skew). Accordingly, we also define Y as the delay of a transition traveling from the clock input of a first FF (n3) to the data input of the next FF (n102), and T = Y+M as the effective clock period on n3.

The difference between the simulation and the experimental results is expressed as the ratio between the two, namely  $\Delta XMR = \Delta Xmax/\Delta Mmax$ , where  $\Delta Mmax = Mmax-Mmin$  and  $\Delta Xmax = Xmax-Xmin$  are the margin difference values for simulations and measurements, respectively.

Six different sets were considered, where a set corresponds to a given selection of frequency values. For each set, there were 4 clock domains (CDs), one for the main clock domain (F0, always at 50MHz) and 3 others for the so-called resonance clock domains, where FFs were used to induce power noise. Therefore, each set was defined by the clock frequencies applied to the 3 resonance CDs. The following are the 6 sets of values (F1, F2, F3):

- S1: 140, 150, and 160MHz
- S2: 120, 140, and 160MHz
- S3: 130, 140, and 160MHz
- S4: 130, 145, and 160MHz
- S5: 100, 130, and 160MHz
- S6: 110, 135, and 160MHz

TABLE III. $\Delta XMR$ ratio for the 6 sets considered.

| S1   | S2   | S3   | S4   | S5   | S6   |
|------|------|------|------|------|------|
| 0.85 | 1.15 | 1.04 | 1.03 | 1.05 | 1.05 |

Each of these six sets of frequencies presents a different combination of frequencies, mainly composed of the four clock frequencies and their resulting IMPs. All those frequencies belong to the most relevant part of the spectrum for this validation exercise, namely those that can directly or indirectly trigger the resonant circuit around 160MHz, which is the most important of the four resonant circuits of the model, considering the range of target clock frequencies.

Table III shows the results obtained for the fastest clock domain (160MHz). These results show a good fit between simulation and measurements, especially for the last 4 sets (S3 to S6). They were obtained using 10.5 buffer lines (10 B1 and 2 B2 models) for the main clock domain (50MHz) and 0.75 lines (3 B2 models) each for the 3 others.
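The buffer-line accounting above can be cross-checked with a few lines of arithmetic. The sketch below assumes that one B1 model counts as a full buffer line and one B2 model as a quarter of a line; these weights are not stated explicitly and are only inferred from the quoted totals (10 B1 + 2 B2 = 10.5 lines, 3 B2 = 0.75 line).

```python
from fractions import Fraction

# Assumed weights, inferred from the totals quoted in the text (not stated there).
B1_LINE = Fraction(1)       # one B1 model = one buffer line
B2_LINE = Fraction(1, 4)    # one B2 model = a quarter of a buffer line


def buffer_lines(n_b1, n_b2):
    """Equivalent number of buffer lines for a mix of B1 and B2 models."""
    return n_b1 * B1_LINE + n_b2 * B2_LINE


main_cd = buffer_lines(10, 2)        # main clock domain (50MHz)
resonance_cd = buffer_lines(0, 3)    # each of the 3 resonance clock domains
total = main_cd + 3 * resonance_cd

print(float(main_cd), float(resonance_cd), float(total))  # 10.5 0.75 12.75
```

Under these assumed weights, the total of 12.75 lines agrees with the size N of the validated model used in Section VIII.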

Another parameter taken into account during the validation process is the spectral content. This spectral content was estimated by performing an FFT of the 32 (simulated/measured) margin values for each set. Fig. 16 shows the results obtained with S1.




Fig. 16. FFT results of the margin from measurements (meas) and simulation (sim), set S1 (140, 150 and 160MHz).

It reveals a rather good fit between measurements and simulations, where the 10MHz component is overestimated by simulations, and where the 50MHz component is underestimated. Similar results were obtained with the 5 other sets.
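To make the spectral comparison concrete, here is a minimal sketch of a 32-point spectrum computed with a plain DFT. It assumes one margin value per period of the 160MHz clock, which would give 5MHz bins; the sampling rate is an assumption, and the margin series is synthetic (a 280ps mean with 10MHz and 50MHz components of arbitrary amplitude), not measured data.

```python
import cmath
import math

F_SAMPLE_MHZ = 160.0  # assumption: one margin value per period of the 160MHz clock
N_SAMPLES = 32        # number of margin values per set, as in the text


def dft_magnitudes(samples):
    """Return single-sided DFT magnitudes (mean removed, scaled by 1/N)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        mags.append(abs(acc) / n)
    return mags


# Synthetic margin series in ps (hypothetical, for illustration only).
margins_ps = [
    280.0
    + 40.0 * math.cos(2 * math.pi * 10.0 * i / F_SAMPLE_MHZ)
    + 15.0 * math.cos(2 * math.pi * 50.0 * i / F_SAMPLE_MHZ)
    for i in range(N_SAMPLES)
]

mags = dft_magnitudes(margins_ps)
bin_mhz = F_SAMPLE_MHZ / N_SAMPLES  # 5MHz per bin under the assumed sampling
top_two = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
dominant_mhz = sorted(k * bin_mhz for k in top_two)
print(dominant_mhz)  # [10.0, 50.0]
```

Applying the same function to the measured and to the simulated margin series would allow a bin-by-bin comparison of the kind shown in Fig. 16.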

Based on these results and on the fact that they were obtained without any information on the Spartan 3E design and on the 90nm CMOS process used, we consider that our simulation model is validated.

# VII. SIMULATION RESULTS OBTAINED WITH THE PROPOSED MODEL

#### A. Timing margin with respect to frequency in test mode

The first use of our validated complete HSpice simulation model (Figs. 14 and 4), along with our delay line model (Fig. 15), is to explore the behavior of the timing margin when we vary the target functional clock frequency in test mode. More specifically, we want to show that: 1) contrary to what was suggested by [9], low frequency at-speed testing can lead to optimistic results; 2) higher switching activities during testing do not necessarily lead to yield loss, and 3) in some cases, faster-than-at-speed testing does not necessarily lead to degraded test results. To avoid any disturbance due to the scan shift operation, simulations were performed without scan shifting, to emulate the situation where enough time is allowed to settle down.



Fig. 17. Absolute difference of timing margin (ADM), delay (ADY) and clock period (ADT), with respect to a perfect power supply distribution network, as a function of the L&C clock frequency (log scale); VDD = 0.9V, test mode without scan shifting

Fig. 17 shows a first set of results, expressed as ADM, ADY, and ADT, where ADM = Mmod - Mref is the absolute difference between Mmod (simulated margin value M obtained with the validated model) and Mref (simulated M value obtained with a perfect VDD), where ADY = Ymod - Yref is the absolute difference between Ymod (simulated Y value obtained with the validated model) and Yref (simulated Y value obtained with a perfect VDD), and where ADT = Tmod - Tref is the absolute difference between Tmod (simulated T value obtained with the validated model) and Tref (simulated T value obtained with a perfect VDD).
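Since T = Y + M was defined in Section VI, and assuming this relation holds both for the validated model and for the perfect-VDD reference, the three curves of Fig. 17 are linked. A short derivation from the definitions:

$$ADT = T_{mod} - T_{ref} = (Y_{mod} + M_{mod}) - (Y_{ref} + M_{ref}) = ADY + ADM$$

$$ADM = ADT - ADY$$

Read this way, a positive ADM requires clock stretching (ADT > 0), a delay reduction (ADY < 0), or both.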

These results were obtained with a nominal VDD value of 0.9V. Each subset of 3 points at a given clock frequency was obtained with a simulation using a given number (from 30 to 1320) of inverters in the delay line (Fig. 15), where the delay in the inverter chain represents 95% of the L&C clock period (with a perfect VDD).

Looking at the ADM curve reveals that part of it (between 55 and 87MHz) is above 0, meaning that the L&C operation can in some cases improve the timing margin, even at lower frequencies. In this particular case, the timing margin improvement comes from the reduction of the delay, with a small contribution from clock stretching. This observation is in line with the results reported in Fig. 13, where the timing margin of the test mode is greater than that of the mission mode (at 80MHz). This part of the ADM curve provides an interesting counterexample to the often implicit rule of thumb according to which reducing the VDD droop amplitude by lowering switching activities can contribute to lower delays and a lower associated yield loss. Let us consider the ADM result at 70.2MHz. At that frequency, an ADM value of 157ps was obtained, as well as a VDDmin value of 0.835V. Additional simulations revealed that at the same frequency, halving the switching activity level leads to a higher VDDmin value (0.866V), but also to about half the ADM value (72ps). It also reveals that doubling the switching activity level at 70.2MHz lowers the VDDmin value to 0.779V, but about doubles the ADM value. Although the ADM curve is most often below zero (meaning that the L&C operation reduces the timing margin), these results clearly show that this is not always the case.

Another very interesting observation that can be made here is that there is a minimum value reached in the *ADM* curve around 350MHz. In this particular case, it means that above 350MHz, using a shorter L&C period leads to an absolute increase of the timing margin. For example, using a 616MHz clock (and a shorter delay line) instead of a 518MHz one leads to an M increase of 16ps. A direct consequence of this observation is that applying faster-than-at-speed test strategies does not necessarily worsen the power supply noise impact on timing margins, and consequently, does not necessarily lead to more yield loss than normal scan-based at-speed testing. Looking at the two other curves we see that a timing margin improvement over 400MHz is due to a combination of delay line reduction (Y↓) and clock period stretching (T↑).

# B. A counterexample in mission mode

Our objective here is to provide a counterexample in mission mode, showing that the correlation between VDDmin and the timing margin can be altered by the presence of multiple clock signals at different frequencies. In other words, for about the same VDDmin value, the presence of these multiple clock frequencies can induce a much larger variation in the timing margins. To that end, we performed another set of simulations using the same validated models, again with a nominal VDD at 0.9V. This time, the mission mode was simulated, in two particular scenarios:

• SC (single clock): all 4 clocks (F0 to F3) at 160MHz, and

• MC (multiple clocks): the main clock (F0) = 50MHz, the 3 others at 160MHz.

Results are listed in Table IV. All simulations lasted 80 periods of 160MHz, and the measurements were taken over the last 32 periods in order to avoid most of the transient phase. With a single clock (SC), the timing margin remains stable around 280ps ($\pm$14ps, due to some remaining transient effects), while it varies much more with multiple clocks (MC), where  $\Delta$ Mmax = Mmax - Mmin (539ps) represents more than 8% of the clock period. We obtained about the same minimum (and maximum) VDD values for the 2 scenarios, while the minimum timing margin value was severely reduced with multiple clocks. Clearly, in this case, the VDDmin value alone is not a good indicator of the resulting timing margin when comparing situations with and without other clock signals.

TABLE IV Timing margin versus VDD fluctuations, mission mode. SC = single clock, MC = multiple clocks

|                  | SC    | MC    |
|------------------|-------|-------|
| VDD max (V)      | 0.933 | 0.933 |
| VDD min (V)      | 0.850 | 0.847 |
| ΔVDD (V)         | 0.083 | 0.086 |
| Mmax (ns)        | 0.293 | 0.566 |
| Mmin (ns)        | 0.266 | 0.027 |
| ΔMmax (ns)       | 0.027 | 0.539 |

Note that the  $\Delta$ Mmax value obtained through simulations for MC (539 ps) differs from the measured  $\Delta$ Xmax value listed in Table II (228ps). This difference is mainly due to the fact that the MC value was obtained at a different VDD value (0.9 instead of 1.2) and that the delay line was longer. Using 1.2V and a delay line more closely fitting the one in the FPGA leads to a simulated  $\Delta$ Mmax value of 193ps, which is much closer to the measured one.
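The Table IV figures can be cross-checked with simple arithmetic. The sketch below recomputes the margin spread, its share of the 160MHz clock period, and the VDD spread for both scenarios, using only the values listed in Table IV; the dictionary keys and the helper name are illustrative.

```python
F_CLK_MHZ = 160.0
PERIOD_NS = 1e3 / F_CLK_MHZ  # 6.25ns

# Values copied from Table IV.
TABLE_IV = {
    "SC": {"m_max_ns": 0.293, "m_min_ns": 0.266, "vdd_max": 0.933, "vdd_min": 0.850},
    "MC": {"m_max_ns": 0.566, "m_min_ns": 0.027, "vdd_max": 0.933, "vdd_min": 0.847},
}


def summarize(row):
    """Margin spread (ps), its share of the clock period (%), VDD spread (mV)."""
    delta_m_ns = row["m_max_ns"] - row["m_min_ns"]
    return {
        "delta_m_ps": round(delta_m_ns * 1e3),
        "delta_m_pct_of_period": round(100.0 * delta_m_ns / PERIOD_NS, 1),
        "delta_vdd_mv": round((row["vdd_max"] - row["vdd_min"]) * 1e3),
    }


summary = {name: summarize(row) for name, row in TABLE_IV.items()}
for name, values in summary.items():
    print(name, values)
```

For a VDD spread that differs by only 3mV between the two scenarios, the margin spread goes from 27ps (about 0.4% of the period) with a single clock to 539ps (about 8.6%) with multiple clocks.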

## VIII. PROPOSED TEST SIGNAL SCHEME WITH TWO CLOCK DOMAINS

The previous results (from equations, simulations and measurements) allow us to identify the challenges related to at-speed scan-based testing in order to get results aligned with those from the mission mode. The main challenges are two-fold:

• The mission mode itself can be challenging, as it may be affected by IMPs and by the transient phase induced by power saving mechanisms such as clock gating; it may also influence the test mode if some parts of the device under test (DUT) are running in mission mode during testing.

• The PDN impedance should be taken into consideration when applying test schemes mimicking the mission mode or attempting to reduce the voltage droop during launch and capture.

In this section, we propose a new signal scheme for applying scan-based delay testing and we compare it to existing schemes. Our new signal scheme, called Dual Clock Alternated Shift – Launch On Capture (DCAS-LOC), models the mission mode, to get similar power distribution conditions, while applying scan-based tests. As its name suggests, DCAS-LOC is a launch-on-capture scheme. Fig. 18 shows the proposed signal scheme. From top to bottom, we can find the following signals:

- F<sub>clki</sub>: functional clock, domain *i*,
- S<sub>clkij</sub>: scan clock, domain *i*, subdomain *j*,
- se<sub>clkij</sub>: scan enable, domain *i*, subdomain *j*,
- S/R'<sub>clk1j</sub>: scan/rotate', domain 1, subdomain *j*,
- S/R'<sub>clk2</sub>: scan/rotate', domain 2 (for both subdomains),

where  $i = \{1, 2\}$ ,  $j = \{a, b\}$ , and where domain 1 is the fastest clock domain.



Fig. 18. DCAS-LOC signal scheme

The strategy to mimic mission mode involves dividing each clock domain into subdomains (here, subdomains a and b) and using the same pulse clock as in mission mode for the scan clock, but at a reduced pace in each subdomain (here, using basically one pulse over two) during the shift (in and out) phase.

Four different letters (R=Rotate, L=Launch, C=Capture, S=Shift) appear inside each scan clock pulse, indicating the operations executed. The DCAS-LOC signal scheme uses two simultaneous L&C phases for the two subdomains of each clock domain. Note that in order to get non-critical scan enable signals, some functional clock pulses are either not perfectly reproduced by the scan clocks (highlighted in yellow) or absent (highlighted in orange). More specifically, the  $F_{clk1}$  pulse highlighted in yellow is not perfectly reproduced to give two clock periods to disable the se<sub>clk1a</sub> signal, while two consecutive rotate pulses of the S<sub>clk1</sub> signal give three clock periods to disable the se<sub>clk1a</sub> signal. The pulse highlighted in orange is absent to give two clock periods to enable both the se<sub>clk1a</sub> and  $se_{clk1b}$  signals. The same strategy is applied for the other clock subdomain. Note finally that in each clock domain, there is one subdomain (clk1b, clk2a) where an additional launch pulse (with the scan enable signal disabled) is added prior to the L&C operation.

Most of the time, the best results were obtained by applying the launch pulse (prior to the capture one) at the period corresponding to the lowest mission mode period margin. Using such an alignment requires a calibration phase, as for BurstMode.

The Rotate operation is there in case it is not possible to feed the scan chains fast enough to allow them to shift at half the mission mode frequency. In that case, the scan chains are first shifted at a lower pace, and then run at the target frequency. Note that a similar rotate strategy is used in [13]. On top of each scan clock pulse, one can find the SA (Switching Activity, H: High, L: Low) indicator. This refers to the size of the clock distribution network switching. *High* corresponds to the shifting operation, while *Low* corresponds to the mission mode.

We assume that the whole chip contains two similar clock distribution networks (CDNs). When combined, these two CDNs are equivalent to the validated model of Section VI (i.e., 12.75 buffer lines = N, each CDN corresponding to 0.5N). We also assume that each CDN is divided into two similar clock subdomains (0.25N each). Considering that 30 to 50% of the total dynamic power is consumed by the CDN [23] and that the flip-flop (data) switching activity is typically three to four times higher in shift mode than in mission mode [4], we assume that the total switching activity during the shift (or rotate) operation for the whole chip corresponds to a CDN of a size 2.5N [11] (as FFs are replaced by load capacitances in our simulation models). In a first approximation, this selected 2.5N size corresponds to different realistic cases in terms of the proportion of the dynamic power due to the clock distribution network, P<sub>DPCND</sub>, and the flip-flop switching activity ratio of the scan shifting mode over the mission mode, R<sub>FFSA</sub>, such as [12]: 1)  $P_{DPCND} = 50\%$  and  $R_{FFSA} = 4$ , 2)  $P_{DPCND} = 40\%$  and  $R_{FFSA} =$ 3.5, and 3)  $P_{DPCND} = 33\%$  and  $R_{FFSA} = 3.25$ .
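The 2.5N figure can be reproduced with a simple power split. The sketch below is one reading that is consistent with the three listed cases, not necessarily the exact calculation behind them: mission-mode dynamic power is normalized to 1, a fraction P<sub>DPCND</sub> of it comes from the CDN and is assumed unchanged during shift, and the remaining flip-flop share is multiplied by R<sub>FFSA</sub>. The function name is illustrative.

```python
def shift_activity_ratio(p_dpcnd, r_ffsa):
    """Shift-mode switching activity relative to mission mode (= 1.0).

    Assumption: the CDN share is the same in both modes, and only the
    flip-flop (data) share scales by r_ffsa during shift.
    """
    return p_dpcnd + (1.0 - p_dpcnd) * r_ffsa


# The three (P_DPCND, R_FFSA) cases listed in the text.
cases = [(0.50, 4.0), (0.40, 3.5), (0.33, 3.25)]
ratios = [round(shift_activity_ratio(p, r), 1) for p, r in cases]
print(ratios)  # [2.5, 2.5, 2.5]
```

Since the mission-mode activity of the simulation model corresponds to a CDN of size N, a ratio of about 2.5 translates into an equivalent CDN of size 2.5N during shift.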

As mentioned before, DCAS-LOC is an extension of the OCAS scheme proposed in [12]. The OCAS scheme is shown in Fig. 19. OCAS also uses two separate clock subdomains and alternates the shifting of the two subdomains. However, the launch and capture sequence is different, as one clock subdomain is in LOS mode while the other is in LOC mode.

Note finally that the DCAS-LOC scheme can be extended to more than 2 clock domains, as only one clock domain is tested at a time (using the sequence composed of a single launch scan clock pulse, followed by double-launch and double-capture pulses) while the others remain in the shift/rotate phase.



Fig. 19. OCAS signal scheme [12]

#### IX. COMPARISON WITH EXISTING SCHEMES

Table V shows the simulation results. In our comparison, we use the mission mode timing margin  $(M_{mm})$  as a reference to compute the following metric,  $M_{R/Tmm}$ , which corresponds to the timing margin difference in % with respect to  $M_{mm}$ , normalized by the mission mode clock period  $(T_{mm})$ :

$$M_{R/Tmm} = 100(M_t - M_{mm})/T_{mm} \qquad (6)$$

where  $M_t$  is the timing margin of the target technique. Comparisons are based on HSpice simulation results using the validated power distribution model (Fig. 14) and the delay line model (Fig. 15).
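As a worked illustration of Eq. (6), the sketch below evaluates the metric for two margin pairs. The margin values are hypothetical and chosen only to show the sign convention; they are not taken from the simulations. The function and parameter names are likewise illustrative.

```python
def m_r_tmm(m_t_ps, m_mm_ps, f_mm_mhz):
    """Eq. (6): timing margin difference in % of the mission mode clock period."""
    t_mm_ps = 1e6 / f_mm_mhz  # mission mode clock period in ps
    return 100.0 * (m_t_ps - m_mm_ps) / t_mm_ps


# Hypothetical margins (ps) at a 1000MHz mission mode clock (period = 1000ps).
optimistic = m_r_tmm(m_t_ps=120.0, m_mm_ps=50.0, f_mm_mhz=1000.0)
pessimistic = m_r_tmm(m_t_ps=50.0, m_mm_ps=120.0, f_mm_mhz=1000.0)

print(optimistic, pessimistic)  # 7.0 -7.0
```

A positive value means the test scheme sees more margin than the mission mode (an optimistic test), while a negative value means it sees less (a pessimistic test).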

TABLE V  $M_{R/Tmm}$  (%) for the fastest clock domain as a function of the clock frequencies for the different techniques considered

| fast clock freq. (MHz) | 600   | 600   | 600   | 800   | 800   | 800   | 1000  | 1000  | 1000  |
|------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| slow clock freq. (MHz) | 60    | 100   | 140   | 60    | 100   | 140   | 60    | 100   | 140   |
| Conventional           |       |       |       |       |       |       |       |       |       |
| LOC                    | 1.9   | -1.5  | 4.6   | 4.0   | -0.9  | 5.9   | 4.9   | 1.0   | 7.0   |
| LOS                    | -4.2  | -7.6  | -1.5  | -1.6  | -6.5  | 0.3   | 0.4   | -3.5  | 2.4   |
| BurstMode              |       |       |       |       |       |       |       |       |       |
| b1                     | -14.2 | -7.0  | -13.6 | -16.7 | -10.3 | -17.5 | -20.9 | -13.4 | -21.8 |
| b2                     | -5.1  | 0.6   | -4.2  | -11.7 | -5.3  | -13.4 | -14.1 | -5.5  | -16.0 |
| b3                     | -25.6 | -18.6 | -26.0 | -11.8 | -5.6  | -13.6 | -4.3  | 1.9   | -8.1  |
| b4                     | -21.0 | -15.6 | -21.1 | -19.4 | -12.5 | -21.4 | -17.4 | -10.1 | -19.4 |
| b5                     | -7.9  | -1.9  | -6.8  | -20.0 | -14.1 | -21.3 | -26.5 | -18.1 | -28.3 |
| best                   | -5.1  | 0.6   | -4.2  | -11.7 | -5.3  | -13.4 | -4.3  | 1.9   | -8.1  |
| worst                  | -25.6 | -18.6 | -26.0 | -20.0 | -14.1 | -21.4 | -26.5 | -18.1 | -28.3 |
| DCAS-LOC               |       |       |       |       |       |       |       |       |       |
| LC                     | -0.2  | 0.7   | -0.7  | 1.1   | 2.1   | 0.7   | 0.4   | 0.9   | 0.2   |

A first striking observation that can be made is that conventional techniques (LOC, LOS) generally perform well with 2 clock domains. Nevertheless,  $M_{R/Tmm}$  values of 7.0% and -7.6% are reported for LOC and LOS, respectively, in the multiple clock context in Table V.

For BurstMode, the best burst is highlighted in green (Table V). We can see that b2 is the best burst for the first 6 cases (fast clock frequencies = 600 and 800MHz), while b3 is the best one for the last 3 cases (fast clock frequency = 1000MHz). We can also see that in some cases, even the best burst gives pessimistic results, with  $M_{R/Tmm}$  values of -11.7% and -13.4%, for the 800/60 and 800/140 frequency combinations, respectively. These results were obtained while performing the BurstMode launch operations when the rising edges of both clock domains were aligned (clock alignment 2 [13]). Note that there might be some bursts not explicitly defined in [13], for which better results could be obtained.

For the DCAS-LOC scheme, the worst  $M_{R/Tmm}$  value obtained over the 9 cases is +2.1%, which outperforms LOC, LOS and BurstMode, in terms of worst  $M_{R/Tmm}$  absolute value.

#### X. IMPACT OF PDN IMPEDANCE VARIATION

In this work, we were also interested in the impact of PDN impedance variation on the results provided by the different techniques considered. To assess this impact, we performed other sets of simulations, where we simultaneously varied the nominal values of the resonant RLC circuit #3 (Fig. 14) by +/-10%. Table VI presents these results. From top to bottom, one can distinguish 3 blocks of results:

- RLC#3, nominal values: these results come from Table V; for LOS, LOC and DCAS-LOC (DLOC), the results are directly copied; for BurstMode (BM), the best results are reported;
- RLC#3, -10%: the R, L and C nominal values are each reduced by 10%;
- RLC#3, +10%: the R, L and C nominal values are each increased by 10%.

For each of the 9 clock frequency combinations in each of the 3 results blocks, the best results (namely, the lowest  $M_{R/Tmm}$  absolute values) among the four test schemes are highlighted in green. For the RLC#3 nominal values, DCAS-LOC gives the best results for 5 of the 9 considered frequency combinations, while LOS, LOC and BM give best results for 2, 1, and 1 frequency combinations, respectively. For the RLC#3 nominal values -10%, we obtained similar results (BM gives the best ones for 2 frequency combinations, LOS for 1). The situation is a bit different for the RLC#3 nominal values +10%, where DCAS-LOC now gives the best results for all but 2 considered frequency combinations, and LOS and LOC, for 1.

In Table VII, we list the worst cases of the three previous results blocks of Table VI, now expressed as the  $M_{R/Tmm}$  (%) absolute value. For each frequency combination, the best result is still highlighted in green, while the worst result for each test technique is highlighted in yellow. According to these results, DCAS-LOC is the best solution for 7 of the 9 considered frequency combinations, and LOC and LOS, for 1. Moreover, DCAS-LOC has the lowest worst case value (2.3%) over the 9 considered frequency combinations, followed by LOC (7.0%), LOS (8.3%) and BM (14.2%).

TABLE VI  $M_{R/Tmm}$  (%) as a function of the clock frequency for the considered techniques, with PDN impedance variation

| fast clock freq. (MHz) | 600  | 600  | 600  | 800   | 800  | 800   | 1000 | 1000 | 1000  |
|------------------------|------|------|------|-------|------|-------|------|------|-------|
| slow clock freq. (MHz) | 60   | 100  | 140  | 60    | 100  | 140   | 60   | 100  | 140   |
| RLC#3, nominal values  |      |      |      |       |      |       |      |      |       |
| LOC                    | 1.9  | -1.5 | 4.6  | 4.0   | -0.9 | 5.9   | 4.9  | 1.0  | 7.0   |
| LOS                    | -4.2 | -7.6 | -1.5 | -1.6  | -6.5 | 0.3   | 0.4  | -3.5 | 2.4   |
| BM                     | -5.1 | 0.6  | -4.2 | -11.7 | -5.3 | -13.4 | -4.3 | 1.9  | -8.1  |
| DLOC                   | -0.2 | 0.7  | -0.7 | 1.1   | 2.1  | 0.7   | 0.4  | 0.9  | 0.2   |
| RLC#3, -10%            |      |      |      |       |      |       |      |      |       |
| LOC                    | 2.2  | -1.3 | 4.6  | 4.1   | -1.2 | 5.5   | 5.2  | 1.0  | 6.9   |
| LOS                    | -4.8 | -8.3 | -2.4 | -2.3  | -7.5 | -0.9  | 0.0  | -4.1 | 1.8   |
| BM                     | -5.3 | 0.0  | -4.8 | -11.7 | -6.2 | -14.2 | -6.1 | -0.5 | -11.9 |
| DLOC                   | -0.7 | 0.5  | -0.6 | 0.9   | 2.2  | 0.8   | 0.6  | 1.0  | 0.4   |
| RLC#3, +10%            |      |      |      |       |      |       |      |      |       |
| LOC                    | 1.9  | -1.2 | 4.2  | 3.7   | -0.7 | 5.1   | 4.7  | 1.0  | 6.0   |
| LOS                    | -4.0 | -7.1 | -1.7 | -1.5  | -5.9 | -0.1  | 0.7  | -2.9 | 2.0   |
| BM                     | -6.1 | 1.3  | -3.9 | -12.4 | -4.0 | -12.3 | -3.6 | 3.7  | -4.9  |
| DLOC                   | 0.1  | 0.9  | -0.3 | 1.2   | 2.3  | 0.6   | 0.2  | 0.9  | 0.0   |

TABLE VII Worst case for the three RLC#3 values, expressed as the  $M_{R/Tmm}$  (%) absolute value

| fast clock freq. (MHz) | 600 | 600 | 600 | 800  | 800 | 800  | 1000 | 1000 | 1000 |
|------------------------|-----|-----|-----|------|-----|------|------|------|------|
| slow clock freq. (MHz) | 60  | 100 | 140 | 60   | 100 | 140  | 60   | 100  | 140  |
| Worst case from the 3 previous blocks of results, absolute value | | | | | | | | | |
| LOC                    | 2.2 | 1.5 | 4.6 | 4.1  | 1.2 | 5.9  | 5.2  | 1.0  | 7.0  |
| LOS                    | 4.8 | 8.3 | 2.4 | 2.3  | 7.5 | 0.9  | 0.7  | 4.1  | 2.4  |
| BM                     | 6.1 | 1.3 | 4.8 | 12.4 | 6.2 | 14.2 | 6.1  | 3.7  | 11.9 |
| DLOC                   | 0.7 | 0.9 | 0.7 | 1.2  | 2.3 | 0.8  | 0.6  | 1.0  | 0.4  |

Finally, in Table VIII, we present the Max-Min of the  $M_{R/Tmm}$  values to show the sensitivity of each test technique with respect to PDN impedance variation from another angle. For each test technique and frequency combination, we computed the difference between the highest and the lowest  $M_{R/Tmm}$  values observed over the 3 PDN impedance scenarios (RLC#3 nominal, -10%, +10%). As an example, for BM at 1000/140MHz, the highest and lowest  $M_{R/Tmm}$  values were -4.9 and -11.9, respectively, leading to a Max-Min of 7.0%. This represents the highest variation we observed, followed by LOS (1.6%), LOC (1.0%) and DCAS-LOC (0.8%).

TABLE VIII Max-Min of the  $M_{R/Tmm}$  value (%) as a function of the clock frequency for the considered techniques, with PDN impedance variation

| fast clock freq. (MHz) | 600 | 600 | 600 | 800 | 800 | 800 | 1000 | 1000 | 1000 |
|------------------------|-----|-----|-----|-----|-----|-----|------|------|------|
| slow clock freq. (MHz) | 60  | 100 | 140 | 60  | 100 | 140 | 60   | 100  | 140  |
| Max - Min              |     |     |     |     |     |     |      |      |      |
| LOC                    | 0.3 | 0.3 | 0.4 | 0.4 | 0.5 | 0.8 | 0.5  | 0.0  | 1.0  |
| LOS                    | 0.8 | 1.2 | 0.9 | 0.8 | 1.6 | 1.2 | 0.7  | 1.2  | 0.6  |
| BM                     | 1.0 | 1.4 | 0.9 | 0.8 | 2.2 | 2.0 | 2.4  | 4.2  | 7.0  |
| DLOC                   | 0.8 | 0.4 | 0.3 | 0.3 | 0.2 | 0.2 | 0.4  | 0.2  | 0.3  |

Overall, BurstMode showed the highest sensitivity to impedance variation, as it exhibits the largest Max-Min difference for all 9 frequency combinations, if we keep the best option observed with nominal RLC#3 values. This implicitly means that the mandatory calibration phase to select the best burst is completed once for each frequency combination. DCAS-LOC gives the lowest Max-Min difference over 6 of the 9 considered frequency combinations. The average Max-Min difference for DCAS-LOC is also the lowest (0.34%), followed by LOC (0.46%), LOS (0.99%) and BurstMode (2.43%). All these results indicate that DCAS-LOC is overall less sensitive to PDN impedance variation than the other test schemes.
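The worst-case and Max-Min summaries can be recomputed from the Table VI entries as a cross-check. The sketch below does so from the one-decimal values printed in Table VI, so its Max-Min results can differ by about 0.1 from Table VIII, presumably because the published figures were derived from unrounded data. The data layout and function names are illustrative.

```python
# M_R/Tmm (%) from Table VI; columns are the 9 fast/slow frequency combinations
# in table order (600/60, 600/100, 600/140, 800/60, ..., 1000/140).
TABLE_VI = {
    "nominal": {
        "LOC":  [1.9, -1.5, 4.6, 4.0, -0.9, 5.9, 4.9, 1.0, 7.0],
        "LOS":  [-4.2, -7.6, -1.5, -1.6, -6.5, 0.3, 0.4, -3.5, 2.4],
        "BM":   [-5.1, 0.6, -4.2, -11.7, -5.3, -13.4, -4.3, 1.9, -8.1],
        "DLOC": [-0.2, 0.7, -0.7, 1.1, 2.1, 0.7, 0.4, 0.9, 0.2],
    },
    "-10%": {
        "LOC":  [2.2, -1.3, 4.6, 4.1, -1.2, 5.5, 5.2, 1.0, 6.9],
        "LOS":  [-4.8, -8.3, -2.4, -2.3, -7.5, -0.9, 0.0, -4.1, 1.8],
        "BM":   [-5.3, 0.0, -4.8, -11.7, -6.2, -14.2, -6.1, -0.5, -11.9],
        "DLOC": [-0.7, 0.5, -0.6, 0.9, 2.2, 0.8, 0.6, 1.0, 0.4],
    },
    "+10%": {
        "LOC":  [1.9, -1.2, 4.2, 3.7, -0.7, 5.1, 4.7, 1.0, 6.0],
        "LOS":  [-4.0, -7.1, -1.7, -1.5, -5.9, -0.1, 0.7, -2.9, 2.0],
        "BM":   [-6.1, 1.3, -3.9, -12.4, -4.0, -12.3, -3.6, 3.7, -4.9],
        "DLOC": [0.1, 0.9, -0.3, 1.2, 2.3, 0.6, 0.2, 0.9, 0.0],
    },
}
SCHEMES = ("LOC", "LOS", "BM", "DLOC")


def per_combination(scheme):
    """Group the 3 impedance scenarios for each frequency combination."""
    return list(zip(*(TABLE_VI[scenario][scheme] for scenario in TABLE_VI)))


def worst_abs(scheme):
    """Largest |M_R/Tmm| over the 3 scenarios (one row of Table VII)."""
    return [round(max(abs(v) for v in vals), 1) for vals in per_combination(scheme)]


def max_min(scheme):
    """Max - Min over the 3 scenarios (one row of Table VIII, from rounded data)."""
    return [round(max(vals) - min(vals), 1) for vals in per_combination(scheme)]


summary = {}
for scheme in SCHEMES:
    spread = max_min(scheme)
    summary[scheme] = {
        "worst_abs": max(worst_abs(scheme)),
        "worst_spread": max(spread),
        "mean_spread": round(sum(spread) / len(spread), 2),
    }
    print(scheme, summary[scheme])
```

Recomputed this way, the ordering of the four schemes matches the one reported above for both the worst absolute value and the Max-Min spread, with DCAS-LOC lowest and BurstMode highest.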

Note that quite similar results were obtained for the sensitivity of each test technique with respect to PDN impedance variation, with 9 other combinations, using slow clocks of 80, 120 and 160MHz, while keeping the fast clocks at 600, 800 and 1000MHz. More specifically, with these 9 new combinations, we obtained the same ranking for the  $M_{R/Tmm}$  worst absolute value as in Table VII: DCAS-LOC had the lowest worst case (1.9%) followed by LOC (7.9%), LOS (7.8%) and BurstMode (11.9%). We also obtained the same ranking, as in Table VIII, expressed as the Max-Min of the  $M_{R/Tmm}$  values, where the worst/average Max-Min values for DCAS-LOC, LOC, LOS and BurstMode were 1.1%/0.51%, 1.8%/0.92%, 2.1%/1.0% and 7.9%/3.0%, respectively.

# XI. DISCUSSION

This paper is about making sure that scan-based delay test schemes are representative of the mission mode, in the presence of PDN impedance variation. The first question to answer in this context is: what is the mission mode's worst timing margin that we want to emulate during test? Consequently, the first part of the paper mostly focuses on the IMPs, which influence the behavior of VDD in mission mode and which may modulate the timing margin in such a way that the worst case occurs at specific moments. The results presented in this paper show that fluctuations on VDD are a necessary but not sufficient condition to get significant timing margin variations. Similarly, it was also shown that the presence of multiple clock domains is a necessary but not sufficient condition to get significant IMPs, which may lead (or not) to significant timing margin variations. One implicit condition to get strong IMPs is to have synchronous designs with clocks from a common source and time aligned with phase/delay-locked loops, such that there is no drift between clocks, only a bounded skew. Another condition is to use frequencies that can directly or indirectly trigger PDN resonance. When these two conditions are met, it is important to verify whether IMPs can induce significant timing margin variations.

Even though relatively simple circuits were used for our experiments and simulations, we believe that IMPs can occur on more complex circuits. More clock domains do not necessarily mean smaller timing margin variations. As shown by our experimental results on the Spartan3E-500 FPGA (Table II), going from two clock domains (F0=50MHz, F1=F2=F3=160MHz) to 4 (F0=50MHz, F1=30MHz, F2=40MHz, F3=160MHz) led to an increase in  $\Delta$ Xmax values, from 228 to 380ps. Note that the Spartan3E-500 is of a reasonable size (500K equivalent gates according to Xilinx [16]).

The skew can also influence the IMP amplitude. The skew on an FPGA such as the Spartan3E-500 can be induced by the clock period jitter, up to  $\pm 150$ ps for this FPGA [16]. In our simulations, the skew was also (at least partially) taken into account as we used 2 different clock distribution trees for the fast and the slow clocks, resulting in an average skew between the 2 clocks of up to 30ps and in a maximum skew of up to 66ps in mission mode, which represents almost 7% of the clock period at 1000MHz.

Finally, note that the impact of IMPs in mission mode adds to the VDD perturbations in test mode due to the launch and capture pulses. As shown in [12], in a single clock domain situation, this last source of VDD perturbations alone makes test schemes such as BurstMode and SeBoS sensitive to PDN impedance variation. Therefore, even in the absence of IMPs, it is important to verify how sensitive the chosen test scheme is to PDN impedance variation.

## XII. CONCLUSIONS

The objectives of this paper were to provide a better understanding of the effect of the power supply noise on timing in the context of multiple clock domains and power distribution network impedance variations, and to propose a new scan-based at-speed clocking test scheme that is more robust in that context. To achieve the first objective, we first showed that the current injection caused by switching transistors could be modeled as a Dirac function. We then showed that the IMPs, appearing in the presence of multiple clock domains, were induced by the dependent nature of the transistors. Experimental results were presented to show the timing margin variability due to intermodulation products, and to show that scan-based at-speed testing could be optimistic with respect to mission mode at lower frequencies. These experimental results were also used to validate an improved HSpice simulation model of the PDN. This improved model was in turn used to quantify the timing margin over a wide range of clock frequencies. These simulations led to some counterintuitive observations, notably that faster-than-at-speed testing does not always lead to further timing margin reduction, and that VDD droop could be misleading when used to estimate timing margins.

The second part of the paper was dedicated to our second objective. We presented a new robust test signal scheme for multiple clock domain ICs. Based on the improved and validated model, our simulations revealed that the proposed scheme was less sensitive to power distribution network impedance variation than the most popular existing schemes, and that it provided timing margins closer to those obtained in mission mode.

## APPENDIX

In this appendix, we provide theoretical equations assuming that the current pulse is modeled as a Dirac function. Using such a model for the current pulse means that we assume that we get the RLC natural response, namely:

$$VDD(t) = 1.2 - K_N \cdot A_p \cdot \sqrt{\left(\frac{1}{C}\right)^2 + \left(\frac{R}{2LC\omega_N}\right)^2} \cdot e^{-\left(\frac{R}{2L}\right)t} \cos\left(\omega_N \left(t - T_{off}\right) + \theta_N\right) \tag{7}$$

where  $K_N$  is a fitting constant,  $A_p$  the current pulse area,  $T_{off}$  a fitting time offset, and where

$$\omega_N = \sqrt{\frac{1}{LC} - \left(\frac{R}{2L}\right)^2} \tag{8}$$

$$\theta_N = -\tan^{-1}\left(\frac{R}{2L\omega_N}\right) \tag{9}$$

Note also that the *C* parameter was adjusted to take into account the capacitive load of the clock buffers on VDD.
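As an illustration, Eqs. (7) to (9) can be evaluated numerically with the minimal sketch below. It takes the numerator of Eq. (9) to be R, which is what makes the phase consistent with the amplitude term of Eq. (7). The function name `vdd_natural_response` and all numerical values are placeholders chosen for illustration: they are not the fitted parameters of our PDN model, and the default values of `K_N` and `T_off` are arbitrary.

```python
import math


def vdd_natural_response(t, R, L, C, A_p, K_N=1.0, T_off=0.0, vdd_nom=1.2):
    """Evaluate Eq. (7) for t >= 0, with omega_N from Eq. (8) and theta_N from Eq. (9)."""
    alpha = R / (2.0 * L)                      # damping factor R/(2L)
    omega_sq = 1.0 / (L * C) - alpha ** 2      # radicand of Eq. (8)
    if omega_sq <= 0.0:
        raise ValueError("Eq. (8) requires an underdamped network: 1/(LC) > (R/(2L))^2")
    omega_n = math.sqrt(omega_sq)
    theta_n = -math.atan(R / (2.0 * L * omega_n))  # Eq. (9), numerator assumed to be R
    amplitude = K_N * A_p * math.hypot(1.0 / C, R / (2.0 * L * C * omega_n))
    return vdd_nom - amplitude * math.exp(-alpha * t) * math.cos(omega_n * (t - T_off) + theta_n)


# Placeholder values only (hypothetical, not from the paper):
# R = 0.1 ohm, L = 1 nH, C = 1 nF, pulse area A_p = 10 pC.
params = dict(R=0.1, L=1e-9, C=1e-9, A_p=1e-11)
v_start = vdd_natural_response(0.0, **params)
v_late = vdd_natural_response(1e-6, **params)
print(v_start, v_late)
```

With `T_off = 0`, the cosine of the phase in Eq. (9) cancels the square-root term of Eq. (7) at t = 0, so the initial value reduces to 1.2 - K_N·A_p/C. This gives a quick sanity check on any implementation of these equations.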

## REFERENCES

- [1] J. Saxena et al., "A Case Study of IR-Drop in Structured At-speed Testing," *International Test Conference*, Oct. 2003, pp. 1098-1104.
- [2] M. Tehranipoor and K.M. Butler, "Power Supply Noise: A survey on Effects and Research," *IEEE Design & Test*, vol. 27, no.2, March-April 2010, pp. 51-67.
- [3] T. Zhang and D.M. Walker, "Improved Power Supply Noise Control for Pseudo Functional Test," in *IEEE VLSI Test Symp.*, 2014.
- [4] K. Arabi et al., "Power Supply Noise in SOCs: Metrics, Management, and Measurement," *IEEE Design & Test*, vol. 24, no.3, May-June 2007, pp. 236-244.
- [5] M. Sadi and M. Tehranipoor, "Design of a Network of Digital Sensor Macros for Extracting Power Supply Noise Profile in SoCs," *IEEE Trans. On VLSI Systems*, vol. 24, no.5, May 2016, pp. 1702-1714.
- [6] M. Omana et al., "Low-Cost and High-Resolution Approaches for Power Droop during Launch-On-Shift Scan-Based Logic BIST," *IEEE Trans. On Computers*, vol. 65, no.8, Aug. 2016, pp. 2484-2494.
- [7] J. Rearick and R. Rodgers, "Calibrating Clock Stretch During AC Scan Testing," *International Test Conference*, 2005.
- [8] H. Liu, H. Li, Y. Hu and X. Li, "A Scan-Based Delay Test Method for Reduction of Over-testing," *International Symposium on Electronic Design, Test & Applications*, 2008.
- [9] P. Pant and J. Zelman, "Understanding Power Supply Droop During At-Speed Scan Testing," in *IEEE VLSI Test Symp.*, 2009, pp. 227-232.
- [10] P. Pant et al., "Lessons from At-Speed Scan Deployment on an Intel Itanium Microprocessor," in *International Test Conference*, 2010, pp. 1-10.
- [11] C. Thibeault and J. Larche, "On the impact of multiple clock domains and intermodulation products on test," *IEEE DATA 2012*, Nov. 2012.
- [12] C. Thibeault and A. Louati, "A New Delay Testing Scheme Robust to Power Distribution Network Impedance Variation," *IEEE VLSI Test Symposium*, April 2017.
- [13] B. Nadeau-Dostie, K.Takeshita, and J.-F Cote, "Power-Aware At-Speed Scan Test Methodology for Circuits with Synchronous Clocks," *International Test Conference*, Oct. 2008, pp. 1-10.
- [14] M. Schwartz, Information Transmission, Modulation and Noise, 3<sup>rd</sup> Edition, McGraw Hill, 1980.
- [15] A. Adebisi, A. Sutherland, B. Honary, "Wire Integrity Testing Using Intermodulation Product Processing," *IEEE ISPLC*, May 2008, pp. 213-217.
- [16] Xilinx, "Spartan-3E FPGA Family: Data Sheet," DS312 (v3.8) August 26, 2009, www.xilinx.com.
- [17] Digilent, "Digilent Nexys2 Board Reference Manual," June 2008, www.digilentinc.com.
- [18] J. Larche, "Émulation et comparaison du mode test et du mode fonctionnel des circuits intégrés à horloges multiples," Master's thesis (in French), École de technologie supérieure, 2014.
- [19] M. Alexander, "Power Distribution System (PDS) Design: Using Bypass/Decoupling Capacitors," *Xilinx Application note, XAPP623* (v2.1) February 28, 2005, www.xilinx.com.
- [20] Cypress, "Using Decoupling Capacitors," Application note, March 11, 1999, www.cypress.com.
- [21] N. Weste, D.M. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th Edition, Addison Wesley, Reading (Mass.), 2009.
- [22] A. Muhtaroglu, G. Taylor, T. Rahal-Arabi, and K. Callahan, "On-Die Droop Detector for Analog Sensing of Power Supply Noise," *IEEE Symp. on VLSI Circuits*, pp. 193-196.
- [23] Jairam S et al., "Clock gating for power optimization in ASIC design cycle theory & practice," in *Proc. of the 13th International Symposium on Low Power Electronics and Design (ISLPED '08)*, ACM, New York, NY, USA, 2008, pp. 307-308.

**Claude Thibeault** (S'87-M'91-SM'08) received his Ph.D. from Ecole Polytechnique de Montreal, Canada. He is now with the Electrical Engineering Department of Ecole de technologie superieure, where he serves as full professor. His research interests include design and verification methodologies targeting ASICs and FPGAs, defect and fault tolerance, radiation effects, as well as IC and PCB test and diagnosis. He holds 14 US patents and has published more than 140 journal and conference papers. He has been a member of different conference program committees, including VLSI Test Symposium, for which he was program chair in 2010-2012, and general chair in 2014 and 2015.

**Ghyslain Gagnon** (S'03-M'09) received the Ph.D. degree in electrical engineering from Carleton University, Canada in 2008. He is now an Associate Professor at École de technologie supérieure, Montreal, Canada. He is a board member of ReSMiQ and Director of the research laboratory LACIME, a group of 13 Professors and nearly 100 highly dedicated students and researchers in microelectronics, digital signal processing and wireless communications. Highly inclined towards research partnerships with industry, his research aims at digital signal processing and machine learning with various applications, from media art to building energy management.