THE VLSI HOMEPAGE

A Practical Guide to VLSI Design and Verification.

On chip variation and CRPR

Posted in Static Timing Analysis by Nigam on the September 27th, 2007

Static timing analysis of a chip depends heavily on process, voltage and temperature (PVT) variations, because cell delays and interconnect delays change significantly with these factors. Hence it is necessary to run timing analysis at both worst-case and best-case operating conditions and ensure the chip meets its setup and hold requirements in each.

For the worst-case corner, we specify the chip running at high temperature, low voltage and a slow process (high cap). For the best-case corner, the voltage is high, the temperature is low and the process is fast (low cap). Setup is more problematic in the slow corner because of larger cell/interconnect delays, while hold is more problematic in the fast corner because the delays are smaller.

Another factor that needs to be considered during timing analysis is on-chip variation (OCV). On a single chip, two identical gates can have different delays because of local variations in the manufacturing process. This variation is typically in the range of 8-12% and needs to be included in timing analysis for a more accurate and foolproof picture.

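To add OCV analysis in Synopsys PrimeTime, we apply timing derate factors to the min and max delays as shown below. Realistic derates are in the 8-12% range mentioned above; the example uses an exaggerated 20% so that the effect is easy to see. It scales min-path delays to 0.8x and max-path delays to 1.2x of nominal, so the two can differ by 40% of the nominal delay!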

set_timing_derate -min 0.8 -max 1.2

Next, we use the “on_chip_variation” analysis type as shown below to enable OCV:

set_operating_conditions -analysis_type on_chip_variation

However, if you look at the reports carefully, you will notice that PrimeTime is overly pessimistic: if the launch and capture flops share a common branch of the clock tree, PrimeTime still applies different OCV derates to that shared branch. For example, in setup analysis it slows down the common branch when computing the launch clock path and speeds up the very same branch when computing the capture clock path, even though one physical set of buffers cannot be both slow and fast at the same time.

To counter this, PrimeTime provides the Clock Reconvergence Pessimism Removal (CRPR) feature. CRPR is enabled by setting the variable below:

set timing_remove_clock_reconvergence_pessimism true

With this feature enabled, PrimeTime identifies the clock tree segment common to the launch and capture clock paths and removes the difference between its max and min delays from the slack calculation, thus projecting a more realistic picture.
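To make the pessimism concrete, here is a rough Python sketch of a setup check with and without the CRPR credit. This is only a back-of-the-envelope model, not how PrimeTime computes slack internally: the delay numbers are made up, the function name `setup_slack` is hypothetical, and the derates are the 0.8/1.2 values from the set_timing_derate example above.

```python
# Hypothetical delays in ns; none of these numbers come from a real design.
EARLY, LATE = 0.8, 1.2   # the min/max derates used in the example above

def setup_slack(common, launch_branch, capture_branch, data, period, setup, crpr):
    # Setup check: the launch clock path and the data path use late (slow)
    # delays, while the capture clock path uses early (fast) delays.
    arrival = (common + launch_branch + data) * LATE
    required = period + (common + capture_branch) * EARLY - setup
    slack = required - arrival
    if crpr:
        # The shared clock segment cannot be slow and fast at the same time,
        # so give back the max/min difference that was charged on it.
        slack += common * (LATE - EARLY)
    return slack

args = dict(common=1.0, launch_branch=0.2, capture_branch=0.2,
            data=2.0, period=4.0, setup=0.1)
print(round(setup_slack(crpr=False, **args), 3))  # 1.02
print(round(setup_slack(crpr=True, **args), 3))   # 1.42
```

The credit grows with the length of the common clock segment, which is why designs with deep shared clock trees see the biggest slack recovery from CRPR.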

For more details on OCV and CRPR, please refer to the paper at the link below.

On Chip Variation Analysis

Logic BIST Design

Posted in DFT by Nigam on the September 24th, 2007

Need for Logic Built-in Self Test (BIST)

Traditional scan requires a large number of vectors to sensitize the design, runs at a maximum frequency of about 50 MHz and is limited by the number of channels supported by the tester. All of these add to tester time, which costs anywhere from 25 to 50 cents per second. Many designs integrate Logic BIST to overcome these limitations and reduce the cost of testing.

Logic BIST, in brief, involves driving the control signals from a built-in controller, generating pseudo-random patterns on the chip and compacting the responses to these patterns on the chip. All of this happens at-speed, which reduces the interface to the tester, the tester memory and the tester time.

Logic BIST Architecture

[Figure: Logic BIST architecture]

The figure above shows the architecture of Logic BIST, which builds on the traditional scan based architecture (known as the STUMPS model). The primary blocks in this model are:

  • Pseudo-random Pattern Generator (PRPG) - this is implemented using a linear feedback shift register (LFSR) to generate pseudo-random patterns that stimulate the design. The LFSR is chosen to be “maximal length”, which means that an n-bit LFSR visits every non-zero state (2^n - 1 of them) before repeating the sequence.
  • The Phase shifter block allows a large number of scan chains in the design to be driven from a short LFSR by using phase-shifting techniques. This phase shifting also reduces the dependence between the input channels. There are muxes at the input of the scan chains to select either the traditional scan inputs (muxed-scan) or the PRPG, so that deterministic patterns can still be applied to achieve more fault coverage.
  • The Space Compactor compresses the outputs of the scan chains using XOR logic before feeding the compressed outputs to a Multiple Input Signature Register (MISR). The MISR signature is then either compared internally with an on-chip reference signature or scanned out through primary pins.
  • A BIST controller generates the clock control and scan enable signals and contains the counters that track the shift cycles. A TAP interface is also integrated in the controller so that logic BIST can be initiated through JTAG.
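The PRPG and MISR described in the list above can be sketched in a few lines of Python. This is a toy model under stated assumptions: a 4-bit register, the polynomial x^4 + x^3 + 1 (picked here for the demo, not taken from any tool), hypothetical function names, and the PRPG patterns themselves standing in for the scan chain responses. The phase shifter and space compactor are left out.

```python
from itertools import islice

WIDTH = 4
TAPS = (3, 2)               # bit positions for x^4 + x^3 + 1 (assumed for the demo)
MASK = (1 << WIDTH) - 1

def feedback(state):
    """XOR of the tapped bits of the register."""
    fb = 0
    for t in TAPS:
        fb ^= (state >> t) & 1
    return fb

def prpg(seed=0b0001):
    """Yield successive LFSR states; the seed must be non-zero."""
    state = seed
    while True:
        yield state
        state = ((state << 1) | feedback(state)) & MASK

def misr(responses, sig=0):
    """Fold a stream of WIDTH-bit response words into one signature."""
    for word in responses:
        sig = (((sig << 1) | feedback(sig)) ^ word) & MASK
    return sig

patterns = list(islice(prpg(), 15))
good = misr(patterns)                 # stand-in for the fault-free responses
bad_responses = patterns.copy()
bad_responses[5] ^= 0b0100            # flip one response bit to mimic a fault
bad = misr(bad_responses)
print(len(set(patterns)), 0 in patterns, good != bad)   # 15 False True
```

The 15 distinct non-zero states show the maximal-length property for a 4-bit register, and the single flipped response bit is enough to change the final signature.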

The number of shift cycles per pattern is determined by the longest scan chain, and each pattern also needs its capture clocks (usually one). A new pattern from the PRPG is shifted into the scan chains while the response to the previous pattern is simultaneously shifted out at the other end and compressed into the MISR, which makes better use of the shift cycles.

The design requirements are stringent: there must be no unknown “X” sources (such as memories or non-scannable flip-flops), and the design should be pseudo-random pattern testable with minimal area overhead. Any “X” source can corrupt the MISR signature, and hence control and test points need to be added in the design. For complex multi-million gate designs the advantages far outweigh the disadvantages. LogicVision’s LogicBIST and Mentor’s TestKompress are two well-known DFT tools for logic BIST.


DFT - Traditional Scan

Posted in DFT by Nigam on the September 22nd, 2007

Traditional scan based designs employ either the muxed-scan technique or the Level Sensitive Scan Design (LSSD) technique to achieve test coverage. In scan based designs, the registers are hooked up to form serial shift register chains, which makes it possible to control and observe the combinational logic between any two register stages and so detect faults in it.

There is a shift phase during which the ATE pattern is shifted serially through the scan chain from the IO pins. Once the ATE pattern is shifted through the scan chain, there is a capture phase that allows the flops in the scan chain to capture the combinational logic output at their input pins. Following this capture phase, the pattern is shifted out serially and compared with the expected vector from ATPG.
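The shift, capture and shift-out sequence can be mimicked with a toy Python model. This is only an illustration: the 4-flop chain, the XOR "combinational cloud" in `toy_logic` and all function names are hypothetical, and real ATE/ATPG flows are far more involved.

```python
def shift(chain, scan_in_bits):
    """Shift phase: each clock moves the data one flop toward scan-out."""
    out = []
    for bit in scan_in_bits:
        out.append(chain[-1])          # last flop drives the scan-out pin
        chain = [bit] + chain[:-1]     # SI of the first flop takes the new bit
    return chain, out

def toy_logic(q):
    # Hypothetical combinational cloud: D[i] = Q[i] XOR Q[i-1] (wraps around).
    return [q[i] ^ q[i - 1] for i in range(len(q))]

def run_pattern(pattern, logic):
    chain = [0] * len(pattern)
    chain, _ = shift(chain, pattern)              # load the pattern
    loaded = chain
    chain = logic(chain)                          # capture: flops take their D inputs
    _, observed = shift(chain, [0] * len(chain))  # unload the captured response
    return loaded, observed

loaded, observed = run_pattern([1, 0, 1, 1], toy_logic)
expected = list(reversed(toy_logic(loaded)))      # what "ATPG" would predict
print(loaded, observed, observed == expected)     # [1, 1, 0, 1] [1, 1, 0, 0] True

# A fault that forces D[2] to 0 shows up as a mismatch on the tester.
faulty = lambda q: [0 if i == 2 else d for i, d in enumerate(toy_logic(q))]
print(run_pattern([1, 0, 1, 1], faulty)[1] == expected)   # False
```

Note that the response comes out last-flop-first, which is why the expected vector is reversed before the comparison.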

Each clock domain can have its flops stitched into one or more scan chains, based on the number of flops in the clock domain. Stitching multiple scan chains is an advantage as it reduces tester time, since all the scan chains can be loaded in parallel. Mixing clock domains, or posedge and negedge flops, in a scan chain is usually not recommended as it can cause timing issues. Use of lock-up latches is advised if a scan chain crosses from posedge to negedge flops.
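A quick estimate shows why multiple chains matter for tester time. The sketch below is a simplification with assumed numbers: the flop and pattern counts are invented, the chains are taken to be perfectly balanced, and the unload of one pattern is assumed to overlap with the load of the next. Only the 50 MHz shift frequency comes from the discussion of scan speed in these posts.

```python
import math

def scan_test_seconds(flops, chains, patterns, shift_mhz=50.0, capture_cycles=1):
    """Rough scan test time, assuming perfectly balanced chains."""
    longest = math.ceil(flops / chains)
    # Each pattern costs one full shift of the longest chain plus its capture
    # clocks; one extra shift at the end unloads the last response.
    cycles = patterns * (longest + capture_cycles) + longest
    return cycles / (shift_mhz * 1e6)

# Hypothetical design: 100,000 flops and 10,000 patterns.
print(scan_test_seconds(100_000, 1, 10_000))    # about 20 seconds
print(scan_test_seconds(100_000, 16, 10_000))   # about 1.25 seconds
```

In this model the test time scales almost directly with the length of the longest chain, so unbalanced chains waste most of the benefit.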

Scan based design has several advantages: it gives high fault coverage with a moderate increase in logic, and the entire insertion is automated. The main disadvantage of scan based testing is that it runs at low speed, typically 50 MHz. This is very slow for high speed designs and for designs where at-speed testing is critical.

Muxed scan

The figure below shows a muxed-scan flop and the associated waveform. The D pin is the normal functional input, SI is the scan input and SE is the scan enable that selects SI so data can be shifted into and out of the flop. The Q output of the flop is connected to the SI pin of the next flop in the scan chain. During the capture phase (SE deasserted), the flop captures its D input; the captured value is then shifted out during the next shift phase.

[Figure: Muxed scan flip-flop]

LSSD scan

A schematic of the LSSD scan flop and its waveforms is shown in the figure below. The relationship between the clocks is also shown: SCK1 and SCK2 are active during the shift phase, while CLK is active during the capture phase.

[Figure: LSSD scan flop]

There are several differences between muxed scan and LSSD. In muxed scan, the data and test paths are the same, i.e. the test path is muxed with the functional path, and the extra mux sits on the functional path, which can make timing closure harder. Muxed scan flops are smaller in area and faster, but the same functional clock is used for both shift and capture, which can cause shift violations. In contrast, the LSSD scan paths are separate from the functional paths, with two non-overlapping clocks for shifting data in and out and a separate capture clock. LSSD flops are larger in size, but timing closure is easier as we never run into shift violations.

