RTL coding guidelines - Doing it right the first time!
- Document in detail interface timing and signal descriptions, clock and reset strategy, modular view of the design and FSMs prior to RTL coding.
- Have a comment “header” for each module with functionality description, version and a log of past changes. This can be managed using a revision control system like CVS.
- Keep one module per file, and make the module name match the filename.
- Comment generously wherever it helps the reader, for example on every input and output.
- Indent your code and use Emacs Verilog-mode for connectivity to keep it error-free. Refer to Veripool’s guide on AUTOs.
- Split the design into separate modules based on clock domains.
- Use separate always @ blocks for sequential and combinational logic. Always use non-blocking assignments for sequential logic and blocking assignments for combinational logic.
- Avoid “parallel_case full_case” compiler directives and always add a default clause for case statements.
- Do NOT assign the same variable in more than one always@ block.
- Use “if-else” only for priority encoders and case statements for parallel states.
- Avoid inferring latches, and avoid hand-coded clock gating and direct gate instantiation, to keep the design technology independent.
- Register all inputs and outputs in the design to ease timing closure.
- Use the dual-stage synchronizer cells available in the library rather than two discrete flops for synchronization.
- Use reset synchronizers for asynchronous resets. Add DFT bypass muxes for reset and clock controllability where necessary.
- Avoid combinational loops in the design to aid timing analysis and DFT.
- Avoid using clock as data for flop inputs for hassle-free DFT insertion.
- Do not mix posedge and negedge flops in the same module where possible.
- Always separate the combinational and sequential logic in a FSM with two always@ blocks.
- Always code with design reuse in mind - For example, FIFOs can be made generic and can be customized by passing parameters while being instantiated.
- Remember the rule of thumb - be conservative in what you transmit and generous in what you receive.
- Parenthesize all operations without depending on the reader to figure out the precedence of operators.
- Add assertions where necessary to aid verification.
- Lint your design for syntax/semantic checks and clock-reset policies.
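Some of these rules are simple enough to check mechanically before a full lint run. The sketch below, written in Python, checks only the "one module per file, module name matches filename" guideline. The function name `check_module_file` and the regular expression are illustrative assumptions, not part of any lint tool, and the pattern does not skip module declarations that appear inside comments.

```python
import re
from pathlib import Path

# Matches a Verilog module declaration at the start of a line.
# Simplification (assumption): declarations inside comments are not filtered out.
MODULE_RE = re.compile(r"^\s*module\s+([A-Za-z_]\w*)", re.MULTILINE)


def check_module_file(filename, source):
    """Return a list of guideline violations for one Verilog source file."""
    names = MODULE_RE.findall(source)
    problems = []
    if len(names) != 1:
        problems.append(f"{filename}: expected 1 module, found {len(names)}")
    stem = Path(filename).stem
    for name in names:
        if name != stem:
            problems.append(f"{filename}: module '{name}' does not match filename")
    return problems


if __name__ == "__main__":
    good = "module fifo #(parameter DEPTH = 16) (input clk);\nendmodule\n"
    bad = "module fifo (input clk);\nendmodule\nmodule helper;\nendmodule\n"
    print(check_module_file("fifo.v", good))
    print(check_module_file("fifo.v", bad))
```

A check like this complements a real lint tool rather than replacing it; it is only meant to catch file-organization slips early.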
On chip variation and CRPR
Static timing analysis of a chip depends heavily on Process, Voltage and Temperature (PVT) variations, because cell delays and interconnect delays vary widely with these factors. Hence it is necessary to run timing analysis in both worst and best case operating conditions and ensure we meet setup/hold requirements for the chip.
For the worst case corner, we specify the chip running at high temperature, low voltage and a slow process (high cap). For the best case corner, we specify high voltage, low temperature and a fast process (low cap). Setup is more problematic in the slow corner because of larger cell/interconnect delays, and hold is more problematic in the fast corner because of smaller delays.
Another factor that needs to be considered during timing analysis is on-chip variation (OCV). On a single chip, two identical gates can have different delays because of other variables in the manufacturing process. This variation can be anywhere between 8-12% and needs to be included in timing analysis for a more accurate and foolproof picture.
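To add OCV analysis in Synopsys PrimeTime, we apply timing derate factors to the min and max delays, as shown below. Note that the factors in this example, 0.8 and 1.2, are ±20% rather than the 8-12% quoted above: they let the min paths be faster than the max paths by 40% of the nominal delay.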
set_timing_derate -min 0.8 -max 1.2
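To make the derating concrete, here is a small Python sketch that applies the 0.8 and 1.2 factors to a hypothetical nominal delay of 1.0 ns. The function name and the 1.0 ns value are assumptions chosen for illustration only; this is plain arithmetic, not a model of how PrimeTime computes delays internally.

```python
def derated_delays(nominal_ns, early=0.8, late=1.2):
    """Return (earliest, latest) delay for a nominal delay under OCV derating."""
    return nominal_ns * early, nominal_ns * late


if __name__ == "__main__":
    nominal = 1.0  # hypothetical nominal path delay in ns
    earliest, latest = derated_delays(nominal)
    # Gap between the two extremes, expressed relative to the nominal delay.
    spread_vs_nominal = (latest - earliest) / nominal
    # How much slower the latest delay is than the earliest one.
    latest_vs_earliest = latest / earliest - 1.0
    print(f"earliest = {earliest:.2f} ns, latest = {latest:.2f} ns")
    print(f"spread relative to nominal  = {spread_vs_nominal:.0%}")
    print(f"latest relative to earliest = {latest_vs_earliest:.0%}")
```

The same spread can be quoted against the nominal delay or against the earliest delay, and the two percentages differ, so it is worth stating which one is meant when a derate is described.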
Next, we use the “on_chip_variation” switch as shown below to enable OCV:
set_operating_conditions -analysis_type on_chip_variation
However, if you look at the reports carefully, you will notice that PrimeTime is overly pessimistic: if the launch and capture flops share a common branch of the clock tree, PrimeTime applies different OCV delays to that same branch on the two paths. For setup analysis, for example, it slows down the common branch on the way to the launch flop and speeds up the very same branch on the way to the capture flop.
To counter this, PrimeTime provides the Clock Reconvergence Pessimism Removal (CRPR) feature, which is enabled with the command below:
set timing_remove_clock_reconvergence_pessimism true
With this feature enabled, PrimeTime identifies the clock tree segment that is common to the launch and capture clock paths and removes the difference between its max and min delays from the analysis, thus projecting a more realistic picture.
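The effect of CRPR on a setup check can be sketched with a few lines of arithmetic. The Python example below is a simplified model under stated assumptions: all delay values are hypothetical, the setup time is not derated, and the function `setup_slack` is illustrative rather than a reproduction of PrimeTime's calculation.

```python
def setup_slack(period, common, launch_branch, capture_branch,
                data, setup, early=0.8, late=1.2, crpr=False):
    """Setup slack in ns under OCV derating; all delays are nominal values."""
    # Launch clock path and data path are derated late (slow).
    arrival = late * (common + launch_branch) + late * data
    # Capture clock path is derated early (fast).
    required = period + early * (common + capture_branch) - setup
    slack = required - arrival
    if crpr:
        # The common clock segment was counted as both slow and fast;
        # credit back the difference between its late and early delays.
        slack += (late - early) * common
    return slack


if __name__ == "__main__":
    # Hypothetical path: 5 ns clock, 2 ns shared clock trunk, 0.5 ns branches.
    args = dict(period=5.0, common=2.0, launch_branch=0.5,
                capture_branch=0.5, data=3.5, setup=0.2)
    print(f"slack without CRPR: {setup_slack(**args):+.2f} ns")
    print(f"slack with CRPR:    {setup_slack(**args, crpr=True):+.2f} ns")
```

With these made-up numbers the path fails setup when the shared trunk is derated in both directions and passes once that pessimism is credited back, which is the kind of difference CRPR is meant to expose.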
For more details on OCV and CRPR, please refer to the paper at the link below.
Logic BIST Design
Need for Logic Built-in Self Test (BIST)
Traditional scan requires a large number of vectors to sensitize the design, runs at a maximum frequency of 50 MHz and is limited by the number of channels supported by the tester. All of these add to tester time, which costs from 25 to 50 cents per second. Many designs integrate Logic BIST to overcome these limitations and reduce the cost of testing.
Logic BIST, in brief, involves driving control signals from a built-in controller, generating pseudo-random patterns on the chip, and compacting the responses to these patterns on the chip. All of this occurs at speed, reducing the interface to the tester, the tester memory and the tester time.
Logic BIST Architecture
[Figure: Logic BIST architecture]
The figure above shows the architecture of Logic BIST that is based on the traditional scan based architecture (known as STUMPS model). The primary instances in this model are:
- Pseudo-random Pattern Generator (PRPG) - this is implemented using a linear feedback shift register (LFSR) that generates pseudo-random patterns to stimulate the design. The LFSR is chosen to be “maximal length”, which means that it visits every non-zero state (2^n - 1 states for an n-bit register) before repeating the sequence.
- Phase shifter - this block allows a short LFSR to drive a large number of scan chains by using phase-shifting techniques. The phase shifting also removes dependence between input channels. Muxes at the scan chain inputs select either the traditional scan inputs (muxed-scan) or the PRPG outputs to achieve more fault coverage.
- Space compactor - this compresses the outputs of the scan chains using XOR logic before feeding the compressed outputs to the Multiple Input Signature Register (MISR). The MISR signature is then either compared internally with an on-chip reference signature or scanned out through primary pins.
- BIST controller - this generates the clock control and scan enable signals and contains counters to track the shift cycles. A TAP interface is also integrated in the controller so that logic BIST can be initiated through JTAG.
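The PRPG behaviour described in the list above can be illustrated with a small software model. The Python sketch below steps a 4-bit Fibonacci LFSR and measures its period. The width, the tap positions (4, 3), corresponding to the polynomial x^4 + x^3 + 1, and the function names are assumptions chosen for illustration; a real PRPG is much wider and is implemented in hardware.

```python
def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR by one clock.

    taps are 1-based bit positions that are XORed to form the feedback bit.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    # Shift left by one and insert the feedback bit at position 0.
    return ((state << 1) | feedback) & ((1 << width) - 1)


def lfsr_period(seed, taps, width):
    """Count clocks until the LFSR returns to its seed (capped at 2**width)."""
    state = seed
    for count in range(1, (1 << width) + 1):
        state = lfsr_step(state, taps, width)
        if state == seed:
            return count
    return None  # seed never recurred within 2**width clocks


if __name__ == "__main__":
    width = 4
    print("taps (4, 3):", lfsr_period(0b0001, (4, 3), width))
    print("taps (4, 2):", lfsr_period(0b0001, (4, 2), width))
    print("all-zero seed:", lfsr_period(0b0000, (4, 3), width))
```

The first tap set visits all 15 non-zero states before repeating, the second does not, and the all-zero state maps to itself, which is why a PRPG must never be seeded with zeros.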
The length of the shift sequence is determined by the longest scan chain and by the number of capture clocks (usually one) in the design. Patterns from the PRPG are shifted into the scan chains while the previous responses are simultaneously compressed into the MISR at the other end, for better utilization.
The design requirements are stringent: there must be no unknown “X” sources (such as memories or non-scannable flip-flops), and the design should be pseudo-random pattern testable with minimal area overhead. Any “X” source can corrupt the MISR signature, and hence control and test points need to be added in the design. For complex multi-million gate designs, the advantages far outweigh the disadvantages. LogicVision’s LogicBIST and Mentor’s TestKompress are two well-known DFT tools for logic BIST.
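To see why a single unknown value is enough to spoil the result, the Python sketch below models a 4-bit MISR and compares signatures. The register width, tap positions, response words and function names are all hypothetical; the unknown “X” is modelled by simply trying both possible values of one response bit.

```python
def misr_step(state, inputs, taps=(4, 3), width=4):
    """One MISR clock: LFSR shift with feedback, then XOR in the parallel inputs."""
    mask = (1 << width) - 1
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    shifted = ((state << 1) | feedback) & mask
    return shifted ^ (inputs & mask)


def signature(responses, seed=0):
    """Compact a sequence of response words into a single signature."""
    state = seed
    for word in responses:
        state = misr_step(state, word)
    return state


if __name__ == "__main__":
    # Hypothetical compacted scan chain outputs, one 4-bit word per shift clock.
    good = [0b1010, 0b0111, 0b0001, 0b1100]
    # Same responses, but bit 0 of the second word is an unknown "X":
    # try both values it could take on silicon.
    x_as_0 = [0b1010, 0b0110, 0b0001, 0b1100]
    x_as_1 = [0b1010, 0b0111, 0b0001, 0b1100]
    print(f"reference signature: {signature(good):04b}")
    print(f"signature if X = 0:  {signature(x_as_0):04b}")
    print(f"signature if X = 1:  {signature(x_as_1):04b}")
```

The two possible values of the unknown bit lead to different signatures, so the final signature cannot be predicted in advance and the comparison with the reference becomes meaningless. The same property is what lets the MISR detect a single-bit fault in the responses.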