#### **EE895KR**

#### **Advanced VLSI Design**

#### Kaushik Roy

Purdue University Dept. of ECE

kaushik@purdue.edu





## **Course Overview**

- Targeted for graduate students who have already taken basic VLSI design classes
- Real world challenges and solutions in designing high-performance and low-power circuits
- Relations to VLSI Design
  - Recent developments in digital IC design
  - Project oriented
  - Student participation: class presentation





## Prerequisite

- MOS VLSI Design or equivalent
  - MOS transistor
  - Static, dynamic logic
  - Adder
- Familiarity with VLSI CAD tools
  - Magic or Cadence: LVS, DRC
  - HSPICE
- Basic knowledge of solid-state physics





## **Class Materials**

- Lecture notes: primary reference
- K. Roy, S. Prasad, *Low Power CMOS VLSI Circuit Design*, John Wiley
- A. Chandrakasan, W. Bowhill, F. Fox, *Design of High-Performance Microprocessor Circuits*, IEEE Press, 2001.
- Y. Taur, T. Ning, *Fundamentals of Modern VLSI Devices*, Cambridge University Press, 2002.
- J. Rabaey, A. Chandrakasan, B. Nikolic, *Digital Integrated Circuits: A Design Perspective*, Prentice Hall, 2nd edition, 2003. (prerequisite)





## **Class Organization**

- One exam (40% of overall grade)
- Term-long project (60%)
  - Proposal (5%)
  - Midterm presentation (15%): background material and proposed work
  - Final presentation (20%)
  - Final report (20%)





#### **CAD Tools**

- Cadence
  - Schematic editor, layout editor, DRC, LVS
- HSPICE, awaves
- Technology files
  - TSMC 0.18µm, BPTM 70nm, ...
- Synopsys Design Compiler, Library Compiler
- Taurus-device, Taurus-medici
- Everyone should have some experience with these tools





## **Term Project**

- Single person project
- Proposal (~week 3)
  - 2 pages
  - Topic, problem statement, research plan, references
- Midterm presentation (~week 7)
  - 15 mins
  - Literature survey
  - Off campus students can give presentations over the phone
- Final presentation (early December)
  - 20 mins
  - Background, final results, contributions
- Final report (Dec. 10)
  - Publishable quality





#### **Project Topic**

- Students pick the research topic they want to work on
- After the literature survey, choose a paper that you would like to evaluate yourself
- Has to be on digital VLSI circuit DESIGN
  - Op-amp design alone is not acceptable
  - Op-amp design for digital applications is acceptable
- Reproduce the paper's claims using your own simulations
- Your contribution must be clearly shown at the end
  - Improve previous design
  - New circuit, modeling technique
  - Show limitation of previous techniques
- Talk to the instructor in case you need help





#### How to Find a Project Topic?

- Conferences
  - International Solid-State Circuits Conference (ISSCC, top conference!): slides posted on IEEE Xplore
  - Symposium on VLSI Circuits (VLSIC), DAC, ICCAD
  - Custom Integrated Circuits Conference (CICC)
- Journals
  - IEEE TVLSI, IEEE TCAD, IEEE TED
  - IEEE Journal of Solid-State Circuits (JSSC)
  - Intel Technology Journal
  - IBM Journal of Research and Development





#### How to Find a Project Topic?

- Funding agencies
  - Research needs document (www.src.org)
- Presentations
  - University of Michigan VLSI seminar series (www.eecs.umich.edu/vlsi\_seminar/)
  - Design Automation Conference (www.dac.com)
- Pick a recent issue in VLSI design (< 5 years)
- I suggest you start doing the literature survey ASAP (deadline coming up in 3 weeks)






## Acknowledgements

- Prof. Chris Kim
- Intel circuit research labs (S. Borkar and many others)
- IBM
- Copyright 2002 J. Rabaey et al.






#### Academic Misconduct

- Students caught engaging in an academically dishonest practice will receive a failing grade for the course.
- University policy on academic dishonesty will be followed strictly.





## **Course Topics**

- Scaling issues
- High performance design
  - High performance logic family, clocking strategies, interconnects
- Low power design
  - Low voltage designs, leakage control techniques, circuit/device/technology issues, low power SRAM
- Variation tolerant design
  - Process compensating techniques
- Power delivery, interconnect, reliability
- Bulk and SOI





# A physical system as a computing medium

- We need to create a bit first. Information processing always requires a physical carrier, namely material particles.
- The *<u>first</u>* requirement for the physical realization of a bit is creating *distinguishable* states within a system of such material particles.
- <u>The second</u> requirement is a *conditional* change of state.
- *Distinguishability* and *conditional change of state* are the <u>two fundamental properties</u> a material subsystem needs in order to represent information. These properties can be obtained by creating *energy barriers* in a material system.





#### **Particle Location is an Indicator of State**









#### **Two-well bit**









# Barrier engineering in semiconductors

By doping, it is possible to create a built-in field and energy barriers of controllable height and length within a semiconductor. This allows the conditional, complex electron transport between different energy states inside semiconductors that is needed for the physical realization of information-processing devices.







#### Kroemer's Lemma of Proven Ignorance

 If, in discussing a semiconductor problem, you cannot draw an Energy-Band-Diagram, this shows that you don't know what you are talking about.

 If you can draw one, but don't, then your audience won't know what you are talking about.





#### Moore's Law



 Intel co-founder and chairman Gordon Moore predicted in 1965 that the number of transistors on a chip would double every 18-24 months





#### **Transistor Scaling**



 Constant E-field scaling: voltage and dimensions (both horizontal and vertical) are scaled by the same factor k (~1.4), so that the electric field remains unchanged.





## **Technology Scaling**

$$\text{Dimensions} \xrightarrow{\text{scales}} 0.7, \quad V_{dd} \xrightarrow{\text{scales}} \beta, \quad V_t \xrightarrow{\text{scales}} \beta$$

$$I = \frac{kW}{T_{ox}} (V_{dd} - V_t) \xrightarrow{\text{scales}} \frac{0.7}{0.7} \times \beta = \beta$$

$$D = \frac{C V_{dd}}{I} \xrightarrow{\text{scales}} \frac{0.7 \times \beta}{\beta} = 0.7 \quad (30\% \text{ delay reduction})$$

$$E = C V_{dd}^{2} \xrightarrow{\text{scales}} 0.7 \beta^{2}$$
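As a quick numeric check of the relations above, the following Python sketch evaluates the per-generation factors for a chosen voltage factor β. The function and parameter names are illustrative assumptions, not part of the course material. It assumes, as the formulas do, that W, T<sub>ox</sub> and C all scale by 0.7.

```python
def scale_one_generation(beta, s=0.7):
    """Per-generation scaling factors for current, delay and energy.

    s    -- factor applied to every dimension (W, T_ox) and to C; assumed 0.7
    beta -- factor applied to both V_dd and V_t
    """
    current = (s / s) * beta      # I ~ (W / T_ox) * (V_dd - V_t)
    delay = s * beta / current    # D = C * V_dd / I
    energy = s * beta ** 2        # E = C * V_dd^2
    return {"current": current, "delay": delay, "energy": energy}


full_voltage_scaling = scale_one_generation(beta=0.7)     # voltages track dimensions
modest_voltage_scaling = scale_one_generation(beta=0.85)  # voltages scale more slowly
```

In both cases the delay factor stays at 0.7, while the energy factor is 0.7 × β², so slower voltage scaling gives up energy savings rather than speed in this first-order model.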





#### **IC Frequency & Power Trends**

[Figure: clock frequency (MHz) and chip power versus technology generation (µm) for the 386, 486DX and Pentium processors. Annotations: gate delay improves ~30%, frequency increases 50%, Power = $C_L V^2 f$.]

▲ Active switched capacitance "C<sub>L</sub>" is increasing.



# Vdd vs. Vt scaling

- Recently: constant e-field scaling, aka voltage scaling
- V<sub>CC</sub> ≪ 1V
- V<sub>CC</sub> & modest V<sub>T</sub> scaling
- Loss in gate overdrive  $(V_{CC}-V_T)$



[Figure: V<sub>CC</sub> versus technology generation (µm)]

▲ Voltage scaling is good for controlling IC's active power, but it requires aggressive  $V_{T}$  scaling for high performance





## Delay



Long Channel MOSFET



 C. Hu, "Low Power Design Methodologies," Kluwer Academic Publishers, p. 25.

Performance significantly degrades when  $V_{DD}$  approaches  $3V_{T}$ .
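One way to see why delay climbs as $V_{DD}$ approaches a few $V_{T}$ is a simple long-channel (square-law) model, where drive current goes as $(V_{DD} - V_{T})^2$ and delay therefore goes as $V_{DD} / (V_{DD} - V_{T})^2$. This is an assumed textbook model used for illustration, not the data behind the cited figure, and the function name is hypothetical.

```python
def relative_delay(vdd, vt):
    """Delay in arbitrary units for a square-law device: V_dd / (V_dd - V_t)^2."""
    if vdd <= vt:
        raise ValueError("model only applies above threshold (vdd > vt)")
    return vdd / (vdd - vt) ** 2


VT = 1.0  # normalised threshold voltage (assumption)
reference = relative_delay(5.0 * VT, VT)
for ratio in (5.0, 4.0, 3.0, 2.0, 1.5):
    slowdown = relative_delay(ratio * VT, VT) / reference
    print(f"V_DD = {ratio:.1f} V_T -> delay x{slowdown:.1f} relative to 5 V_T")
```

In this model the delay at 3 V<sub>T</sub> is 2.4 times the value at 5 V<sub>T</sub>, and at 2 V<sub>T</sub> it is 6.4 times, which matches the qualitative trend stated above.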





#### $V_{\mathsf{T}}$ Scaling: $V_{\mathsf{T}}$ and $I_{\mathsf{OFF}}$ Trade-off



- As V<sub>T</sub> decreases, sub-threshold leakage increases
- Leakage is a barrier to voltage scaling
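The trade-off can be put in numbers with the usual subthreshold model, in which off-current grows by one decade for every subthreshold swing S of V<sub>T</sub> reduction. The swing value (100 mV/decade) and the function name below are assumptions for illustration, not figures from the slide.

```python
def leakage_multiplier(vt_reduction_mv, swing_mv_per_decade=100.0):
    """Factor by which I_off grows when V_T is lowered by vt_reduction_mv.

    Assumes I_off is proportional to 10^(-V_T / S), with S the subthreshold swing.
    """
    return 10.0 ** (vt_reduction_mv / swing_mv_per_decade)


# Lowering V_T by one swing (100 mV here) costs one decade of leakage.
one_swing = leakage_multiplier(100.0)
two_swings = leakage_multiplier(200.0)
```

Under this assumption each 100 mV of V<sub>T</sub> reduction multiplies I<sub>OFF</sub> by ten, which is why V<sub>T</sub> cannot follow V<sub>DD</sub> down indefinitely.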

![](_page_24_Picture_4.jpeg)

![](_page_24_Picture_5.jpeg)

#### **Constant Field Scaling**

|                        | Device and circuit parameters                               | Factor           |
|------------------------|-------------------------------------------------------------|------------------|
| Scaling<br>assumptions | Device dimensions (t <sub>ox</sub> , L, W, X <sub>j</sub> ) | 1/k              |
|                        | Doping concentration (N <sub>a</sub> , N <sub>d</sub> )     | k                |
|                        | Voltage (V)                                                 | 1/k              |
| Device<br>parameters   | Electric field (E)                                          | 1                |
|                        | Capacitance (C=εA/t)                                        | 1/k              |
|                        | Current (I)                                                 | 1/k              |
|                        | Channel resistance (R <sub>ch</sub> )                       | 1                |
| Circuit<br>parameters  | Delay (CV/I)                                                | 1/k              |
|                        | Power (VI)                                                  | 1/k <sup>2</sup> |
|                        | Switching energy (CV <sup>2</sup> )                         | 1/k <sup>3</sup> |
|                        | Circuit density (1/A)                                       | k <sup>2</sup>   |
|                        | Power density (P/A)                                         | 1                |

![](_page_25_Picture_2.jpeg)

![](_page_25_Picture_3.jpeg)

#### **Scaling in the Vertical Dimension**

![](_page_26_Figure_1.jpeg)

- Transistor V<sub>t</sub> rolls off as the channel length is reduced
- Shallow junction depth reduces V<sub>t</sub> roll-off
- However, sheet resistance increases

![](_page_26_Picture_5.jpeg)

![](_page_26_Picture_6.jpeg)

#### **Scaling in the Vertical Dimension**

![](_page_27_Figure_1.jpeg)

- Vertical dimension scales less than horizontal
- Aggravates short channel effect (V<sub>t</sub> roll-off)

![](_page_27_Picture_4.jpeg)

#### **Constant Voltage Scaling**

|                        | Device and circuit parameters                               | Factor                |
|------------------------|-------------------------------------------------------------|-----------------------|
| Scaling<br>assumptions | Device dimensions (t <sub>ox</sub> , L, W, X <sub>j</sub> ) | 1/k                   |
|                        | Doping concentration (N <sub>a</sub> , N <sub>d</sub> )     | k                     |
|                        | Voltage (V)                                                 | 1                     |
| Device<br>parameters   | Electric field (E)                                          | k                     |
|                        | Capacitance (C=εA/t)                                        | 1/k                   |
|                        | Current (I)                                                 | k                     |
|                        | Channel resistance (R <sub>ch</sub> )                       | 1/k                   |
| Circuit<br>parameters  | Delay (CV/I)                                                | 1/k <sup>2</sup>      |
|                        | Power (VI)                                                  | k                     |
|                        | Switching energy (CV <sup>2</sup> )                         | 1/k                   |
|                        | Circuit density (1/A)                                       | k <sup>2</sup>        |
|                        | Power density (P/A)                                         | k <sup>3</sup>        |
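Both the constant field and the constant voltage tables can be reproduced from the scaling assumptions with a first-order square-law current model, I ∝ (W/L)(ε/t<sub>ox</sub>)V². The sketch below is an illustration under that assumed model; the function name and dictionary keys are hypothetical.

```python
from fractions import Fraction


def scaling_factors(k, constant_voltage=False):
    """First-order scaling factors when all dimensions shrink by 1/k."""
    k = Fraction(k)
    dim = 1 / k                                            # t_ox, L, W, X_j
    voltage = Fraction(1) if constant_voltage else 1 / k   # supply voltage
    cap = dim * dim / dim                                  # C = eps * A / t
    current = (dim / dim) * (1 / dim) * voltage ** 2       # (W/L) * (eps/t_ox) * V^2
    power = voltage * current                              # P = V * I
    density = 1 / dim ** 2                                 # devices per unit area
    return {
        "field": voltage / dim,
        "capacitance": cap,
        "current": current,
        "channel_resistance": voltage / current,
        "delay": cap * voltage / current,
        "power": power,
        "switching_energy": cap * voltage ** 2,
        "circuit_density": density,
        "power_density": power * density,
    }


constant_field = scaling_factors(Fraction(7, 5))                 # k = 1.4
constant_volt = scaling_factors(Fraction(7, 5), constant_voltage=True)
```

Using exact fractions keeps the results directly comparable with the table entries, for example a power density factor of exactly 1 under constant field scaling and k³ under constant voltage scaling.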

![](_page_28_Picture_2.jpeg)

![](_page_28_Picture_3.jpeg)

#### **Constant Voltage Scaling**

- More aggressive scaling than constant field
- Limitations
  - Reliability problems due to high field
  - Power density increases too fast
- Both constant field and constant voltage scaling have been followed in practice
- Field and power density have gone up as a byproduct of high performance, but so far designers have been able to handle the problems

![](_page_29_Picture_7.jpeg)

![](_page_29_Picture_8.jpeg)

## **ITRS Roadmap**

| Year                                        | 2001 | 2003 | 2005 | 2007 | 2010 | 2013  | 2016  |
|---------------------------------------------|------|------|------|------|------|-------|-------|
| DRAM ½ pitch [nm]                           | 130  | 100  | 80   | 65   | 45   | 32    | 22    |
| MPU transistors/chip                        | 97M  | 153M | 243M | 386M | 773M | 1.55G | 3.09G |
| Wiring levels                               | 8    | 8    | 10   | 10   | 10   | 11    | 11    |
| High-perf. phys. gate [nm]                  | 65   | 45   | 32   | 25   | 18   | 13    | 9     |
| High-perf. VDD [V]                          | 1.2  | 1.0  | 0.9  | 0.7  | 0.6  | 0.5   | 0.4   |
| Local clock [GHz]                           | 1.7  | 3.1  | 5.2  | 6.7  | 11.5 | 19.3  | 28.8  |
| High-perf. power [W]                        | 130  | 150  | 170  | 190  | 218  | 251   | 288   |

 International Technology Roadmap for Semiconductors 2002 projection (http://public.itrs.net/)
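The growth rate implied by the "MPU transistors/chip" row can be checked with a few lines of Python. The data are copied from the table above; the function name is a hypothetical helper.

```python
import math

# MPU transistors per chip, in millions, copied from the roadmap table.
mpu_transistors = {2001: 97, 2003: 153, 2005: 243, 2007: 386,
                   2010: 773, 2013: 1550, 2016: 3090}


def doubling_period(data):
    """Years per doubling implied by the first and last entries."""
    first, last = min(data), max(data)
    doublings = math.log2(data[last] / data[first])
    return (last - first) / doublings


period_years = doubling_period(mpu_transistors)
```

For this row the implied doubling period comes out at roughly three years.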

![](_page_30_Picture_3.jpeg)

![](_page_30_Picture_4.jpeg)

#### **Transistor Scaling**

![](_page_31_Figure_1.jpeg)

- 90nm is in production, 65nm in research phase
- New technology generation introduced every 2-3 years

![](_page_31_Picture_4.jpeg)

![](_page_31_Picture_5.jpeg)

#### **Cost per Transistor**

![](_page_32_Figure_1.jpeg)

- You can buy 10M transistors for a buck
- They even throw in the interconnect and package for free

![](_page_32_Picture_4.jpeg)

![](_page_32_Picture_5.jpeg)

#### **Transistors Shipped Per Year**

![](_page_33_Figure_1.jpeg)

Today, there are about 100 transistors for every ant (Gordon Moore, ISSCC '04)

![](_page_33_Picture_3.jpeg)

![](_page_33_Picture_4.jpeg)

#### **Transistors per Chip**

![](_page_34_Figure_1.jpeg)

- 1.7B transistors in Montecito (next generation Itanium)
- Most of the devices used for on-die cache memory

![](_page_34_Picture_4.jpeg)

![](_page_34_Picture_5.jpeg)

## **Moore's Wrong Prediction**

![](_page_35_Picture_1.jpeg)

![](_page_35_Picture_2.jpeg)

![](_page_35_Picture_3.jpeg)

![](_page_36_Figure_0.jpeg)

- 30% higher frequency every new generation

![](_page_36_Picture_2.jpeg)

![](_page_36_Picture_3.jpeg)

![](_page_37_Figure_0.jpeg)

- ~15% larger die every new generation
- This means more than 2X increase in transistors per chip
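The "more than 2X" figure follows from combining the die growth with the density gain of a shrink. The sketch below assumes the ~15% refers to die area and uses the 0.7 linear shrink from the technology scaling relations; both the function name and these interpretations are assumptions.

```python
def transistor_growth(die_area_growth=1.15, linear_shrink=0.7):
    """Transistors-per-chip multiplier for one technology generation."""
    density_gain = 1.0 / linear_shrink ** 2   # each device occupies 0.7^2 of its old area
    return die_area_growth * density_gain


per_generation = transistor_growth()   # 1.15 / 0.49, a little over 2.3
```

The shrink alone gives about 2.04X more devices per unit area, and the larger die pushes the total to roughly 2.3X.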

![](_page_37_Picture_3.jpeg)

![](_page_37_Picture_4.jpeg)

# **Supply Voltage Scaling**

![](_page_38_Figure_1.jpeg)

- Supply voltage is reduced for active power control:  $P_{active} \propto C V_{dd}^2 f$ 
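Because active power is quadratic in V<sub>dd</sub> but only linear in f, a given percentage cut in supply voltage saves more power than the same cut in frequency. The capacitance, voltage and frequency values below are made-up numbers for illustration only.

```python
def active_power(c_switched, vdd, freq):
    """Dynamic power in watts: P = C * V_dd^2 * f."""
    return c_switched * vdd ** 2 * freq


# Hypothetical operating point: 1 nF switched capacitance, 1.2 V, 2 GHz.
baseline = active_power(c_switched=1e-9, vdd=1.2, freq=2e9)
lower_vdd = active_power(c_switched=1e-9, vdd=1.2 * 0.9, freq=2e9)
lower_freq = active_power(c_switched=1e-9, vdd=1.2, freq=2e9 * 0.9)

vdd_saving = 1.0 - lower_vdd / baseline     # about 19%
freq_saving = 1.0 - lower_freq / baseline   # about 10%
```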

![](_page_38_Picture_3.jpeg)

![](_page_38_Picture_4.jpeg)

#### 4 Decades of Transistor Scaling: Itanium 2 Processor

![](_page_39_Picture_1.jpeg)

- 130nm process
- 410M transistors
- 374mm<sup>2</sup> die size
- 6MB on-die L3 cache
- 1.5GHz at 1.3V
- 6.4GB/s 400MT/s 4-way bus interface
- System compatible with existing Itanium 2 platforms
- Extensive RAS, DFT and DFM features

![](_page_39_Picture_10.jpeg)

![](_page_39_Picture_11.jpeg)

#### **Power Density**

![](_page_40_Figure_1.jpeg)

- High-end microprocessors: Packaging, cooling
- Mobile/handheld applications: Short battery life

![](_page_40_Picture_4.jpeg)

![](_page_40_Picture_5.jpeg)

#### **Active and Leakage Power**

![](_page_41_Figure_1.jpeg)

- Transistors are becoming dimmers

![](_page_41_Picture_3.jpeg)

![](_page_41_Picture_4.jpeg)

#### Leakage Power Crawling Up in Itanium 2

![](_page_42_Figure_1.jpeg)

- Transistor leakage is perhaps the biggest problem

![](_page_42_Picture_3.jpeg)

![](_page_42_Picture_4.jpeg)

#### Leakage Power versus Temp.

![](_page_43_Figure_1.jpeg)

 Leakage power is problematic in active mode for high performance microprocessors

![](_page_43_Picture_3.jpeg)

![](_page_43_Picture_4.jpeg)

![](_page_44_Figure_0.jpeg)

- Destructive positive feedback mechanism
- Leakage increases exponentially with temperature
- May destroy the test socket  $\rightarrow$  thermal sensors required
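The feedback loop can be sketched with a lumped thermal model: junction temperature equals ambient plus thermal resistance times total power, while leakage power grows exponentially with temperature. Every number and name here (thermal resistance, characteristic temperature, the 150 °C limit) is a hypothetical value chosen for illustration.

```python
import math


def settle_temperature(p_active, p_leak_ref, r_th, t_amb=40.0,
                       t_char=30.0, t_limit=150.0, max_iter=1000):
    """Iterate T = T_amb + R_th * (P_active + P_leak(T)).

    P_leak(T) = p_leak_ref * exp((T - t_amb) / t_char); all values are assumed.
    Returns the steady-state temperature in deg C, or None if the loop runs
    away past t_limit or fails to settle within max_iter steps.
    """
    temp = t_amb
    for _ in range(max_iter):
        p_leak = p_leak_ref * math.exp((temp - t_amb) / t_char)
        next_temp = t_amb + r_th * (p_active + p_leak)
        if next_temp > t_limit:
            return None                 # feedback loop diverged
        if abs(next_temp - temp) < 1e-6:
            return next_temp            # loop has settled
        temp = next_temp
    return None


stable = settle_temperature(p_active=50.0, p_leak_ref=5.0, r_th=0.5)
runaway = settle_temperature(p_active=50.0, p_leak_ref=30.0, r_th=0.5)
```

With the smaller reference leakage the loop settles a little above 72 °C. With six times the leakage no stable point exists below the limit and the function returns `None`, which is the situation a thermal sensor is meant to catch.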

![](_page_44_Picture_4.jpeg)

![](_page_44_Picture_5.jpeg)

#### **Gate Oxide Thickness**

![](_page_45_Figure_1.jpeg)

- Electrical t<sub>ox</sub> > Physical t<sub>ox</sub>
- Due to gate depletion and carrier quantization in the channel

![](_page_45_Picture_4.jpeg)

![](_page_45_Picture_5.jpeg)

## Gate Tunneling Leakage

![](_page_46_Figure_1.jpeg)

- MOSFETs no longer have infinite input resistance
- Impacts both power and functionality of circuits

![](_page_46_Picture_4.jpeg)

![](_page_46_Picture_5.jpeg)

#### **Process Variation in Microprocessors**

![](_page_47_Figure_1.jpeg)

- Fast chips burn too much power
- Slow chips cannot meet the frequency requirement

![](_page_47_Picture_4.jpeg)

![](_page_47_Picture_5.jpeg)

#### **Process Variation in Transistors**

![](_page_48_Figure_1.jpeg)

- More than 2X variation in I<sub>on</sub>, 100X variation in I<sub>off</sub>
- Within-die, die-to-die, lot-to-lot

![](_page_48_Picture_4.jpeg)

![](_page_48_Picture_5.jpeg)

![](_page_49_Figure_0.jpeg)

- Intrinsic parameter variation (static)
  - Channel length, random dopant fluctuation
- Environmental variation (dynamic)
  - Temperature, supply variations

![](_page_49_Picture_5.jpeg)

![](_page_49_Picture_6.jpeg)

## Sub-wavelength Lithography

![](_page_50_Figure_1.jpeg)

![](_page_50_Picture_2.jpeg)

![](_page_50_Picture_3.jpeg)

## Line Edge/Width Roughness

![](_page_51_Figure_1.jpeg)

I<sub>off</sub> and I<sub>dsat</sub> impacted by LER and LWR

![](_page_51_Picture_3.jpeg)

![](_page_51_Picture_4.jpeg)

## **Random Dopant Fluctuation**

![](_page_52_Figure_1.jpeg)

V<sub>t</sub> variation caused by non-uniform channel dopant distribution

![](_page_52_Picture_3.jpeg)

![](_page_52_Picture_4.jpeg)

## **Supply Voltage Integrity**

![](_page_53_Figure_1.jpeg)

- IR noise due to large current consumption
- Ldi/dt noise due to new power reduction techniques (clock gating, power gating, body biasing) with power down mode

![](_page_53_Picture_4.jpeg)

![](_page_53_Picture_5.jpeg)

# Supply Voltage Integrity

- Degrades circuit performance
- Supply voltage overshoot causes reliability issues
- Power wasted by parasitic resistance causes self-heating
- V<sub>dd</sub> fluctuation should be less than 10%
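A rough budget check combines the resistive drop with the inductive noise and compares the sum against the 10% limit. The resistance, inductance and current-slew values are invented for illustration, and the function names are hypothetical.

```python
def supply_noise(current_a, resistance_ohm, inductance_h, di_dt_a_per_s):
    """Worst-case supply droop in volts: I*R drop plus L*di/dt."""
    return current_a * resistance_ohm + inductance_h * di_dt_a_per_s


def within_budget(vdd, noise_v, tolerance=0.10):
    """True if the droop stays below the allowed fraction of V_dd."""
    return noise_v < tolerance * vdd


# Hypothetical 1 V, 100 A supply with 10 pH of loop inductance.
good_grid = supply_noise(100.0, 0.5e-3, 10e-12, 1e9)   # 0.05 V + 0.01 V
poor_grid = supply_noise(100.0, 1.2e-3, 10e-12, 1e9)   # 0.12 V + 0.01 V
```

At 100 A and 1 V, the whole 10% budget corresponds to only 1 mΩ of series resistance before any inductive noise is counted.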

![](_page_54_Figure_5.jpeg)

Courtesy IBM

![](_page_54_Picture_7.jpeg)

![](_page_54_Picture_8.jpeg)

## **Productivity Gap**

![](_page_55_Figure_1.jpeg)

- Design complexity surpasses manpower
- Effective CAD tools, memory dominated chips

![](_page_55_Picture_4.jpeg)

![](_page_55_Picture_5.jpeg)

## Lithography Tool Cost

![](_page_56_Figure_1.jpeg)

- What will end Moore's law, economics or physics?

![](_page_56_Picture_3.jpeg)

![](_page_56_Picture_4.jpeg)

#### **Interconnect Scaling**

- Global interconnects get longer due to larger die size
- Wire scaling increases R, L and C
- Example: local vs. global interconnect delay

![](_page_57_Picture_4.jpeg)

![](_page_57_Picture_5.jpeg)

## **Interconnect Delay Problem**

![](_page_58_Figure_1.jpeg)

- Local interconnect has sped up (shorter wires)
- Global interconnect has slowed down (RC doesn't scale)

![](_page_58_Picture_4.jpeg)

![](_page_58_Picture_5.jpeg)

![](_page_59_Figure_0.jpeg)

- Local wires have high density to accommodate the increasing number of devices
- Global wires have low RC (tall, wide, thick, scarce wires)

![](_page_59_Picture_3.jpeg)

#### Interconnect distribution scaling trends

- RC/µm scaling trend is only one side of the story...

![](_page_60_Figure_2.jpeg)

#### **Power Delivery & Distribution Challenges**

- High-end microprocessors approaching > 10 GHz
  - How to deliver and distribute ~100A at < 1V for < \$20!
  - On-die power density >>> hot-plate power density
    - crossover happened back in 0.6µm technology!
  - di/dt noise only worsening with scaling: drivers are one of the sources.

![](_page_61_Figure_6.jpeg)

#### **Example multi-layer system**

![](_page_62_Figure_1.jpeg)

#### **Cross Talk Noise**

![](_page_63_Picture_1.jpeg)

- As wires are brought closer with scaling, capacitive coupling becomes significant
- Adjacent wires on same layer have stronger coupling

![](_page_63_Picture_4.jpeg)

![](_page_63_Picture_5.jpeg)

#### **Cross Talk Noise**

![](_page_64_Figure_1.jpeg)

- Multiple aggressors and multiple victims are possible
- Cross talk noise can cause logic faults in dynamic circuits

![](_page_64_Picture_4.jpeg)

![](_page_64_Picture_5.jpeg)

## **Cross Talk and Delay**

- Capacitive cross talk can affect delay
- If aggressor(s) switch in opposite direction, effective coupling capacitance is doubled
- On the other hand, if aggressor(s) switch in the same direction, Cc is eliminated
- Significant difference in RC delay depending on adjacent switching activity
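The effect on delay can be captured with a switching-dependent multiplier on the coupling capacitance: 0 when the neighbor switches the same way, 2 when it switches the opposite way. The factor of 1 for a quiet neighbor is the standard intermediate case and is added here as an assumption, as are the function names and the lumped RC delay model.

```python
# Multiplier applied to the coupling capacitance C_c for each neighbor activity.
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_capacitance(c_ground, c_coupling, neighbor="quiet"):
    """Capacitance seen by the victim driver for a given neighbor activity."""
    return c_ground + MILLER_FACTOR[neighbor] * c_coupling


def rc_delay(resistance, c_ground, c_coupling, neighbor="quiet"):
    """Lumped RC delay estimate (arbitrary units)."""
    return resistance * effective_capacitance(c_ground, c_coupling, neighbor)


# Hypothetical wire whose coupling capacitance equals its ground capacitance.
best_case = rc_delay(1.0, 1.0, 1.0, "same")
worst_case = rc_delay(1.0, 1.0, 1.0, "opposite")
```

For a wire with equal ground and coupling capacitance, the worst case is three times slower than the best case, purely because of what the neighbor is doing.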

![](_page_65_Picture_5.jpeg)

![](_page_65_Picture_6.jpeg)

![](_page_65_Picture_7.jpeg)

## Soft Error In Storage Nodes

![](_page_66_Figure_1.jpeg)

- Soft errors are caused by
  - Alpha particles from package materials
  - Cosmic rays from outer space

![](_page_66_Picture_5.jpeg)

![](_page_66_Picture_6.jpeg)

# Soft Error In Storage Nodes

- Error correction code
- Shielding
- SOI
- Radiation-hardened cell

![](_page_67_Figure_5.jpeg)

![](_page_67_Picture_6.jpeg)

![](_page_67_Picture_7.jpeg)

## More Roadblocks

- Memory stability
- Long term reliability
- Mixed signal design issues
- Mask cost
- Testing multi-GHz processors
- Skeptics: Do we need a faster computer?

Eventually, it all boils down to economics

![](_page_68_Picture_8.jpeg)

![](_page_68_Picture_9.jpeg)

## Summary

- Digital IC Business is Unique
  - Things Get Better Every Few Years
  - Companies Have to Stay on Moore's Law Curve to Survive
- Benefits of Transistor Scaling
  - Higher Frequencies of Operation
  - Massive Functional Units, Increasing On-Die Memory
  - Cost/MIPS Going Down
- Downside of Transistor Scaling
  - Power (Dynamic and Static)
  - Process Variation
  - Design/Manufacturing Cost

![](_page_69_Picture_12.jpeg)

![](_page_69_Picture_13.jpeg)