April 16, 2002: VLSI symposia abstracts
Abstracts from the Intel presentations.

A 4GHz 130nm Address Generation Unit with 32-bit Sparse-tree Adder Core
Sanu Mathew, Mark Anders, Ram K. Krishnamurthy and Shekhar Borkar
Circuits Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA, sanu.k.mathew@intel.com
This paper describes a 32-bit Address Generation Unit (AGU) designed for 4GHz operation in 1.2V, 130nm technology. The AGU uses a 152ps dual-Vt sparse-tree adder core to achieve a 20% delay reduction, 80% lower interconnect density and a low (1%) active leakage energy component. The semi-dynamic implementation gives an average energy profile similar to static CMOS, with a good scaling trend below 130nm.

Dual Supply Voltage Clocking for 5GHz 130nm Integer Execution Core
Ram K. Krishnamurthy, Steven Hsu, Mark Anders, Brad Bloechel, Bhaskar Chatterjee*, Manoj Sachdev*, Shekhar Borkar
Circuits Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA, ramk@ichips.intel.com
This paper describes dual-Vcc clocking on a 1.2V, 5GHz integer execution core fabricated in 130nm CMOS, achieving up to a 71% reduction in measured clock power (including 15% active leakage). A write-port style pass-transistor latch and a split-output level-converting local clock buffer are described for robust, DC-power-free low-Vcc clock operation.

A 4.5GHz 130nm 32KB L0 Cache with a Self Reverse Bias Scheme
Steven K. Hsu, Atila Alvandpour, Sanu Mathew, Shih-Lien Lu, Ram K. Krishnamurthy, Shekhar Borkar
Circuits Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA, steven.k.hsu@intel.com
This paper describes a 32KB dual-ported L0 cache for 4.5GHz operation in 1.2V, 130nm CMOS. The local bitline uses a Self Reverse Bias scheme to achieve −220mV access-transistor underdrive without an external bias voltage or gate-oxide overstress. An 11% shorter read delay and 104% higher DC robustness (including a 7x measured active leakage reduction) are achieved over an optimized high-performance dual-Vt scheme.

Designing a 3GHz, 130nm, Intel® Pentium® 4 Processor
Daniel Deleganes, Jonathon Douglas, Badari Kommandur, Marek Patyra
Intel Architecture Group, 2501 NW 229th Ave., MS RA2-401, Hillsboro, OR 97124, USA
The design of an IA32 processor fabricated on a state-of-the-art 130nm CMOS process with six improved layers of dual-damascene copper metallization is described. Engineering an IA32 processor for server, desktop and mobile platforms, and in particular meeting their diverse power and thermal constraints, poses numerous challenges. This presentation focuses on the methods used to achieve high frequency and low power on the same chip, in particular the use of a dual-Vt process, clock skew design and thermal management techniques.

Forward Body Bias for Microprocessors in 130nm Technology Generation and Beyond
Ali Keshavarzi, Siva Narendra, Bradley Bloechel, Shekhar Borkar and Vivek De
Microprocessor Research, Intel Labs, Hillsboro, OR, USA
Device and testchip measurements show that forward body bias (FBB) can be used effectively to improve the performance and reduce the complexity of a 130nm dual-VT technology, to reduce leakage power during burn-in and standby, to improve circuit delay and robustness, and to reduce active power. FBB allows the performance advantages of low-temperature operation to be realized fully without requiring transistor redesign, and it also reduces VT variation and mismatch and improves the gm × ro product.
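
The abstract does not spell out why forward bias speeds up a transistor. The standard long-channel body-effect expression for an NMOS threshold voltage (a textbook model, not taken from the paper) shows the mechanism: V_BS is the body-to-source voltage, γ the body-effect coefficient and φ_F the Fermi potential. Making V_BS positive (forward bias) shrinks the first square-root term, so V_T falls below its zero-bias value V_T0 and drive current rises; reverse bias (V_BS < 0) does the opposite.

```latex
V_T = V_{T0} + \gamma\left(\sqrt{2|\phi_F| - V_{BS}} - \sqrt{2|\phi_F|}\right),
\qquad V_{BS} > 0 \;\Rightarrow\; V_T < V_{T0}
```

This simple model only applies to modest forward bias, well below the point where the source-body junction starts to conduct.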

A 6GHz, 16KB L1 Cache in a 100nm Dual-VT Technology Using a Bitline Leakage Reduction (BLR) Technique
Yibin Ye, Muhammad Khellah, Dinesh Somasekhar, Ali Farhang and Vivek De
Microprocessor Research, Intel Labs, Hillsboro, OR, USA
An L1 cache testchip with a dual-VT cell and a bitline leakage reduction (BLR) technique has been implemented in a 100nm dual-VT technology. The area of a 2KB array is 263µm × 204µm, virtually the same as the best conventional design with a high-VT cell. BLR eliminates the impact of bitline leakage on performance and noise margin with minimal area overhead. Bitline delay improves by 23%, enabling 6GHz operation. Energy consumption per cycle is 15% higher.
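
For scale, the figures quoted above work out as follows. This is plain arithmetic on the numbers in the abstract; the per-bit value divides the whole 2KB array area (periphery included) evenly over 2KB = 16,384 bits, so it is an upper bound on the cell area, not the cell size itself.

```latex
\begin{aligned}
A_{2\text{KB}} &= 263\,\mu\text{m} \times 204\,\mu\text{m} = 53{,}652\,\mu\text{m}^2 \approx 0.054\,\text{mm}^2 \\
\frac{A_{2\text{KB}}}{2048 \times 8\ \text{bits}} &= \frac{53{,}652\,\mu\text{m}^2}{16{,}384\ \text{bits}} \approx 3.27\,\mu\text{m}^2/\text{bit} \\
T_{\text{cycle}} &= \frac{1}{6\,\text{GHz}} \approx 166.7\,\text{ps}
\end{aligned}
```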

A Leakage-Tolerant Dynamic Register File Using Leakage Bypass with Stack Forcing (LBSF) and Source Follower NMOS (SFN) Techniques
Stephen Tang, Steven Hsu, Yibin Ye, James Tschanz, Dinesh Somasekhar, Siva Narendra, Shih-Lien Lu, Ram Krishnamurthy and Vivek De
Microprocessor Research, Intel Labs, Hillsboro, OR, USA
The LBSF and SFN leakage-tolerant techniques improve the robustness of leakage-sensitive, performance-critical wide dynamic circuits in the local and global bitlines of a 256×32b register file in a 100nm dual-VT technology. Compared with the best dual-VT (DVT) design, the full LBSF design improves clock frequency by 50% or reduces energy by 37%. The performance advantages of LBSF and SFN become more significant as leakage increases.

Four-Way Processor 800 MT/s Front Side Bus with Ground Referenced Voltage Source I/O
Thomas P. Thomas, Ian A. Young
Intel Corporation, Portland Technology Development, RA1-309, 5200 NE Elam Young Parkway, Hillsboro, OR 97124, USA
A 40cm multi-drop bus shared by 5 test chips, emulating 4 processors and a chipset, runs error-free at 800 MT/s with 130mV margin using a Ground Referenced Voltage Source (GRVS) I/O scheme. For comparison, when the same test chip is programmed to use Gunning Transceiver Logic (GTL), the bus speed is 500 MT/s for the same 130mV margin under identical conditions.

Static Pulsed Bus for On-Chip Interconnects
Muhammad Khellah, James Tschanz, Yibin Ye, Siva Narendra and Vivek De
Circuit Research, Intel Labs, Hillsboro, OR, USA
A Static Pulsed Bus (SPB) technique offers significant advantages over a conventional static bus (SB) in delay, energy, total device width and peak VCC current for 1500µm to 4500µm long M4 buses in a 100nm technology. These improvements come from a reduction in effective coupling capacitance and from repeater skewing, both enabled by monotonic signal transitions. Unlike dynamic schemes, SPB maintains its energy savings across all activity factors without any clock power or routing overhead.

A Transition-Encoded Dynamic Bus Technique for High-Performance Interconnects
Mark Anders, Nivruti Rai*, Ram Krishnamurthy, Shekhar Borkar
Circuit Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA, mark.a.anders@intel.com
*Desktop Products Group, Intel Corporation, Hillsboro, OR 97124, USA, nivruti.rai@intel.com
A transition-encoded dynamic bus technique reduces interconnect delay while maintaining the robustness and switching-energy behavior of a static bus. Efficient circuits, designed as a drop-in replacement, enable significant delay and peak-current reduction even for short buses, while also saving energy at aggressive delay targets. In a 180nm 32-bit microprocessor, 79% of all global buses show a 10%-35% performance improvement.

An Accurate and Efficient Analysis Method for Multi-Gb/s Chip-to-Chip Signaling Schemes
Bryan K. Casper, Matthew Haycock, Randy Mooney
Circuit Research, Intel Labs, Hillsboro, OR, bryan.k.casper@intel.com
This paper introduces an accurate method of modeling the performance of high-speed chip-to-chip signaling systems. Implemented in a simulation tool, it accounts precisely for intersymbol interference, crosstalk and echoes, as well as circuit-related effects such as thermal noise, power supply noise and receiver jitter. We correlated the simulation tool with actual measurements of a high-speed signaling system and then used it to evaluate tradeoffs between different methods of chip-to-chip signaling, with and without equalization.

We present a technique that enables the integration of sensitive analog circuits with a high-performance microprocessor (Pentium® 4) on a lossy substrate. We show that by exploiting the spectral content of substrate noise, and by using appropriately tuned analog amplification, the isolation requirement can be limited to 70dB. Using a combination of measurements and field-solver results, we show that a minimal process enhancement (a deep n-well) yields 50dB of isolation, and that the remaining 20dB can be achieved with layout and differential circuit techniques.

Selective Node Engineering for Chip-Level Soft Error Rate Improvement
Tanay Karnik, Sriram Vangal, V. Veeramachaneni, Peter Hazucha, Vasantha Erraguntla, Shekhar Borkar
Circuit Research, Intel Labs, Hillsboro, OR, USA
This paper presents a technique to selectively engineer sequential or domino nodes in high-performance circuits to improve the soft error rate (SER) induced by cosmic rays or alpha particles. In a 0.18µm process, the SER improvement is as much as 3X at the cell level, 1.8X at the block level and 1.3X at the chip level, with no penalty in performance or area and a power penalty below 3%. The node selection, hardening and SER quantification steps are fully automated.

Design Optimizations of a High Performance Microprocessor Using Combinations of Dual-VT Allocation and Transistor Sizing
James Tschanz, Yibin Ye, Liqiong Wei¹, Venkatesh Govindarajulu, Nitin Borkar, Steven Burns², Tanay Karnik, Shekhar Borkar and Vivek De
Microprocessor Research, ¹Mobile Architecture, ²Strategic CAD, Intel Labs, Hillsboro, OR, USA
Joint optimization of dual-VT allocation and transistor sizing for a high-performance microprocessor reduces low-VT usage by 36%-64% compared with a design where only dual-VT allocation is optimized. Designs optimized for minimum power (DVT+S) and minimum area (L-SDVT) reduce leakage power by 20%, with minimal impact on total power and die area. An enhancement of the optimum DVT+S design allows processor frequency to be increased efficiently during manufacturing through a low-VT device leakage push alone.

Design & Validation of the Pentium® III and Pentium® 4 Processors Power Delivery
Tawfik Rahal-Arabi, Greg Taylor, Matthew Ma, and Clair Webb
Intel Corporation / Logic Technology Development, 5200 NE Elam Young Parkway, Hillsboro, Oregon 97124
Email: Tawfik.r.Arabi@intel.com
In this paper, we present an empirical approach to validating the power supply impedance model, which is widely used to design the power delivery of high-performance systems. For this purpose, several silicon wafers of the Pentium® III and Pentium® 4 processors were built with various amounts of decoupling. The measured data showed significant discrepancies with the model predictions and provided useful insights into the model's regions of validity.

Effectiveness of Adaptive Supply Voltage and Body Bias for Reducing Impact of Parameter Variations in Low Power and High Performance Microprocessors
James Tschanz, James Kao¹, Siva Narendra, Raj Nair and Vivek De
Microprocessor Research, Intel Labs, Hillsboro, OR, USA; ¹Massachusetts Institute of Technology
Testchip measurements show that adaptive VCC is useful for reducing the impact of die-to-die and within-die (WID) parameter variations on the frequency, active power and leakage power distributions of both low-power and high-performance microprocessors. Using adaptive VCC together with adaptive VBS or WID-VBS is much more effective than using any of them individually. Adaptive VCC+WID-VBS increases the fraction of dies accepted in the highest two frequency bins to 80%.