August 22, 2003

Hot Chips 2003 Coverage 

 by Hans de Vries 

 

 

     Hot Chips 2003

 

Some coverage and a few pictures from this year's Hot Chips, held at the Memorial Auditorium of Stanford University from August 17 to August 19. There were many interesting presentations in this 15th-anniversary year of the Hot Chips conference. You can find the program here. If you weren't there, don't miss it next year.

 

     POWER 5 going Simultaneous Multi Threading (SMT)

 

Ron Kalla, together with Balaram Sinharoy and Joel Tendler, introduced IBM's next-generation POWER 5 microprocessor. The POWER 5 is an improved dual-core POWER 4 microarchitecture, extended to support two virtual processors per core. Core size has increased by 24%. Performance improvements of 40% are seen for multithreaded applications.

 

Many resources have been extended or improved, which also improves single-thread performance. It was estimated that the multithreading performance improvement would have been limited to 20% without these extensions. More information will be made available during this year's Microprocessor Forum in October.

 

Extended / Improved Resources:

- General Purpose Renamed Register File (80 -> 120)

- Floating Point Renamed Register File  (80 -> 120)

- Instruction Fetch Buffers

- Reservation Stations

- Address Translation Tables:

  -  SLB (Segment Lookaside Buffer)

  -  TLB (Translation Lookaside Buffer)

  -  ERAT (Effective to Real Address Table)

- Instruction and Data Caches ( higher associativity )

 

Resources shared by the threads: 

- Global Completion Table (Retirement)

- Branch History Table (BHT)

- Translation Lookaside Buffer (TLB)

- .........

 

A thread can be either active or dormant. A dormant thread wakes up on an external interrupt, a decrementer (timer) interrupt or a special instruction from the active thread.

 

The relative activity of the two threads can be managed by software or hardware by controlling the instruction decode rate of each thread. There are 8 priority levels for each thread. (See third photo)
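To make the idea of steering activity through the decode rate concrete, here is a minimal sketch. The talk summary above does not say how a priority level maps to decode cycles, so the rule used here is purely an assumption for illustration: the lower-priority thread gets one decode cycle out of every 2^(d+1), where d is the priority difference. The function name `decode_schedule` and the thread labels "A" and "B" are hypothetical.

```python
def decode_schedule(prio_a, prio_b, cycles):
    """Return which thread ('A' or 'B') owns the decode stage in each cycle.

    ASSUMED rule (not taken from the presentation): the lower-priority
    thread receives 1 decode cycle out of every 2**(d + 1), where d is the
    absolute difference between the two priority levels.
    """
    for prio in (prio_a, prio_b):
        if not 0 <= prio <= 7:  # 8 priority levels per thread
            raise ValueError("priority must be in the range 0..7")

    period = 2 ** (abs(prio_a - prio_b) + 1)
    # On a tie the threads simply alternate, starting with A.
    high, low = ("A", "B") if prio_a >= prio_b else ("B", "A")

    # The last cycle of every period goes to the lower-priority thread.
    return [low if c % period == period - 1 else high for c in range(cycles)]


if __name__ == "__main__":
    print("".join(decode_schedule(4, 4, 8)))   # equal priorities
    print("".join(decode_schedule(6, 4, 16)))  # A is two levels above B
```

With equal priorities the two threads share the decode stage evenly; each extra level of difference halves the share of the lower-priority thread under this assumed rule.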

 

 

 

 

     Madison going 9 Megabyte

 

Harry Muljono, Stefan Rusu, Brian Cherkauer and Jason Stinson gave us a first glimpse of next year's Madison 9M with its 9 MByte L3 cache, and talked about this year's new Itaniums: the Madison 6M and the 1 GHz Deerfield.

 

 

                 Itanium 2      Itanium 2      LV Itanium 2
                 McKinley       Madison 6M     Deerfield

  Process        180 nm         130 nm         130 nm
  L3 cache       3 MB           6 MB           1.5 MB
  Frequency      1.0 GHz        1.5 GHz        1.0 GHz
  Voltage        1.5 V          1.3 V          1.1 V
  Max. power     130 W          130 W          62 W
  TDP power      100 W          107 W          < 55 W

 

It looks like the 130 nm Madison 9M may go for the largest chip size that many lithography tools allow: 22 mm x 22 mm, giving it a whopping ~480 mm², up from the 374 mm² of the Madison 6M. All available space has been filled with L3 cache tiles; there are 218 tiles in total.

 

The Madison 6M has 140 tiles of 48 kByte: 128 for the 6 MByte of data, 8 for ECC and 4 for yield redundancy. The 218 tiles of the Madison 9M would thus possibly break down into 192 for the 9 MByte of data, 12 for ECC and another 14 for yield redundancy. Notice that 1 ECC bit for each 16 bits of data would imply that ECC is done on 32x32 bit rectangles, or exactly one 128-byte cache line.
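The tile arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate made in the text: it assumes the Madison 9M keeps the 48 kByte tile size and the 1-in-16 ECC ratio of the Madison 6M, neither of which was confirmed in the presentation. The helper name `split_tiles` is made up for this example.

```python
TILE_KB = 48  # tile size given for Madison 6M; ASSUMED unchanged for 9M


def split_tiles(data_mb, total_tiles, tile_kb=TILE_KB, data_bits_per_ecc_bit=16):
    """Split a total L3 tile count into (data, ECC, redundancy) tiles."""
    data_tiles = data_mb * 1024 // tile_kb
    ecc_tiles = data_tiles // data_bits_per_ecc_bit  # 1 ECC bit per 16 data bits
    spare_tiles = total_tiles - data_tiles - ecc_tiles
    return data_tiles, ecc_tiles, spare_tiles


madison_6m = split_tiles(6, 140)
madison_9m = split_tiles(9, 218)  # a guess based on the 6M ratios

# A 32 x 32 bit rectangle expressed in bytes: one cache line.
ecc_block_bytes = 32 * 32 // 8

# Die area implied by a 22 mm x 22 mm reticle limit, and growth over 374 mm².
die_area_9m = 22 * 22
area_growth = die_area_9m / 374

if __name__ == "__main__":
    print("Madison 6M (data, ECC, spare):", madison_6m)
    print("Madison 9M (data, ECC, spare):", madison_9m)
    print("ECC block size in bytes:", ecc_block_bytes)
    print("Die area 9M (mm^2):", die_area_9m, "growth: %.2fx" % area_growth)
```

The tile count grows by more than half (140 to 218) while the die area grows by less than a third, which fits the observation that the extra area is spent almost entirely on cache.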

 

Madison 9M will be socket and chipset compatible with Madison 6M. Its frequency will be raised above 1.5 GHz.

 

 

 

 

 

 

 

 

 

 

 

 

 

     UltraSPARC going Chip Multi Processing (CMP) with Gemini

 

Sanjiv Kapil introduced Sun's dual processor on a chip, Gemini. This 1.2 GHz dual processor consumes a low 32 Watts maximum (16 Watts per processor). It is intended for Sun's network servers, where total throughput is more important than single-thread performance.

 

Each core has its own on-chip 512 kByte L2 cache. The total transistor count is 80 million. It has a 206 mm² die size and comes in a 959-pin ceramic uPGA.

 

It is produced on Texas Instruments' advanced 130 nm CMOS process using 300 mm wafers. The effective gate length (Leff) is 53 nm, and the process uses 7 layers of copper interconnect with OSG as the low-k dielectric.

 

    

 Regards, Hans

 

 

 
