e-journal
The Case for VLIW-CMP as a Building Block for Exascale
Current ultra-high-performance computers execute roughly 10 PFLOPS (10 quadrillion floating-point operations per second) and dissipate power in the range of 10 MW. The next generation will need to run at EFLOPS rates, 100 times as fast as today's, without dissipating any more power. Meeting this goal requires an emphasis on power-efficient execution, and for this we propose VLIW-CMP as a general architectural approach that improves significantly on the power efficiency of existing solutions. Compared to manycore architectures using simple, single-issue cores, VLIW-CMP reduces both power and die area, improves single-thread performance, and maintains aggregate FLOPS per die. To improve further on the power advantages of VLIW, we describe a mechanism that reduces the power dissipated by both data forwarding and register-file activity.
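The power-efficiency gap implied by these figures can be made explicit with a short calculation. The following is a minimal sketch that takes the abstract's round numbers at face value (10 PFLOPS and 1 EFLOPS, both at 10 MW); the function and variable names are illustrative and do not come from the paper.

```python
# Round figures from the abstract; actual machines vary around these values.
PFLOPS = 10**15  # floating-point operations per second
EFLOPS = 10**18
MEGAWATT = 10**6  # watts


def flops_per_watt(flops: float, watts: float) -> float:
    """Return power efficiency in FLOPS per watt."""
    return flops / watts


current_rate = 10 * PFLOPS
target_rate = 1 * EFLOPS
power_budget = 10 * MEGAWATT  # assumed unchanged between generations

current_eff = flops_per_watt(current_rate, power_budget)
target_eff = flops_per_watt(target_rate, power_budget)

speedup = target_rate / current_rate
efficiency_gain = target_eff / current_eff

print(f"current: {current_eff / 1e9:.0f} GFLOPS/W")
print(f"target:  {target_eff / 1e9:.0f} GFLOPS/W")
print(f"required speedup: {speedup:.0f}x, efficiency gain: {efficiency_gain:.0f}x")
```

Because the power budget is held fixed, the required efficiency gain equals the required speedup: under these assumptions, about 1 GFLOPS/W today against about 100 GFLOPS/W for an exascale machine.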