Architecting Large Caches with Reduced Energy
As process technology scales, emerging multi-core systems with higher processing speeds will require large caches. However, the low density of Static Random Access Memory (SRAM) hinders the growth of cache capacity: SRAM caches can already take up to half of the die area. At the same time, main memory, with its long latency and limited bandwidth, does not keep up with the speed of the CPU. New approaches are therefore needed to increase on-die cache capacity and overcome the memory wall problem. One potential approach is to use the emerging 3D-stacked Dynamic Random Access Memory (DRAM), which can easily provide gigabytes of storage, as the last-level cache. However, the energy consumption of a DRAM cache grows with its capacity.
This dissertation first presents an energy-efficient DRAM cache design, based on the observation that a DRAM cache with longer bitlines consumes more energy because longer bitlines have larger capacitance. We propose TCache, which partitions every subarray of the DRAM cache banks into three sublevels and schedules energy-efficient data movement among these sublevels based on reuse distance. We also propose the LevelMap and WayMap, which record the sublevel and the way in which each data block of the DRAM cache is located. The Energy-efficient Data Movement policy, driven by reuse distance, increases the hit rate in the energy-efficient sublevel regions. Evaluations show that these techniques reduce DRAM cache energy consumption by 33.4% (11% when the DRAM cache controller and DRAM cache logic are included) and improve performance by 10.6% on average over the baseline DRAM cache (7% when the controller and logic are included).
The dissertation then introduces a novel hybrid cache architecture consisting of both a DRAM region and a Spin-Transfer Torque RAM (STT-RAM) region.
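As a rough illustration of how TCache's reuse-distance-based placement and its LevelMap/WayMap bookkeeping might fit together, the following Python sketch models a single cache set. It is a minimal sketch under assumptions that do not come from the dissertation: reuse distance is measured here as the number of accesses to the set since the block was last touched, sublevel 0 is taken to be the cheapest to access, the thresholds `NEAR_MAX` and `MID_MAX` are hypothetical, and each sublevel evicts its least recently used block. The names `TCacheSet` and `pick_sublevel` are illustrative only.

```python
from typing import Dict, List, Optional

NUM_SUBLEVELS = 3
NEAR_MAX = 4   # hypothetical threshold: reuse distance <= 4 -> sublevel 0
MID_MAX = 16   # hypothetical threshold: reuse distance <= 16 -> sublevel 1


def pick_sublevel(reuse_distance: Optional[int]) -> int:
    """Map a reuse distance to a sublevel; 0 is assumed cheapest to access."""
    if reuse_distance is None:  # first touch: no reuse history yet
        return NUM_SUBLEVELS - 1
    if reuse_distance <= NEAR_MAX:
        return 0
    if reuse_distance <= MID_MAX:
        return 1
    return 2


class TCacheSet:
    """One cache set whose ways are split evenly across three sublevels."""

    def __init__(self, ways_per_sublevel: int = 2) -> None:
        self.ways_per_sublevel = ways_per_sublevel
        # slots[s][w] is the tag held in way w of sublevel s, or None if free.
        self.slots: List[List[Optional[int]]] = [
            [None] * ways_per_sublevel for _ in range(NUM_SUBLEVELS)
        ]
        self.level_map: Dict[int, int] = {}    # LevelMap: tag -> sublevel
        self.way_map: Dict[int, int] = {}      # WayMap: tag -> way in sublevel
        # tag -> time of last access; kept after eviction so history survives
        self.last_access: Dict[int, int] = {}
        self.clock = 0

    def _remove(self, tag: int) -> None:
        """Free the slot held by `tag` and drop its LevelMap/WayMap entries."""
        self.slots[self.level_map[tag]][self.way_map[tag]] = None
        del self.level_map[tag]
        del self.way_map[tag]

    def _free_way(self, sub: int) -> int:
        """Return a free way in `sub`, evicting its LRU block if necessary."""
        row = self.slots[sub]
        for way, tag in enumerate(row):
            if tag is None:
                return way
        way = min(range(self.ways_per_sublevel),
                  key=lambda w: self.last_access[row[w]])
        self._remove(row[way])
        return way

    def access(self, tag: int) -> bool:
        """Access `tag`, placing it in the sublevel its reuse distance selects.

        Returns True on a hit and False on a miss.
        """
        self.clock += 1
        prev = self.last_access.get(tag)
        distance = None if prev is None else self.clock - prev
        target = pick_sublevel(distance)
        hit = tag in self.level_map
        if hit and self.level_map[tag] != target:
            self._remove(tag)  # data movement between sublevels
        if tag not in self.level_map:
            way = self._free_way(target)
            self.slots[target][way] = tag
            self.level_map[tag] = target
            self.way_map[tag] = way
        self.last_access[tag] = self.clock
        return hit
```

For example, a block touched twice in a row is first placed in sublevel 2 on its cold miss and then moved to sublevel 0 on the hit. In hardware, LevelMap and WayMap would be small tag-side tables rather than dictionaries; the dictionaries here only show what each map records.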
The hybrid design is based on the observation that many redundant bits are written into the row buffer and many futile bits are written back to the STT-RAM cells; these writes do not change the cells' values but still cost high write energy. We propose two optimizations, selective write-back to the row buffer and selective write-back to the cell array, which reduce the high write energy of the STT-RAM region by removing these unnecessary bit-writes. We also propose reuse distance-oriented data movement and a novel tag design for the hybrid cache. The results show that, with the write optimizations, the hybrid cache achieves on average a 28.3% energy reduction and a 6.7% performance improvement (15% and 4%, respectively, when the hybrid cache controller and hybrid cache logic are included).
Although STT-RAM, with its near-zero leakage, can be integrated with the DRAM cache as a hybrid cache to reduce static energy, its high write energy brings another energy challenge. In this dissertation, we also describe a tri-regional hybrid cache that enjoys the advantages of both DRAM and STT-RAM technologies. We propose an asymmetric data access policy and a prediction table to further reduce the energy of the large hybrid cache. With the tri-regional design, energy is reduced by 26% and performance is improved by 11% on average. One limitation is that DRAM-style refresh cannot completely remove errors in the STT-RAM; an error-correcting method such as ECC is still needed to eliminate them.
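The selective write-back optimizations for the hybrid cache rest on one idea: a bit whose stored value already equals the value being written does not need to be written. The Python sketch below counts physical bit-writes for one STT-RAM row and its row buffer when both stages skip unchanged bits. It is a minimal sketch, assuming each write is compared against the current contents (a read-before-write) at bit granularity; the class `STTRamRow` and its methods are hypothetical, and the dissertation's hardware mechanism may detect unchanged bits differently.

```python
class STTRamRow:
    """One STT-RAM row plus its row buffer, counting physical bit-writes.

    Both stages write only the bits whose value actually changes, which is
    how this sketch models selective write-back.
    """

    def __init__(self, data: bytes) -> None:
        self.cells = bytearray(data)       # contents of the STT-RAM cell array
        self.row_buffer = bytearray(data)  # row opened (activated) into the buffer
        self.buffer_bit_writes = 0
        self.cell_bit_writes = 0

    @staticmethod
    def _changed_bits(old: int, new: int) -> int:
        """Number of bit positions that differ between two byte values."""
        return bin(old ^ new).count("1")

    def write_to_buffer(self, offset: int, data: bytes) -> None:
        """Selective write-back to the row buffer: skip bits that already match."""
        for i, new in enumerate(data):
            old = self.row_buffer[offset + i]
            self.buffer_bit_writes += self._changed_bits(old, new)
            self.row_buffer[offset + i] = new

    def close_row(self) -> None:
        """Selective write-back to the cell array: program only differing bits."""
        for i, new in enumerate(self.row_buffer):
            self.cell_bit_writes += self._changed_bits(self.cells[i], new)
            self.cells[i] = new


row = STTRamRow(bytes([0b10101010, 0xFF]))
row.write_to_buffer(0, bytes([0b10101011]))  # one bit flips
row.write_to_buffer(1, bytes([0xFF]))        # redundant write: no bit-writes
row.write_to_buffer(0, bytes([0b10101010]))  # reverts the earlier flip
row.close_row()                              # buffer matches cells: nothing to program
print(row.buffer_bit_writes, row.cell_bit_writes)  # prints "2 0"
```

In this example a naive design would write all 24 bits of the three byte writes into the row buffer and all 16 bits of the row back into the cells, whereas the selective version issues 2 and 0 bit-writes respectively; the final write-back is entirely futile because the row ends up unchanged.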