Deep Learning - Hardware Design

Albert Chun Chen Liu, Oscar Ming Kin Law

  • Publication date: 2020-03-26
  • Price: $680
  • Language: English
  • Pages: 107
  • Binding: Paperback
  • ISBN: 9869890202
  • ISBN-13: 9789869890205
  • Related category: Deep Learning
  • Sales ranking: 🥇 #1 English-language book, June 2020
    🥇 #1 English-language book, May 2020


Description

Preface

In 2012, convolutional neural network (CNN) technology achieved a major breakthrough. Since then, deep learning has become widely integrated into daily life through automotive, retail, healthcare, and finance products. In 2016, the triumph of AlphaGo, enabled by reinforcement learning (RL), further proved that the AI revolution is set to transform society, much as the personal computer (1977), the internet (1994), and the smartphone (2007) did. Nonetheless, the revolution's innovative efforts have thus far focused on software development. Major hardware challenges, such as the following, remain largely unaddressed:

•    Big input data
•    Deep neural network
•    Massive parallel processing
•    Reconfigurable network
•    Memory bottleneck
•    Intensive computation
•    Network pruning
•    Data sparsity
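
To put these challenges in concrete terms, a back-of-the-envelope count of multiply-accumulate (MAC) operations for a single convolutional layer (the layer shape below is illustrative, not taken from the book) shows why intensive computation is on the list:

```python
# Back-of-the-envelope multiply-accumulate (MAC) count for one
# convolutional layer. The shape is illustrative (a typical early
# layer on a 224x224 image), not a figure from the book.
H_out, W_out = 224, 224   # output feature-map height and width
C_in, C_out = 3, 64       # input and output channels
K = 3                     # square kernel size (3x3)

# Each output element needs C_in * K * K multiply-accumulates.
macs = H_out * W_out * C_out * C_in * K * K
print(f"MACs for this single layer: {macs:,}")  # 86,704,128
```

Nearly 87 million MACs for one early layer, repeated over dozens of layers and millions of inputs, is what drives the parallelism and memory-bandwidth demands listed above.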

This book reviews various hardware designs, including the CPU, GPU, and NPU, and surveys special features aimed at resolving the above challenges. New hardware designs for improved performance and power efficiency may be derived from the following approaches:

•    Parallel architecture
•    Convolution optimization
•    In-memory computation
•    Near-memory architecture
•    Network optimization
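
As a glimpse of what convolution optimization involves, the following minimal Python sketch (illustrative only; not any specific accelerator's scheme) lowers a small 2-D convolution to a single matrix-style product via the widely used im2col transformation, the kind of restructuring that exposes convolution's parallelism to hardware:

```python
def conv2d_im2col(x, w):
    """Lower a 2-D convolution (valid padding, stride 1) to one
    matrix-vector product via im2col. x and w are lists of lists."""
    H, W, K = len(x), len(x[0]), len(w)
    H_out, W_out = H - K + 1, W - K + 1
    w_flat = [v for row in w for v in row]
    # im2col: unroll every K x K input patch into one row
    cols = [[x[i + di][j + dj] for di in range(K) for dj in range(K)]
            for i in range(H_out) for j in range(W_out)]
    # a single dense product now performs all the MACs
    out = [sum(a * b for a, b in zip(patch, w_flat)) for patch in cols]
    return [out[i * W_out:(i + 1) * W_out] for i in range(H_out)]

x = [[4 * i + j for j in range(4)] for i in range(4)]  # 4x4 ramp 0..15
w = [[1] * 3 for _ in range(3)]                        # 3x3 box filter
print(conv2d_im2col(x, w))  # [[45, 54], [81, 90]]
```

With the patches laid out as rows, the convolution becomes one large matrix multiplication, exactly the operation that GPUs and systolic arrays execute efficiently.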

The book is organized as follows:

•    Chapter 1: The neural network and its history
•    Chapter 2: The convolutional neural network model, its layer functions, and examples
•    Chapter 3: Parallel architectures (the Intel CPU, Nvidia GPU, Google TPU, and Microsoft NPU)
•    Chapter 4: Convolution optimization (the UCLA DCNN accelerator and MIT Eyeriss DNN accelerator)
•    Chapter 5: In-memory computation using the Hybrid Memory Cube (HMC): the GT Neurocube architecture and Stanford Tetris DNN processor
•    Chapter 6: Near-memory architecture (the ICT DaDianNao supercomputer and UofT Cnvlutin DNN accelerator)
•    Chapter 7: Energy-efficient inference engines for network pruning
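
As a preview of Chapter 7's topic, the sketch below shows magnitude-based weight pruning, one standard way to induce the sparsity that an energy-efficient inference engine exploits (a toy illustration, not the book's EIE design):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.
    Toy illustration of pruning; not the EIE implementation."""
    k = int(len(weights) * sparsity)  # how many weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(v) for v in weights)[k - 1]
    return [0.0 if abs(v) <= threshold else v for v in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Once most weights are zero, hardware can skip the corresponding MACs and store the model in compressed form, which is where the energy savings come from.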


Future revisions will incorporate new approaches for enhancing deep learning hardware designs alongside other topics, including:

•    Distributive graph theory
•    High-speed arithmetic
•    3D neural processing

Table of Contents

1 Introduction
1.1 History
1.2 Neural Network
2 Deep Learning
2.1 Network Model
2.1.1 Convolutional Layer
2.1.2 Activation Layer
2.1.3 Pooling
2.1.4 Normalization
2.2 Deep Learning Challenges
3 Parallel Architecture
3.1 Intel Central Processing Unit (CPU)
3.1.1 Skylake Mesh Architecture
3.1.2 Intel Ultra Path Interconnect (UPI)
3.1.3 Sub-NUMA Clustering (SNC)
3.1.4 Cache Hierarchy Changes
3.1.5 Advanced Vector Software Extension
3.1.6 Math Kernel Library for Deep Neural Network (MKL-DNN)
3.2 Nvidia Graphics Processing Unit (GPU)
3.2.1 Tensor Core Architecture
3.2.2 Simultaneous Multi-Threading (SMT)
3.2.3 High Bandwidth Memory (HBM2)
3.2.4 NVLink2 Configuration
3.3 Google Tensor Processing Unit (TPU)
3.3.1 System Architecture
3.3.2 Multiply-Accumulate (MAC) Systolic Array
3.3.3 New Brain Floating Point Format
3.3.4 Cloud TPU Configuration
3.3.5 Cloud Software Architecture
3.4 Microsoft Catapult Fabric NPU Processor
3.4.1 System Configuration
3.4.2 Neural Processor Architecture
3.4.3 Matrix-Vector Multiplier
3.4.4 Sparse Matrix-Vector Multiplication
4 Convolution Optimization
4.1 UCLA DCNN Accelerator
4.1.1 System Architecture
4.1.2 Filter Decomposition
4.1.3 Streaming Architecture
4.1.4 Convolution Unit (CU) Engine
4.1.5 Accumulation (ACCU) Buffer
4.1.6 Max Pooling
4.2 MIT Eyeriss DNN Accelerator
4.2.1 Convolution Mapping
4.2.2 Row Stationary (RS) Dataflow
4.2.3 Run-Length Compression (RLC)
4.2.4 Network-on-Chip (NoC)
4.2.5 Row Stationary Plus (RS+) Dataflow
5 In-Memory Hierarchy
5.1 GT Neurocube Architecture
5.1.1 Hybrid Memory Cube (HMC)
5.1.2 Memory Centric Neural Computing (MCNC)
5.1.3 Programmable Neurosequence Generator (PNG)
5.2 Stanford Tetris DNN Processor
5.2.1 Memory Hierarchy
5.2.2 In-Memory Accumulation
5.2.3 Data Scheduling
5.2.4 NN Partitioning across Vaults
6 Near-Memory Architecture
6.1 ICT DaDianNao Supercomputer
6.1.1 Memory Configuration
6.1.2 Neural Functional Unit (NFU)
6.2 UofT Cnvlutin DNN Accelerator
6.2.1 System Architecture
6.2.2 Zero-Free Neuron Array Format (ZFNAf)
6.2.3 Network Pruning
6.2.4 Raw or Encoded Format (RoE)
6.2.5 Vector Ineffectual Activation Identifier Format (VIAI)
6.2.6 Zero Memory Overhead Ineffectual Activation Skipping
7 Network Pruning
7.1 Energy Efficient Inference Engine (EIE)
7.1.1 Compressed DNN Model
7.1.2 Central Control Unit (CCU)