Performance Analysis and Tuning for General Purpose Graphics Processing Units (Synthesis Lectures on Computer Architecture)

Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, Wen-mei Hwu

  • Publisher: Morgan & Claypool
  • Publication date: 2012-11-01
  • Price: $1,400
  • Member price: 5% off, $1,330
  • Language: English
  • Pages: 96
  • Binding: Paperback
  • ISBN: 1608459543
  • ISBN-13: 9781608459544
  • Overseas special-order title (requires separate checkout)

Product description

General-purpose graphics processing units (GPGPUs) have emerged as an important class of shared-memory parallel processing architectures, widely deployed in every class of computer, from high-end supercomputers to embedded mobile platforms. Compared with today's more traditional multicore systems, GPGPUs have distinctly higher degrees of hardware multithreading (hundreds of hardware thread contexts vs. tens), a return to wide vector units (several tens vs. 1-10), memory architectures that deliver higher peak memory bandwidth (hundreds of gigabytes per second vs. tens), and smaller caches/scratchpad memories (less than 1 megabyte vs. 1-10 megabytes). In this book, we provide a high-level overview of current GPGPU architectures and programming models. We review the principles used in earlier shared-memory parallel platforms, focusing on recent results in both the theory and practice of parallel algorithms, and suggest how they connect to GPGPU platforms. We aim to help architects understand the algorithmic aspects of GPGPU computing. We also provide detailed performance analysis and guide optimization, from high-level algorithms down to low-level, instruction-level tuning. As a case study, we use the fast multipole method (FMM), an algorithm for n-body particle simulations. We also briefly survey the state of the art in GPU performance analysis tools and techniques.

Table of Contents:

  • GPU Design, Programming, and Trends
  • Performance Principles
  • From Principles to Practice: Analysis and Tuning
  • Using Detailed Performance Analysis to Guide Optimization