Mastering Parallel Programming with R (Paperback)

Simon R. Chapple, Eilidh Troup, Thorsten Forster, Terence Sloan

Product Description

Master the robust features of R parallel programming to accelerate your data science computations

About This Book

  • Create R programs that exploit the computational capability of your cloud platforms and computers to the fullest
  • Become an expert in writing the most efficient, highest-performance parallel algorithms in R
  • Get to grips with the concept of parallelism to accelerate your existing R programs

Who This Book Is For

This book is for R programmers who want to step beyond R's inherent single-threaded execution and restricted memory, and learn how to implement the highly accelerated, scalable algorithms that performant processing of Big Data requires. No previous knowledge of parallelism is required. The book also caters to more advanced technical programmers who want to go beyond high-level parallel frameworks.

What You Will Learn

  • Create and structure efficient load-balanced parallel computation in R, using R's built-in parallel package
  • Deploy and utilize cloud-based parallel infrastructure from R, including launching a distributed computation on Hadoop running on Amazon Web Services (AWS)
  • Understand parallel efficiency, and apply simple techniques to benchmark your own code, measure its speed, and target improvements
  • Develop complex parallel processing algorithms with the standard Message Passing Interface (MPI), using the Rmpi, pbdMPI, and SPRINT packages
  • Build and extend a parallel R package (SPRINT) with your own MPI-based routines
  • Implement accelerated numerical functions in R utilizing the vector processing capability of your Graphics Processing Unit (GPU) with OpenCL
  • Understand parallel programming pitfalls, such as deadlock and numerical instability, and the approaches to handle and avoid them
  • Build task-farm (master-worker), spatial-grid, and hybrid parallel R programs

In Detail

R is one of the most popular programming languages used in data science. Applying R to big data and complex analytic tasks requires the harnessing of scalable compute resources.

Mastering Parallel Programming with R presents a comprehensive and practical treatise on how to build highly scalable and efficient algorithms in R. It teaches a variety of parallelization techniques. These range from simple use of the parallel versions of lapply() in R's built-in parallel package to high-level, cloud-based Hadoop and Apache Spark frameworks on AWS. It also teaches low-level, scalable message-passing parallel programming with Rmpi and pbdMPI, applicable to clusters and supercomputers, and shows how to exploit GPUs, with their thousands of simple processors, through ROpenCL. By the end of the book, you will understand the factors that influence parallel efficiency, including assessing code performance and implementing load balancing. You will also know the pitfalls to avoid, such as deadlock and numerical instability. You will learn how to structure your code and data for the type of parallelism best suited to your problem domain, and how to extract maximum performance from your R code on a variety of computer systems.
