Docker for Data Science: Building Scalable and Extensible Data Infrastructure Around the Jupyter Notebook Server

Joshua Cook

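  • Publisher: Apress
  • Publication date: 2017-08-25
  • List price: $2,540
  • Member price: $2,413 (5% off)
  • Language: English
  • Pages: 257
  • Binding: Paperback
  • ISBN: 1484230116
  • ISBN-13: 9781484230114
  • Related categories: Docker, JVM languages, Data Science
  • Overseas special-order title (requires separate checkout)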

Product Description

Learn Docker "infrastructure as code" technology to define a system for performing standard but non-trivial data tasks on medium- to large-scale data sets, using Jupyter as the master controller.

Real-world data sets are often hard to manage. A set may not fit in random-access memory, or may take prohibitively long to process. These are significant challenges even for skilled software engineers, and they can render a standard Jupyter setup unusable.

As a solution to this problem, Docker for Data Science proposes using Docker. You will learn how to use existing public images published by major open-source projects such as Python, Jupyter, and Postgres, and how to use a Dockerfile to extend these images to suit your specific purposes. Docker Compose is examined, and you will learn how it can be used to build a linked system with Python churning data behind the scenes and Jupyter managing these background tasks. The book explores best practices for using existing images as well as for developing your own images to deploy state-of-the-art machine learning and optimization algorithms.
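
As a rough illustration of the kind of linked system described above, the following Python sketch builds a Compose-style configuration with a Jupyter service and a Postgres service and renders it as JSON (which is also valid YAML). It is not taken from the book: the service names, the `jupyter/scipy-notebook` and `postgres` image choices, the port mapping, the placeholder password, and the helper functions `build_compose_config` and `render_compose` are all assumptions made for illustration. The top-level `version` key is omitted here, and older Compose releases may require it.

```python
import json


def build_compose_config(notebook_image="jupyter/scipy-notebook",
                         db_image="postgres"):
    """Return a Compose-style mapping with two linked services.

    Hypothetical layout: a Jupyter notebook server that depends on a
    Postgres database. Image names and settings are illustrative only.
    """
    return {
        "services": {
            "jupyter": {
                "image": notebook_image,
                # Expose the notebook server on its conventional port.
                "ports": ["8888:8888"],
                # Start the database container before the notebook server.
                "depends_on": ["db"],
            },
            "db": {
                "image": db_image,
                # Placeholder credential; replace it in any real deployment.
                "environment": {"POSTGRES_PASSWORD": "example"},
            },
        },
    }


def render_compose(config):
    """Serialize the mapping as JSON, which is also valid YAML."""
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_compose(build_compose_config()))
```

Saving the printed output to a file and pointing Compose at it with its `-f` option is one way such a definition might be tried out. Keeping the definition in a file is what makes it "infrastructure as code": the whole system can be versioned and recreated from that one file.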

What You'll Learn
  • Master interactive development using the Jupyter platform
  • Run and build Docker containers from scratch and from publicly available open-source images
  • Write infrastructure as code using the docker-compose tool and its docker-compose.yml file type
  • Deploy a multi-service data science application across a cloud-based system

Who This Book Is For

Data scientists, machine learning engineers, artificial intelligence researchers, Kagglers, and software developers
