Thoughtful Data Science: Working with data by creating visually intuitive insights with Jupyter and Pixiedust

David Taieb

  • Publisher: Packt Publishing
  • Publication date: 2018-07-30
  • List price: $1,540
  • Sale price: $1,232 (20% off)
  • Language: English
  • Pages: 490
  • Binding: Paperback
  • ISBN: 178883996X
  • ISBN-13: 9781788839969
  • Related categories: Data Science
  • Ships immediately (stock < 3)

Product Description

Approaching the practice of data science by scripting your own data pipeline and dashboards

Key Features

  • David teaches how to build a new data pipeline using Pixiedust
  • How to get the most out of Jupyter notebooks
  • Think about the data and their visualisations, before worrying about the algorithms

Book Description

Data science has become the one scientific endeavor every business has to contend with today. We need to learn why data algorithms work, but even more importantly, we need to be able to create new insights from our data that we can actually work with. The why is addressed in many publications today, but it is not easy to create insights without the data scientist looking like a mountebank who writes opaque notebook code before getting to the visually compelling bits of data science. The data science process itself has to be transparent, easy to understand, and straightforward to optimise.

David Taieb created Pixiedust in Python to teach non-data scientists to use Jupyter notebooks without having to slog through the considerable amount of Jupyter code required to create simple, and sometimes not-so-simple, insights into data. It is possible to use Pixiedust by writing just a few lines of HTML and CSS, while retaining the ability to drop in or remove algorithms and visualisation options, adjust the data pipeline to the requirements posed by the data, or just get some very quick results. The case studies represent a carefully graded ladder of progress, ranging from data mined from social media to geo-analytical data helpful in business decision making.

It is, however, possible to use both Python and Scala to add features to the Pixiedust data pipeline, and ultimately, to bring the power of the Spark big data framework to the data scientist.

What you will learn

  • How to write basic Pixiedust dashboards
  • Building your own data pipelines without writing connecting pipeline code
  • Learn how to use Jupyter notebooks without the pain
  • Create compelling data visualisations in Pixiedust
  • Write applications running on Spark, without writing Spark code

Who This Book Is For

To produce a functioning Pixiedust dashboard, only a modicum of HTML and CSS is required. Fluency in data interpretation and visualization is also necessary, since this book is addressed to data professionals, e.g. business and general data analysts. The later chapters also have much to offer to the budding data scientist, and to developers on a path to becoming data scientists, since they get to play with Python code running in Jupyter notebooks.
