Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information

Jules J Berman

Product Description

Principles and Practice of Big Data: Preparing, Sharing, and Analyzing Complex Information, Second Edition updates and expands on the first edition, bringing a set of techniques and algorithms that are tailored to Big Data projects. The book stresses the point that most data analyses conducted on large, complex data sets can be achieved without the use of specialized suites of software (e.g., Hadoop), and without expensive hardware (e.g., supercomputers). The core of every algorithm described in the book can be implemented in a few lines of code using just about any popular programming language (Python snippets are provided).

Through multiple new examples, this edition demonstrates that if we understand our data and know how to ask the right questions, we can learn a great deal from large and complex data collections. The book will assist students and professionals from all scientific backgrounds who are interested in stepping outside the traditional boundaries of their chosen academic disciplines.

  • Presents new methodologies that are widely applicable to just about any project involving large and complex datasets
  • Offers readers informative new case studies across a range of scientific and engineering disciplines
  • Provides insights into semantics, identification, de-identification, vulnerabilities and regulatory/legal issues
  • Utilizes a combination of pseudocode and very short snippets of Python code to show readers how they may develop their own projects without downloading or learning new software
