Data-Centric Machine Learning with Python: The ultimate guide to engineering and deploying high-quality models based on good data

Christensen, Jonas, Bajaj, Nakul, Gosada, Manmohan

  • Publisher: Packt Publishing
  • Publication date: 2024-02-29
  • List price: $1,900
  • VIP price: $1,805 (5% off)
  • Language: English
  • Pages: 378
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1804618128
  • ISBN-13: 9781804618127
  • Related categories: Python, Machine Learning
  • Overseas special-order title (requires separate checkout)

Product Description

Join the data-centric revolution and master the concepts, techniques, and algorithms shaping the future of AI and ML development, using Python

 

Key Features:

  • Grasp the principles of data centricity and apply them to real-world scenarios
  • Gain experience with quality data collection, labeling, and synthetic data creation using Python
  • Develop essential skills for building reliable, responsible, and ethical machine learning solutions
  • Purchase of the print or Kindle book includes a free PDF eBook

 

Book Description:

Data quality is pivotal to the success of machine learning and artificial intelligence projects. This timely guide provides an end-to-end overview of data-centric machine learning (DCML), along with hands-on applications of technical and non-technical approaches to building deeper and more accurate datasets.

 

This book will help you understand what data-centric ML/AI is and how it can help you to realize the potential of 'small data'. Delving into the building blocks of data-centric ML/AI, you'll explore the human aspects of data labeling, tackle ambiguity in labeling, and understand the role of synthetic data. From strategies to improve data collection to techniques for refining and augmenting datasets, you'll learn everything you need to elevate your data-centric practices. Through applied examples and insights for overcoming challenges, you'll get a roadmap for implementing data-centric ML/AI in diverse applications in Python.

 

By the end of this book, you'll have a thorough understanding of data-centric ML/AI and be able to integrate common data-centric approaches into the model development lifecycle, prioritizing data quality and reliability to get the most out of your machine learning projects.

 

What You Will Learn:

  • Understand the impact of input data quality compared to model selection and tuning
  • Recognize the crucial role of subject-matter experts in effective model development
  • Implement data cleaning, labeling, and augmentation best practices
  • Explore common synthetic data generation techniques and their applications
  • Apply synthetic data generation techniques using common Python packages
  • Detect and mitigate bias in a dataset using best-practice techniques
  • Understand the importance of reliability, responsibility, and ethical considerations in ML/AI

 

Who this book is for:

This book is for data science professionals and machine learning enthusiasts who want to understand data-centricity, its benefits over a model-centric approach, and how to apply best-practice data-centric methods in their work. It is also for other data professionals and senior leaders who want to explore tools and techniques for improving data quality and creating opportunities for small data ML/AI in their organizations.
