Data Engineering with Python (Paperback)

Crickard, Paul

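  • Publisher: Packt Publishing
  • Publication date: 2020-10-23
  • List price: $1,770
  • VIP price: $1,682 (95% of list price)
  • Language: English
  • Pages: 356
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 183921418X
  • ISBN-13: 9781839214189
  • Related categories: Python programming language
  • Overseas special-order book (requires separate checkout)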

Product description

Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects.

Key features:

  • Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples
  • Design data models and learn how to extract, transform, and load (ETL) data using Python
  • Schedule, automate, and monitor complex data pipelines in production

Book Description

Data engineering provides the foundation for data science and analytics and forms an important part of all businesses. This book helps you explore the tools and methods used in the data engineering process with Python.

The book shows you how to tackle challenges commonly faced in different aspects of data engineering. You'll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines that work with large datasets. You'll learn how to transform and clean data and perform analytics to get the most out of it. As you advance, you'll discover how to work with big data of varying complexity and with production databases, and how to build data pipelines. Using real-world examples, you'll build architectures and learn how to deploy data pipelines on them.

By the end of this Python book, you'll have gained a clear understanding of data modeling techniques and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production.

What you will learn

  • Understand how data engineering supports data science workflows
  • Discover how to extract data from files and databases and then clean, transform, and enrich it
  • Configure processors for handling different file formats as well as both relational and NoSQL databases
  • Find out how to implement a data pipeline and dashboard to visualize results
  • Use staging and validation to check data before landing in the warehouse
  • Build real-time pipelines with staging areas that perform validation and handle failures
  • Get to grips with deploying pipelines in the production environment
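
As a rough illustration of the "staging and validation" idea in the list above, here is a minimal sketch that uses only the Python standard library. It is not taken from the book, which builds on open-source Apache projects. The sample data, the table names `staging` and `warehouse`, and the rule that `age` must be present and non-negative are all assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real pipeline would read files or a database.
RAW_CSV = """id,name,age
1,Alice,34
2,Bob,
3,Carol,-5
4,Dave,41
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: strip whitespace and convert types (missing age becomes None)."""
    cleaned = []
    for row in rows:
        age = row["age"].strip()
        cleaned.append({
            "id": int(row["id"]),
            "name": row["name"].strip(),
            "age": int(age) if age else None,
        })
    return cleaned


def is_valid(age):
    """Validation rule assumed for this sketch: age is present and non-negative."""
    return age is not None and age >= 0


def run_pipeline(text, conn):
    """Stage every row, then promote only the valid rows to the warehouse table."""
    conn.execute("CREATE TABLE staging (id INTEGER, name TEXT, age INTEGER)")
    conn.execute("CREATE TABLE warehouse (id INTEGER, name TEXT, age INTEGER)")
    rows = transform(extract(text))
    conn.executemany("INSERT INTO staging VALUES (:id, :name, :age)", rows)
    staged = conn.execute("SELECT id, name, age FROM staging ORDER BY id").fetchall()
    valid = [r for r in staged if is_valid(r[2])]
    conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", valid)
    conn.commit()
    return len(valid), len(staged) - len(valid)


conn = sqlite3.connect(":memory:")
loaded, rejected = run_pipeline(RAW_CSV, conn)
print(loaded, rejected)  # prints: 2 2
```

Keeping rejected rows in the staging table, rather than discarding them, leaves a record that can be inspected when a validation check fails.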

 

Who this book is for

This book is for data analysts, ETL developers, and anyone looking to get started in data engineering, transition into the field, or refresh their knowledge of data engineering using Python. It will also be useful for students planning a career in data engineering and for IT professionals preparing for a transition. No previous knowledge of data engineering is required.
