Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python

Lipp, Brian

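  • Publisher: Packt Publishing
  • Publication date: 2023-09-29
  • List price: $1,900
  • VIP price: $1,805 (5% off)
  • Language: English
  • Pages: 318
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1801070490
  • ISBN-13: 9781801070492
  • Related category: Python programming language
  • Overseas special-order title (requires separate checkout)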

Product Description

Build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka

Key Features

  • Develop modern data skills used in emerging technologies
  • Learn pragmatic design methodologies such as Data Mesh and data lakehouses
  • Gain a deeper understanding of data governance
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Modern Data Architectures with Python will teach you how to seamlessly incorporate your machine learning and data science work streams into your open data platforms. You’ll learn how to take your data and create open lakehouses that work with any technology using tried-and-true techniques, including the medallion architecture and Delta Lake.

Starting with the fundamentals, this book will help you build pipelines on Databricks, an open data platform, using SQL and Python. You’ll gain an understanding of notebooks and applications written in Python using standard software engineering tools such as Git, pre-commit, Jenkins, and GitHub. Next, you’ll delve into streaming and batch-based data processing using Apache Spark and Confluent Kafka. As you advance, you’ll learn how to deploy your resources using infrastructure as code and how to automate your workflows and code development. Since any data platform’s ability to handle and work with AI and ML is a vital component, you’ll also explore the basics of ML and how to work with modern MLOps tooling. Finally, you’ll get hands-on experience with Apache Spark, one of the key data technologies in today’s market.

By the end of this book, you’ll have amassed a wealth of practical and theoretical knowledge to build, manage, orchestrate, and architect your data ecosystems.

What you will learn

  • Understand data patterns including delta architecture
  • Discover how to increase performance with Spark internals
  • Find out how to design critical data diagrams
  • Explore MLOps with tools such as AutoML and MLflow
  • Get to grips with building data products in a data mesh
  • Discover data governance and build confidence in your data
  • Introduce data visualizations and dashboards into your data practice

Who this book is for

This book is for developers, analytics engineers, and managers looking to further develop a data ecosystem within their organization. While they’re not prerequisites, basic knowledge of Python and prior experience with data will help you to read and follow along with the examples.

Table of Contents

  1. Modern Data Processing Architectures
  2. Basics of Data Analytics Engineering
  3. Cloud Storage and Processing Concepts
  4. Python Batch and Stream Processing with Spark
  5. Streaming Data with Kafka
  6. Python MLOps
  7. Python and SQL based Visualizations
  8. Integrating CI into your workflow
  9. Data Orchestration
  10. Data Governance
  11. Introduction to Saturn Insurance, Deploying CI and ELT
  12. Data Governance and Dashboards
