Azure Data Factory Cookbook - Second Edition: A data engineer's guide to building and managing ETL and ELT pipelines with data integration

Dmitry Foshin, Tonya Chernyshova, Dmitry Anoshin

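  • Publisher: Packt Publishing
  • Publication date: 2024-02-28
  • List price: $2,060
  • VIP price: $1,957 (5% off)
  • Language: English
  • Pages: 532
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1803246596
  • ISBN-13: 9781803246598
  • Related categories: Microsoft Azure
  • Ordered from the supplier once your order is placed (about 3 to 4 weeks)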

Product Description

Solve real-world data problems and create data-driven workflows for easy data movement and processing at scale with Azure Data Factory


Key Features:

  • Learn how to load and transform data from various sources, both on-premises and in the cloud
  • Use Azure Data Factory's visual environment to build and manage hybrid ETL pipelines
  • Discover how to prepare, transform, process, and enrich data to generate key insights


Book Description:

This new edition of the Azure Data Factory Cookbook, fully updated to reflect ADF V2, will help you get up and running by showing you how to create and execute your first job in ADF.


You'll learn how to branch and chain activities, create custom activities, and schedule pipelines, and you'll discover the benefits of cloud data warehousing, Azure Synapse Analytics, and Azure Data Lake Gen2 Storage.


With practical recipes, you'll learn how to actively engage with analytical tools from Azure Data Services and leverage your on-premises infrastructure with cloud-native tools to get relevant business insights. As you advance, you'll be able to integrate the most commonly used Azure Services into ADF and understand how Azure services can be useful in designing ETL pipelines. You'll familiarize yourself with the common errors that you may encounter while working with ADF and find out how to use the Azure portal to monitor pipelines. You'll also understand error messages and resolve problems in connectors and data flows with the debugging capabilities of ADF.


Two new chapters covering Azure Data Explorer and key best practices have been added, along with new recipes throughout.


By the end of this book, you'll be able to use ADF as the main ETL and orchestration tool for your data warehouse or data platform projects.


What You Will Learn:

  • Create an orchestration and transformation job in ADF
  • Develop, execute, and monitor data flows using Azure Synapse
  • Create big data pipelines using Databricks and Delta tables
  • Work with big data in Azure Data Lake using Spark Pool
  • Migrate on-premises SSIS jobs to ADF
  • Integrate ADF with commonly used Azure services such as Azure ML, Azure Logic Apps, and Azure Functions
  • Run big data compute jobs within HDInsight and Azure Databricks
  • Copy data from AWS S3 and Google Cloud Storage to Azure Storage using ADF's built-in connectors


Who this book is for:

This book is for ETL developers, data warehouse and ETL architects, software professionals, and anyone else who wants to learn about the common and not-so-common challenges faced while developing traditional and hybrid ETL solutions using Microsoft's Azure Data Factory. You'll also find this book useful if you are looking for recipes to improve or enhance your existing ETL pipelines. Basic knowledge of data warehousing is a prerequisite.
