Data Pipelines with Apache Airflow, Second Edition: Orchestration for Data and AI
Ruiter, Julian de, Cabral, Ismael, Geusebroek, Kris
- Publisher: Manning
- Publication date: 2026-01-27
- Price: $2,370
- VIP price: 2% off, $2,323
- Language: English
- Pages: 512
- Binding: Quality Paper - also called trade paper
- ISBN: 1633436373
- ISBN-13: 9781633436374
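- Related categories: Distributed architecture
- Other editions: Data Pipelines with Apache Airflow (Paperback)
Overseas purchasing-agent title (requires separate checkout)
Product description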
Simplify, streamline, and scale your data operations with data pipelines built on Apache Airflow.
Apache Airflow provides a batteries-included platform for designing, implementing, and monitoring data pipelines. Building pipelines on Airflow eliminates the need for patchwork stacks and homegrown processes, adding security and consistency to the process. Now in its second edition, Data Pipelines with Apache Airflow teaches you to harness this powerful platform to simplify and automate your data pipelines, reduce operational overhead, and seamlessly integrate all the technologies in your stack.
In Data Pipelines with Apache Airflow, Second Edition you'll learn how to:
- Master the core concepts of Airflow architecture and workflow design
- Schedule data pipelines using the Dataset API and timetables, including complex irregular schedules
- Develop custom Airflow components for your specific needs
- Implement comprehensive testing strategies for your pipelines
- Apply industry best practices for building and maintaining Airflow workflows
- Deploy and operate Airflow in production environments
- Orchestrate workflows in container-native environments
- Build and deploy Machine Learning and Generative AI models using Airflow
Data Pipelines with Apache Airflow has empowered thousands of data engineers to build more successful data platforms. This new second edition has been fully revised to cover the latest features of Apache Airflow, including the TaskFlow API, deferrable operators, and Large Language Model integration. Filled with real-world scenarios and examples, it carefully guides you from Airflow novice to expert.
About the book
Data Pipelines with Apache Airflow, Second Edition teaches you how to build and maintain effective data pipelines. You'll master every aspect of directed acyclic graphs (DAGs), the power behind Airflow, and learn to customize them for your pipeline's specific needs. Part reference and part tutorial, the book illustrates each technique with engaging hands-on examples, from training machine learning models for generative AI to optimizing delivery routes. You'll explore common Airflow usage patterns, including aggregating multiple data sources and connecting to data lakes, while discovering exciting new features such as dynamic scheduling, the TaskFlow API, and Kubernetes deployments.
About the reader
For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills.
About the authors
Julian de Ruiter is a Data + AI engineering lead at Xebia Data, with a background in computer and life sciences and a PhD in computational cancer biology. As a consultant at Xebia Data, he enjoys helping clients design and build AI solutions and platforms, as well as the teams that drive them. From this work, he has extensive experience in deploying and applying Apache Airflow in production in diverse environments.
Ismael Cabral is a Machine Learning Engineer and Airflow trainer with experience spanning Europe, the US, Mexico, and South America, where he has worked with market-leading companies. He has vast experience implementing data pipelines and deploying machine learning models in production.
Kris Geusebroek is a data-engineering consultant with extensive hands-on experience with Apache Airflow at several clients. He is the maintainer of Whirl (the open source repository for local testing with Airflow), where he is actively adding new examples based on new functionality and new technologies that integrate with Airflow.
Daniel van der Ende is a Data Engineer who first started using Apache Airflow back in 2016. Since then, he has worked in many different Airflow environments, both on-premises and in the cloud. He has actively contributed to the Airflow project itself, as well as related projects such as Astronomer-Cosmos.
Bas Harenslak is a Staff Architect at Astronomer, where he helps customers develop mission-critical data pipelines at large scale using Apache Airflow and the Astro platform. With a background in software engineering and computer science, he enjoys working on software and data as if they are challenging puzzles. He favours working on open source software, is a committer on the Apache Airflow project, and co-author of the first edition of Data Pipelines with Apache Airflow.
Get a free eBook (PDF or ePub) from Manning as well as access to the online liveBook format (and its AI assistant that will answer your questions in any language) when you purchase the print book.