Building Machine Learning Pipelines: Automating Model Life Cycles with TensorFlow

Hannes Hapke, Catherine Nelson



Companies are spending billions on machine learning projects, but it's money wasted if the models can't be deployed effectively. In this practical guide, Hannes Hapke and Catherine Nelson walk you through the steps of automating a machine learning pipeline using the TensorFlow ecosystem. You'll learn the techniques and tools that will cut deployment time from days to minutes, so that you can focus on developing new models rather than maintaining legacy systems.

Data scientists, machine learning engineers, and DevOps engineers will discover how to go beyond model development to successfully productize their data science projects, while managers will better understand the role they play in helping to accelerate these projects. The book also explores new approaches for integrating data privacy into machine learning pipelines.

  • Understand the machine learning management lifecycle
  • Implement data pipelines with Apache Airflow and Kubeflow Pipelines
  • Work with data using TensorFlow tools like ML Metadata, TensorFlow Data Validation, and TensorFlow Transform
  • Analyze models with TensorFlow Model Analysis and ship them with the TFX Model Pusher component once the ModelValidator TFX component has confirmed that the analysis results are an improvement
  • Deploy models in a variety of environments with TensorFlow Serving, TensorFlow Lite, and TensorFlow.js
  • Learn methods for adding privacy, including differential privacy with TensorFlow Privacy and federated learning with TensorFlow Federated
  • Design model feedback loops to increase your data sets and learn when to update your machine learning models







Hannes Hapke is a VP of Engineering at Caravel, a machine learning company providing novel personalization products for the retail industry. Prior to joining Caravel, Hannes was a senior data science engineer at Cambia Health Solutions, a health solutions provider for 2.6 million people, and a machine learning engineer at Talentpair, Inc., where he developed novel deep learning models for recruiting companies. Hannes cofounded a renewable energy startup that applied deep learning to detect which homes would be optimal candidates for solar power. Additionally, Hannes has coauthored a publication about natural language processing and deep learning and presented at various conferences about deep learning and Python.

Catherine Nelson is a senior data scientist for Concur Labs at SAP Concur, where she explores innovative ways to use machine learning to improve the experience of a business traveller. She is particularly interested in privacy-preserving ML and applying deep learning to enterprise data. In her previous career as a geophysicist, she studied ancient volcanoes and explored for oil in Greenland. Catherine has a PhD in geophysics from Durham University and a Masters of Earth Sciences from Oxford University.

