The Site Reliability Workbook: Practical Ways to Implement SRE (Paperback)

In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today—and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment.

This new workbook draws on practical examples from Google’s own experience and adds case studies from Google Cloud Platform customers who have adopted SRE themselves. Target, Home Depot, The New York Times, and other companies share their hard-won experience of what worked for them and what didn’t.

Dive into this workbook and learn how to flesh out your own SRE framework, no matter what size your company is.

You’ll learn:

  • How to run reliable services in environments you don’t completely control, such as the cloud
  • Practical examples of how to create, monitor, and run your services via Service Level Objectives
  • How to convert existing ops teams to SRE—including how to dig out of operational overload
  • Methods for starting SRE from either greenfield or brownfield

