Site Reliability Engineering: How Google Runs Production Systems (Paperback)

Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff

Product Description

Building and operating distributed systems is fundamental to large-scale production infrastructure, but doing so in a scalable, reliable, and efficient way requires careful design and a good deal of trial and error. In this collection of essays and articles, key members of the Site Reliability Team at Google explain how the company has successfully navigated these deep waters over the past decade.

You’ll learn how Google continuously monitors and deploys some of the largest software systems in the world, how its Site Reliability Engineering team learns and improves after outages, and how the team balances risk-taking against reliability with error budgets.
