Implementing Service Level Objectives: A Practical Guide to SLIs, SLOs, and Error Budgets

Hidalgo, Alex

  • Publisher: O'Reilly
  • Publication date: 2020-10-06
  • Price: $2,330
  • VIP price: $2,214 (5% off)
  • Language: English
  • Pages: 404
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1492076813
  • ISBN-13: 9781492076810
  • Overseas special-order book (requires separate checkout)

Product Description

Although service-level objectives (SLOs) continue to grow in importance, there's a distinct lack of information about how to implement them. Practical advice that does exist usually assumes that your team already has the infrastructure, tooling, and culture in place. In this book, recognized SLO expert Alex Hidalgo explains how to build an SLO culture from the ground up.

Ideal as a primer and daily reference for anyone creating both the culture and tooling necessary for SLO-based approaches to reliability, this guide provides detailed analysis of advanced SLO and service-level indicator (SLI) techniques. Armed with mathematical models and statistical knowledge to help you get the most out of an SLO-based approach, you'll learn how to build systems capable of measuring meaningful SLIs with buy-in across all departments of your organization.

  • Define SLIs that meaningfully measure the reliability of a service from a user's perspective
  • Choose appropriate SLO targets, including how to perform statistical and probabilistic analysis
  • Use error budgets to help your team have better discussions and make better data-driven decisions
  • Build supportive tooling and resources required for an SLO-based approach
  • Use SLO data to present meaningful reports to leadership and your users

About the Author

Alex Hidalgo is a Site Reliability Engineer and an expert in all things related to Service Level Objectives. He developed an interest in computers at a young age, started writing his first BASIC programs at around the age of nine, and remembers the Internet when it was all still text. He eventually turned his hobby into a career, working in various capacities as a network engineer, security engineer, and systems administrator, and in many roles within the world of IT support. After moving to New York, he joined Admeld as a Technical Operations Engineer, only to find himself employed by Google a few months later due to an acquisition.

At Google, Alex was first introduced to the discipline of Site Reliability Engineering, which connected so closely with him that he wonders how he ever did anything else. Eventually, he found his other calling as an educator, writer, and speaker, traveling all over the world training other Site Reliability Engineers, becoming one of the primary developers of the Coursera Google IT Professional Certification, and contributing to multiple chapters of The Site Reliability Workbook -- most notably "Implementing SLOs" and "SLO Engineering Case Studies."

He recently joined Squarespace, where his focus is now on spreading the concepts of SLO-based approaches to service reliability -- both internally and across the entire industry. When he is not sharing his passion for error budgets with others, you can find him scuba diving or watching college basketball. He lives in Park Slope, Brooklyn, with his partner Jen and a rescue dog named Taco. He thinks about SLOs so much that he once had a dream about defining some for Taco. Twitter handle: @ahidalgosre
