SLIs and SLOs Demystified: A workshop approach to building and maintaining your service level indicators and service level objectives
McCoy, Alexandra F.
- Publisher: Packt Publishing
- Publication date: 2025-04-25
- Price: $1,770
- VIP price: 5% off, $1,682
- Language: English
- Pages: 300
- Binding: Quality Paper (also called trade paper)
- ISBN: 1835889387
- ISBN-13: 9781835889381
Overseas special-order book (requires separate checkout)
Product Description
Master reliability engineering with SLIs and SLOs to optimize performance, enhance observability, and make data-driven decisions
Key Features:
- Design precise SLIs and SLOs tailored to different system architectures and reliability goals
- Master observability techniques and incident management strategies to proactively detect and resolve issues
- Build scenario-based SLIs and SLOs with hands-on guidance for real-world reliability engineering
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
In today's digital landscape, ensuring service reliability is more than a necessity; it's a competitive advantage. SLIs and SLOs Demystified equips software engineers, SREs, and business leaders with the knowledge to build, measure, and manage service level indicators (SLIs) and service level objectives (SLOs) efficiently. Written by Alexandra F. McCoy, a site reliability engineer with over a decade of experience in the cloud and technology industry, this book simplifies complex reliability concepts for engineers at all levels.
Starting with a review of reliability engineering basics, Alexandra provides a step-by-step approach to defining impactful SLIs, facilitating productive SLO discussions, and integrating observability into your monitoring strategy. You'll also see how these principles apply to web applications, distributed systems, databases, and new features through real-world examples that can help you develop SLIs and SLOs for your specific environment. The book goes beyond implementation to explore the financial impact of reliability, alerting strategies, integration with incident management, and using error budgets for business decisions.
By the end of this book, you'll be able to drive operational excellence, minimize unplanned downtime, and optimize end user experiences with well-established reliability metrics.
What You Will Learn:
- Formulate and implement SLIs and SLOs for assessing and enhancing system reliability objectives
- Manage incidents proactively using observability and monitoring
- Create adequate reliability metrics for complex systems
- Refine incident response strategies to minimize associated risks
- Align reliability objectives with business and technical goals
- Implement strong reliability practices across multiple teams and services
- Integrate reliability engineering with DevOps and site reliability engineering practices
Who this book is for:
This book is designed for site reliability engineers (SREs), DevOps engineers, software engineers, product managers, and business leaders who want to improve service reliability and ensure their applications meet performance expectations. Basic knowledge of cloud services, system monitoring, and software engineering principles is beneficial.
Table of Contents
- SLIs and SLOs at the Heart of Reliability
- Establishing an SLI and SLO Team
- Things to Consider When Crafting Your SLIs and SLOs
- Observability and Monitoring Are a Necessity and a Must
- The Financial Impact of Not Adopting Indicators
- Workshop Preparation: Structuring the SLI and SLO Conversation
- Scenario 1: SLIs and SLOs for Web Applications
- Scenario 2: SLIs and SLOs for Distributed Systems
- Scenario 3: Optimizing SLIs and SLOs for Database Performance
- Scenario 4: Developing SLIs and SLOs for New Features
- SLO Monitoring and Alerting
- Service Level Performance Metrics: Daily Operations
- SLO Preservation and Incident Management
- SLIs and SLOs as a Service