Understanding Distributed Systems: What every developer should know about large distributed applications

Roberto Vitillo

Product description

Learning to build distributed systems is hard, especially if they are large scale. It's not that there is a lack of information out there. You can find academic papers, engineering blogs, and even books on the subject. The problem is that the available information is spread out all over the place, and if you were to put it on a spectrum from theory to practice, you would find a lot of material at the two ends, but not much in the middle.

That is why I decided to write a book to teach the fundamentals of distributed systems so that you don't have to spend countless hours scratching your head to understand how everything fits together. This is the guide I wished existed when I first started out, and it's based on my experience building large distributed systems that scale to millions of requests per second and billions of devices.

If you develop the back-end of web or mobile applications (or would like to!), this book is for you. When building distributed systems, you need to be familiar with the network stack, data consistency models, scalability and reliability patterns, and much more. Although you can build applications without knowing any of that, you will end up spending hours debugging and re-designing their architecture, learning lessons that you could have acquired in a much faster and less painful way.
 

Table of contents

1 Introduction
1.1 Communication
1.2 Coordination
1.3 Scalability
1.4 Resiliency
1.5 Operations
1.6 Anatomy of a distributed system

Communication
2 Reliable links
2.1 Reliability
2.2 Connection lifecycle
2.3 Flow control
2.4 Congestion control
2.5 Custom protocols
3 Secure links
3.1 Encryption
3.2 Authentication
3.3 Integrity
3.4 Handshake
4 Discovery
5 APIs
5.1 HTTP
5.2 Resources
5.3 Request methods
5.4 Response status codes
5.5 OpenAPI
5.6 Evolution

Coordination
6 System models
7 Failure detection
8 Time
8.1 Physical clocks
8.2 Logical clocks
8.3 Vector clocks
9 Leader election
9.1 Raft leader election
9.2 Practical considerations
10 Replication
10.1 State machine replication
10.2 Consensus
10.3 Consistency models
10.4 Chain replication
10.5 Solving the CAP theorem
10.6 Coordination avoidance
11 Transactions
11.1 ACID
11.2 Isolation
11.3 Atomicity
11.4 Asynchronous transactions

Scalability
12 Functional decomposition
12.1 Microservices
12.2 API gateway
12.3 CQRS
12.4 Messaging
13 Partitioning
13.1 Sharding strategies
13.2 Rebalancing
14 Duplication
14.1 Network load balancing
14.2 Replication
14.3 Caching

Resiliency
15 Common failure causes
15.1 Single point of failure
15.2 Unreliable network
15.3 Slow processes
15.4 Unexpected load
15.5 Cascading failures
15.6 Risk management
16 Downstream resiliency
16.1 Timeout
16.2 Retry
16.3 Circuit breaker
17 Upstream resiliency
17.1 Load shedding
17.2 Load leveling
17.3 Rate-limiting
17.4 Bulkhead
17.5 Health endpoint
17.6 Watchdog

Testing and operations
18 Testing
18.1 Scope
18.2 Size
18.3 Practical considerations
19 Continuous delivery and deployment
19.1 Review and build
19.2 Pre-production
19.3 Production
19.4 Rollbacks
20 Monitoring
20.1 Metrics
20.2 Service-level indicators
20.3 Service-level objectives
20.4 Alerts
20.5 Dashboards
20.6 On-call
21 Observability
21.1 Logs
21.2 Traces
21.3 Putting it all together
22 Final words

