Handbook of Markov Decision Processes: Methods and Applications
Provisional Chinese title: 馬可夫決策過程手冊:方法與應用

Feinberg, Eugene A., Shwartz, Adam

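  • Publisher: Springer
  • Publication date: 2012-10-29
  • List price: $14,920
  • VIP price: $14,174 (95% of list price)
  • Language: English
  • Pages: 565
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1461352487
  • ISBN-13: 9781461352488
  • Related categories: Data Science, Machine Learning, Algorithms-data-structures
  • Overseas special-order book (requires separate checkout)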

Product description

Eugene A. Feinberg, Adam Shwartz

This volume deals with the theory of Markov Decision Processes (MDPs) and their applications. Each chapter was written by a leading expert in the respective area. The papers cover major research areas and methodologies, and discuss open questions and future research directions. The papers can be read independently, with the basic notation and concepts of Section 1.2. Most chapters should be accessible by graduate or advanced undergraduate students in fields of operations research, electrical engineering, and computer science.

1.1 AN OVERVIEW OF MARKOV DECISION PROCESSES

The theory of Markov Decision Processes, also known under several other names including sequential stochastic optimization, discrete-time stochastic control, and stochastic dynamic programming, studies sequential optimization of discrete-time stochastic systems. The basic object is a discrete-time stochastic system whose transition mechanism can be controlled over time. Each control policy defines the stochastic process and values of objective functions associated with this process. The goal is to select a good control policy.

In real life, decisions that humans and computers make on all levels usually have two types of impacts: (i) they cost or save time, money, or other resources, or they bring revenues, and (ii) they have an impact on the future, by influencing the dynamics. In many situations, decisions with the largest immediate profit may not be good in view of future events. MDPs model this paradigm and provide results on the structure and existence of good policies and on methods for their calculation.
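The point that the largest immediate profit may be a poor choice in view of future events can be illustrated with a toy example. The sketch below is not taken from the handbook: the two states, the action names, the rewards, the transition probabilities, and the discount factor are all hypothetical values chosen for illustration. It compares a myopic policy (maximize the immediate reward) with the policy found by value iteration, one standard method for computing good policies under a discounted criterion.

```python
# Toy two-state MDP. All numbers and names are hypothetical, for illustration only.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "low": {
        "grab":   [(1.0, "low", 1.0)],    # small reward now, stay in "low"
        "invest": [(1.0, "high", 0.0)],   # nothing now, move to "high"
    },
    "high": {
        "grab":   [(0.9, "high", 2.0), (0.1, "low", 2.0)],
        "invest": [(1.0, "high", 0.0)],
    },
}
GAMMA = 0.9  # assumed discount factor


def q_value(state, action, values, gamma=GAMMA):
    """Expected reward now plus discounted value of the next state."""
    return sum(p * (r + gamma * values[nxt])
               for p, nxt, r in TRANSITIONS[state][action])


def myopic_policy():
    """Pick the action with the largest expected immediate reward."""
    zero = {s: 0.0 for s in TRANSITIONS}
    return {s: max(TRANSITIONS[s], key=lambda a: q_value(s, a, zero, gamma=0.0))
            for s in TRANSITIONS}


def evaluate(policy, sweeps=1000):
    """Discounted value of a fixed policy, by repeated backup."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(sweeps):
        values = {s: q_value(s, policy[s], values) for s in TRANSITIONS}
    return values


def value_iteration(sweeps=1000):
    """Approximate the optimal values, then read off a greedy policy."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(sweeps):
        values = {s: max(q_value(s, a, values) for a in TRANSITIONS[s])
                  for s in TRANSITIONS}
    policy = {s: max(TRANSITIONS[s], key=lambda a: q_value(s, a, values))
              for s in TRANSITIONS}
    return policy, values


if __name__ == "__main__":
    greedy = myopic_policy()
    best, best_values = value_iteration()
    print("myopic :", greedy, evaluate(greedy))
    print("optimal:", best, best_values)
```

Under these assumed numbers, the myopic policy grabs the reward of 1 in the "low" state forever, for a discounted value of about 10, while the value-iteration policy gives up that reward to reach the "high" state and obtains roughly 16.5 from the same starting point.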
