Speech Processing for IP Networks: Media Resource Control Protocol (Hardcover)
Provisional Chinese title: IP 網路語音處理：媒體資源控制協議 (精裝版)

David Burke

Publisher: Wiley
Publication date: 2007-04-01
List price: NT$3,980
Sale price: NT$3,781 (5% off)
Language: English
Pages: 368
Binding: Hardcover
ISBN: 0470028343
ISBN-13: 9780470028346
Related categories: HTTP, IPv6, XML

Ships immediately

Product Description

Media Resource Control Protocol (MRCP) is a new IETF protocol and a key enabling technology: it eases the integration of speech technologies into network equipment and accelerates their adoption, making it possible to deliver compelling interactive services over the telephone. MRCP leverages IP telephony and Web technologies such as SIP, HTTP, and XML (Extensible Markup Language) to deliver an open-standard, vendor-independent, and versatile interface to speech engines.

Speech Processing for IP Networks brings these technologies together into a single volume, giving the reader a solid technical understanding of the principles of MRCP, how it leverages other protocols and specifications for its operation, and how it is applied in modern IP-based telecommunication networks. The book focuses on the MRCPv2 standard developed by the IETF SpeechSC Working Group and also provides an overview of its precursor, MRCPv1.

Speech Processing for IP Networks:

- Gives a complete background on the technologies required by MRCP to function, including SIP (Session Initiation Protocol), RTP (Real-time Transport Protocol), and HTTP (Hypertext Transfer Protocol).
- Covers relevant W3C data representation formats including Speech Synthesis Markup Language (SSML), Speech Recognition Grammar Specification (SRGS), Semantic Interpretation for Speech Recognition (SISR), and Pronunciation Lexicon Specification (PLS).
- Describes VoiceXML, the leading approach for programming cutting-edge speech applications and a key driver of the development of many of MRCP's features.
- Explains advanced topics such as VoiceXML and MRCP interworking.

This text will be an invaluable resource for technical managers, product managers, software developers, and technical marketing professionals working for network equipment manufacturers, speech engine vendors, and network operators. Advanced students on computer science and engineering courses will also find this to be a useful guide.

Table of Contents

PART I. BACKGROUND.

1. Introduction.

1.1 Introduction to Speech Applications.

1.2 The MRCP Value Proposition.

1.3 History of MRCP Standardisation.

1.3.1 Internet Engineering Task Force.

1.3.2 World Wide Web Consortium.

1.3.3 MRCP: From Humble Beginnings Toward IETF Standard.

1.4 Summary.

2. Basic Principles of Speech Processing.

2.1 Human Speech Production.

2.1.1 Speech Sounds: Phonemics and Phonetics.

2.2 Speech Recognition.

2.2.1 Endpoint Detection.

2.2.2 Mel-Cepstrum.

2.2.3 Hidden Markov Models.

2.2.4 Language Modelling.

2.3 Speaker Verification and Identification.

2.3.1 Feature Extraction.

2.3.2 Statistical Modelling.

2.4 Speech Synthesis.

2.4.1 Front-end Processing.

2.4.2 Back-end Synthesis.

2.5 Summary.

3. Overview of MRCP.

3.1 Architecture.

3.2 Media Resource Types.

3.3 Network Scenarios.

3.3.1 VoiceXML IVR Service Node.

3.3.2 IP PBX with Voicemail.

3.3.3 Advanced Media Gateway.

3.4 Protocol Operation.

3.4.1 Establishing Communication Channels.

3.4.2 Controlling a Media Resource.

3.4.3 Walkthrough Examples.

3.5 Security.

3.6 Summary.

PART II. MEDIA AND CONTROL SESSIONS.

4. Session Initiation Protocol.

4.1 Introduction.

4.2 Walkthrough Example.

4.3 SIP URIs.

4.4 Transport.

4.5 Media Negotiation.

4.5.1 Session Description Protocol.

4.5.2 Offer/Answer Model.

4.6 SIP Servers.

4.6.1 Registrars.

4.6.2 Proxy Servers.

4.6.3 Redirect Servers.

4.7 SIP Extensions.

4.7.1 Capability Discovery.

4.8 Security.

4.8.1 Transport and Network Layer Security.

4.8.2 Authentication.

4.8.3 S/MIME.

4.9 Summary.

5. Session Initiation in MRCP.

5.1 Introduction.

5.2 Initiating the Media Session.

5.3 Initiating the Control Session.

5.4 Session Initiation Examples.

5.4.1 Single Media Resource.

5.4.2 Adding and Removing Media Resources.

5.4.3 Distributed Media Source/Sink.

5.5 Locating Media Resource Servers.

5.5.1 Requesting Server Capabilities.

5.5.2 Media Resource Brokers.

5.6 Security.

5.7 Summary.

6. The Media Session.

6.1 Media Encoding.

6.1.1 Pulse Code Modulation (PCM).

6.1.2 Linear Predictive Coding (LPC).

6.2 Media Transport.

6.2.1 Real-Time Protocol (RTP).

6.2.2 DTMF.

6.3 Security.

6.4 Summary.

7. The Control Session.

7.1 Message Structure.

7.1.1 Request Message.

7.1.2 Response Message.

7.1.3 Event Message.

7.1.4 Message Bodies.

7.2 Generic Methods.

7.3 Generic Headers.

7.4 Security.

7.5 Summary.

PART III. DATA REPRESENTATION FORMATS.

8. Speech Synthesis Markup Language (SSML).

8.1 Introduction.

8.2 Document Structure.

8.3 Recorded Audio.

8.4 Pronunciation.

8.4.1 Phonemic/Phonetic Content.

8.4.2 Substitution.

8.4.3 Interpreting Text.

8.5 Prosody.

8.5.1 Prosodic Boundaries.

8.5.2 Emphasis.

8.5.3 Speaking Voice.

8.5.4 Prosodic Control.

8.6 Markers.

8.7 Metadata.

8.8 Summary.

9. Speech Recognition Grammar Specification (SRGS).

9.1 Introduction.

9.2 Document Structure.

9.3 Rules, Tokens, and Sequences.

9.4 Alternatives.

9.5 Rule References.

9.5.1 Special Rules.

9.6 Repeats.

9.7 DTMF Grammars.

9.8 Semantic Interpretation.

9.8.1 Semantic Literals.

9.8.2 Semantic Scripts.

9.9 Summary.

10. Natural Language Semantics Markup Language (NLSML).

10.1 Introduction.

10.2 Document Structure.

10.3 Speech Recognition Results.

10.3.1 Serialising Semantic Interpretation Results.

10.4 Voice Enrollment Results.

10.5 Speaker Verification Results.

10.6 Summary.

11. Pronunciation Lexicon Specification (PLS).

11.1 Introduction.

11.2 Document Structure.

11.3 Lexical Entries.

11.4 Abbreviations and Acronyms.

11.5 Multiple Orthographies.

11.6 Multiple Pronunciations.

11.7 Summary.

PART IV. MEDIA RESOURCES.

12. Speech Synthesiser Resource.

12.1 Overview.

12.2 Methods.

12.2.1 SPEAK.

12.2.2 PAUSE.

12.2.3 RESUME.

12.2.4 STOP.

12.2.5 BARGE-IN-OCCURRED.

12.2.6 CONTROL.

12.2.7 DEFINE-LEXICON.

12.3 Events.

12.3.1 SPEECH-MARKER.

12.3.2 SPEAK-COMPLETE.

12.4 Headers.

12.5 Summary.

13. Speech Recogniser Resource.

13.1 Overview.

13.2 Recognition Methods.

13.2.1 RECOGNIZE.

13.2.2 DEFINE-GRAMMAR.

13.2.3 START-INPUT-TIMERS.

13.2.4 GET-RESULT.

13.2.5 STOP.

13.2.6 INTERPRET.

13.3 Enrollment Methods.

13.3.1 START-PHRASE-ENROLLMENT.

13.3.2 ENROLLMENT-ROLLBACK.

13.3.3 END-PHRASE-ENROLLMENT.

13.3.4 MODIFY-PHRASE.

13.3.5 DELETE-PHRASE.

13.4 Events.

13.4.1 START-OF-INPUT.

13.4.2 RECOGNITION-COMPLETE.

13.4.3 INTERPRETATION-COMPLETE.

13.5 Recognition Headers.

13.6 Enrollment Headers.

13.7 Summary.

14. Recorder Resource.

14.1 Overview.

14.2 Methods.

14.2.1 RECORD.

14.2.2 START-INPUT-TIMERS.

14.2.3 STOP.

14.3 Events.

14.3.1 START-OF-INPUT.

14.3.2 RECORD-COMPLETE.

14.4 Headers.

14.5 Summary.

15. Speaker Verification Resource.

15.1 Overview.

15.2 Methods.

15.2.1 START-SESSION.

15.2.2 END-SESSION.

15.2.3 VERIFY.

15.2.4 VERIFY-FROM-BUFFER.

15.2.5 VERIFY-ROLLBACK.

15.2.6 START-INPUT-TIMERS.

15.2.7 GET-INTERMEDIATE-RESULT.

15.2.8 STOP.

15.2.9 CLEAR-BUFFER.

15.2.10 QUERY-VOICEPRINT.

15.2.11 DELETE-VOICEPRINT.

15.3 Events.

15.3.1 START-OF-INPUT.

15.3.2 VERIFICATION-COMPLETE.

15.4 Headers.

15.5 Summary.

PART V. PROGRAMMING SPEECH APPLICATIONS.

16. Voice eXtensible Markup Language (VoiceXML).

16.1 Introduction.

16.2 Document Structure.

16.2.1 Applications and Dialogs.

16.3 Dialogs.

16.3.1 Forms.

16.3.2 Menus.

16.3.3 Mixed Initiative Dialogs.

16.4 Media Playback.

16.5 Media Recording.

16.6 Speech and DTMF Recognition.

16.6.1 Specifying Grammars.

16.6.2 Grammar Scope and Activation.

16.6.3 Configuring Recognition Settings.

16.6.4 Processing Recognition Results.

16.7 Flow Control.

16.7.1 Executable Content.

16.7.2 Variables, Scopes, and Expressions.

16.7.3 Document and Dialog Transitions.

16.7.4 Event Handling.

16.8 Resource Fetching.

16.9 Call Transfer.

16.10 Summary.

17. VoiceXML and MRCP Interworking.

17.1 Introduction.

17.2 Interworking Fundamentals.

17.2.1 Play Prompts.

17.2.2 Play and Recognise.

17.2.3 Record.

17.3 Application Example.

17.3.1 VoiceXML Scripts.

17.3.2 MRCP Flows.

17.4 Summary.

Appendix A. MRCP Version 1.

A.1 Overview.

A.2 Session Management and Message Transport.

A.3 General Protocol Details.

A.4 Speech Synthesiser Resource.

A.5 Speech Recogniser Resource.

Appendix B. XML Primer.

B.1 Background.

B.2 Basic Concepts.

B.3 Namespaces.

B.4 Document Schemas.

Appendix C. HTTP Primer.

C.1 Background.

C.2 Basic Concepts.

C.2.1 GET Method.

C.2.2 POST Method.

C.3 Caching.

C.4 Cookies.

C.5 Security.

References.

Index.

Acronyms.
