Algorithms and Software for Predictive and Perceptual Modeling of Speech (Synthesis Lectures on Algorithms and Software Engineering)

Venkatraman Atti

Product Description

From the early pulse code modulation-based coders to some of the recent multi-rate wideband speech coding standards, the area of speech coding has made several significant strides toward the objective of attaining high-quality speech at the lowest possible bit rate. This book presents some of the recent advances in linear prediction (LP)-based speech analysis that employ perceptual models for narrowband and wideband speech coding. The LP analysis-synthesis framework has been successful for speech coding because it fits the source-system paradigm for speech synthesis well. Limitations associated with conventional LP have been studied extensively, and several extensions to LP-based analysis-synthesis have been proposed, e.g., discrete all-pole modeling, perceptual LP, warped LP, LP with modified filter structures, IIR-based pure LP, all-pole modeling using the weighted sum of LSP polynomials, LP for low-frequency emphasis, and cascade-form LP. These extensions can be classified as algorithms that either attempt to improve the LP spectral envelope fitting performance or embed perceptual models in the LP. The first half of the book reviews some of the recent developments in predictive modeling of speech with the help of Matlab™ simulation examples. The advantages of integrating perceptual models in low bit rate speech coding depend on how accurately these models mimic human performance and, more importantly, on the achievable "coding gains" and the "computational overhead" associated with these physiological models. Even today, methods that exploit the masking properties of the human ear in speech coding standards are largely based on concepts introduced by Schroeder and Atal in 1979. For example, a simple approach employed in speech coding standards is to use a perceptual weighting filter to shape the quantization noise according to the masking properties of the human ear.
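
As a rough illustration of the two ideas named above, the sketch below estimates LP coefficients with the autocorrelation method (Levinson-Durbin recursion) and then forms a perceptual weighting filter of the commonly used form W(z) = A(z/γ1) / A(z/γ2). It is not taken from the book or its Matlab™ material: the function names are hypothetical, and the values γ1 = 0.9 and γ2 = 0.6 are illustrative assumptions only.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + sum_k a[k] z^-k.

    Returns (a, err), where err is the final prediction error energy.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): scale a[k] by gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]


def weighting_filter(a, gamma1=0.9, gamma2=0.6):
    """Numerator and denominator of W(z) = A(z/gamma1) / A(z/gamma2).

    gamma1 and gamma2 are illustrative defaults, not values from the book.
    """
    return bandwidth_expand(a, gamma1), bandwidth_expand(a, gamma2)


# Toy frame: impulse response of the one-pole filter 1 / (1 - 0.9 z^-1).
frame = [0.9 ** n for n in range(200)]
r = autocorrelation(frame, 2)
a, err = levinson_durbin(r, 2)
num, den = weighting_filter(a)
```

For the toy frame, the recovered predictor should be close to A(z) = 1 - 0.9 z^-1. Filtering the coding error through W(z) de-emphasizes it near the formant peaks, so minimizing the weighted error lets more quantization noise sit where the speech spectrum is strong enough to mask it.
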
The second half of the book reviews some of the recent developments in perceptual modeling of speech (e.g., masking threshold, psychoacoustic models, auditory excitation pattern, and loudness) with the help of Matlab™ simulations. Supplementary material, including the Matlab™ programs and simulation examples presented in this book, can be accessed at http://www.morganclaypool.com/page/atti.

Table of Contents: Introduction / Predictive Modeling of Speech / Perceptual Modeling of Speech
