Speech Coding (Part I) Waveform Coding


Speech Coding (Part I) Waveform Coding

虞台文

Content

Overview Linear PCM (Pulse-Code Modulation) Nonlinear PCM Max-Lloyd Algorithm Differential PCM (DPCM) Adaptive PCM (ADPCM) Delta Modulation (DM)

Speech Coding (Part I) Waveform Coding

Overview

Classification of Coding schemes

Waveform coding

Vocoding

Hybrid coding

Quality versus Bitrate of Speech Codecs

Waveform coding

Encode the waveform itself in an efficient way. Signal independent. Offers good-quality speech, requiring a bandwidth of 16 kbps or more.

Time-domain techniques
– Linear PCM (Pulse-Code Modulation)
– Nonlinear PCM: μ-law, A-law
– Differential coding: DM, DPCM, ADPCM

Frequency-domain techniques
– SBC (Sub-band Coding), ATC (Adaptive Transform Coding)

Wavelet techniques

Vocoding

‘Voice’ + ‘coding’. Encodes information about how the speech signal was produced by the human vocal system. These techniques can produce intelligible communication at very low bit rates, usually below 4.8 kbps. However, the reproduced speech often sounds quite synthetic, and the speaker is often not recognisable.

LPC-10 Codec: 2400 bps American Military Standard.

Hybrid coding

Combines waveform and source coding methods in order to improve the speech quality and reduce the bitrate. Typical bandwidth requirements lie between 4.8 and 16 kbps.

Technique: analysis-by-synthesis
– RELP (Residual Excited Linear Prediction)
– CELP (Codebook Excited Linear Prediction)
– MPLP (Multipulse Excited Linear Prediction)
– RPE (Regular Pulse Excitation)

Quality versus Bitrate of Speech Codecs

Speech Coding (Part I) Waveform Coding

Linear PCM(Pulse-Code Modulation)

Pulse-Code Modulation (PCM)

A method for quantizing an analog signal for the purpose of transmitting or storing the signal in digital form.

Quantization


Linear/Uniform Quantization

Quantization Error/Noise

Quantization Error/Noise

[Figure: quantizer error characteristic showing granular noise within the range and overload noise beyond ±X_max]

Quantization Error/Noise

Quantization Error/Noise

[Figure: unquantized sine wave; 3-bit quantized waveform; 3-bit quantization error; 8-bit quantization error]

Quantization Step Size

Δ = 2 X_max / 2^b

The Model of Quantization Noise

Δ = 2 X_max / 2^b

x̂(n) = x(n) + e(n),   σ_e² = E[ e²(n) ]

[Figure: additive-noise model of quantization, x(n) + e(n) → x̂(n)]

Signal-to-Quantization-Noise Ratio (SQNR)

A measurement of the effect of quantization errors introduced by analog-to-digital conversion at the ADC.

SQNR_dB = 10 log10( σ²_signal / σ²_q-noise ) = 20 log10( σ_signal / σ_q-noise )

Signal-to-Quantization-Noise Ratio (SQNR)

With x̂(n) = x(n) + e(n), σ_e² = E[ e²(n) ], and Δ = 2 X_max / 2^b, assume e(n) ~ U(−Δ/2, Δ/2), so that

σ_e² = Δ² / 12 = X_max² / (3 · 2^{2b})

Then

SQNR_dB = 10 log10( σ_x² / σ_e² ) = 10 log10( 3 · 2^{2b} σ_x² / X_max² )
        = 10 log10 3 + 20 b log10 2 − 20 log10( X_max / σ_x )
        = 4.77 + 6.02b − 20 log10( X_max / σ_x )

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10( X_max / σ_x )

Is the assumption e(n) ~ U(−Δ/2, Δ/2) always appropriate?

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10( X_max / σ_x )

The 4.77 term is constant. Each code bit contributes about 6 dB. The term X_max/σ_x tells how big a signal can be accurately represented.

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10( X_max / σ_x )

σ_x depends on the distribution of the signal, which in turn depends on users and time. X_max is determined by the A/D converter.

Signal-to-Quantization-Noise Ratio (SQNR)

SQNR_dB = 4.77 + 6.02b − 20 log10( X_max / σ_x )

Under what conditions is this formula reasonable?

Overload Distortion

[Figure: midtread and midrise uniform quantizer characteristics over ±X_max]

Probability of Distortion

[Figure: midtread and midrise quantizer characteristics with a Gaussian input pdf superimposed]

Assume x ~ N(0, σ_x²). Then

P("overload") = P( |x| > X_max )

For X_max = 3σ_x, P("overload") ≈ 0.0026.
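The overload probability follows directly from the Gaussian tail; a one-line check (my own helper name), where the slide's 0.0026 is the same value rounded down:

```python
import math

def p_overload(loading):
    """P(|x| > X_max) for x ~ N(0, sigma_x^2), with loading = X_max / sigma_x."""
    return math.erfc(loading / math.sqrt(2.0))

print(round(p_overload(3.0), 4))   # X_max = 3*sigma_x gives about 0.0027
```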

Overload and Quantization Noise with Gaussian Input pdf and b = 4

[Figure: total noise variance σ_e² (dB) versus X_max/σ_x (dB)]

Uniform Quantizer Performance

[Figure: SQNR (dB) versus X_max/σ_x (dB), uniform input pdf]

[Figure: SQNR (dB) versus X_max/σ_x (dB), Gaussian input pdf]

More on Uniform Quantization

SQNR_dB = 4.77 + 6.02b − 20 log10( X_max / σ_x )

Conceptually and implementationally simple.
– Imposes no restrictions on the signal's statistics
– Maintains a constant maximum error across its total dynamic range

σ_x varies greatly (on the order of 40 dB) across sounds, speakers, and input conditions. We need a quantizing system whose SQNR is independent of the signal's dynamic range, i.e., a near-constant SQNR across its dynamic range.

Speech Coding (Part I) Waveform Coding

Nonlinear PCM

Probability Density Functions of Speech Signals

Counting the number of samples in each interval provides an estimate of the pdf of the signal.

A good approximation is a gamma distribution, of the form

p(x) = [ √3 / (8π σ_x |x|) ]^{1/2} · e^{ −√3 |x| / (2σ_x) },   with p(0) → ∞

A simpler approximation is a Laplacian density, of the form

p(x) = 1 / (√2 σ_x) · e^{ −√2 |x| / σ_x },   with p(0) = 1 / (√2 σ_x)

Probability Density Functionsof Speech Signals

Distributions normalized so that μ_x = 0 and σ_x = 1.

The gamma density more closely approximates the measured distribution for speech than the Laplacian.

The Laplacian is still a good model in analytical studies.

Small amplitudes are much more likely than large amplitudes, by a 100:1 ratio.

Companding

The dynamic range of signals is compressed before transmission and is expanded to the original value at the receiver.

This allows signals with a large dynamic range to be transmitted over facilities that have a smaller dynamic-range capability.

Companding reduces the noise and crosstalk levels at the receiver.

Companding

x → Compressor y = C(x) → Uniform Quantizer → ŷ → Expander x̂ = C⁻¹(ŷ)

After compression, y is nearly uniformly distributed.

The Quantization-Error Variance of a Nonuniform Quantizer

For a compressor C(x) followed by a uniform quantizer with step size Δ (Jayant and Noll):

σ_e² = (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx

The Optimal C(x)

If the signal's pdf is known, then the minimum quantization-noise variance (i.e., the maximum SQNR) is achievable by letting

C(x) = X_max · ∫₀^{x} [p(t)]^{1/3} dt / ∫₀^{X_max} [p(t)]^{1/3} dt,   x ≥ 0

Is the assumption of a known pdf realistic?

PDF-Independent Nonuniform Quantization

Assuming no overload,

SQNR = σ_x² / σ_e² = ∫_{−X_max}^{X_max} x² p(x) dx / [ (Δ²/12) ∫_{−X_max}^{X_max} p(x) / [C′(x)]² dx ]

We require the SQNR to be independent of p(x). This holds if

1 / [C′(x)]² = x² / k²   ⇒   C′(x) = k / x   ⇒   C(x) = k ln x + A

Logarithmic Companding

C(x) = k ln x + A

μ-Law & A-Law Companding

μ-Law
– A North American PCM standard
– Used by North America and Japan

A-Law
– An ITU PCM standard
– Used by Europe

For x normalized so that |x| ≤ 1:

y = C(x) = sign(x) · ln(1 + μ|x|) / ln(1 + μ)            (μ = 255 in the U.S. and Canada)

y = C_A(x) = sign(x) · A|x| / (1 + ln A),                0 ≤ |x| ≤ 1/A
y = C_A(x) = sign(x) · (1 + ln(A|x|)) / (1 + ln A),      1/A ≤ |x| ≤ 1
                                                         (A = 87.56 in Europe)
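A minimal sketch of the two continuous compression curves (normalized |x| ≤ 1; this is not the bit-exact segmented G.711 implementation, and the function names are mine):

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """y = sign(x) * ln(1 + mu|x|) / ln(1 + mu)."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

def a_law_compress(x, A=87.56):
    """Linear segment below |x| = 1/A, logarithmic segment above it."""
    ax = np.abs(x)
    small = ax < 1.0 / A
    y = np.where(small, A * ax, 1.0 + np.log(np.where(small, 1.0, A * ax)))
    return np.sign(x) * y / (1.0 + np.log(A))

x = np.linspace(-1.0, 1.0, 11)
print(np.allclose(mu_law_expand(mu_law_compress(x)), x))  # exact inverse before quantization
```

Before quantization the expander undoes the compressor exactly; the coding gain comes from quantizing y, not x.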

μ-Law & A-Law Companding

[Figure: compression characteristics y = C(x) (μ-law) and y = C_A(x) (A-law)]

μ-Law Companding

y = C(x) = X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) · sign(x)

x(n) = 0 ⇒ y(n) = 0
x(n) = ±X_max ⇒ y(n) = ±X_max
μ = 0 ⇒ y(n) = x(n)

[Figure: y = C(x) over −X_max ≤ x(n) ≤ X_max for several values of μ]

μ-Law Companding

y = C(x) = X_max · ln(1 + μ|x|/X_max) / ln(1 + μ) · sign(x)

Using ln(1 + z) ≈ z for z ≪ 1 and ln(1 + z) ≈ ln z for z ≫ 1:

C(x) ≈ X_max · (μ|x|/X_max) / ln μ · sign(x),     μ|x|/X_max ≪ 1   (linear)

C(x) ≈ X_max · ln(μ|x|/X_max) / ln μ · sign(x),   μ|x|/X_max ≫ 1   (logarithmic)

Histogram for μ-Law Companding

[Figure: histograms of the input x(n) and the compressed output y(n)]

μ-law Approximation to Log

[Figure: distribution of quantization levels for a μ-law 3-bit quantizer]

SQNR of μ-law Quantizer

SQNR_dB = 6.02b + 4.77 − 20 log10[ ln(1 + μ) ] − 10 log10[ 1 + (X_max/(μσ_x))² + √2 · X_max/(μσ_x) ]

– 6.02b dependence on b
– Much less dependence on X_max/σ_x
– For large μ, the SQNR is less sensitive to changes in X_max/σ_x

Comparison of Linear and μ-law Quantizers

μ-law:  SQNR_dB = 6.02b + 4.77 − 20 log10[ ln(1 + μ) ] − 10 log10[ 1 + (X_max/(μσ_x))² + √2 · X_max/(μσ_x) ]

Linear: SQNR_dB = 6.02b + 4.77 − 20 log10( X_max / σ_x )
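Evaluating the two formulas side by side makes the comparison concrete (a numerical check of the expressions above, with b = 8 and μ = 255; the helper names are mine):

```python
import math

def sqnr_linear_db(b, load_db):
    """Linear PCM: SQNR falls dB-for-dB as X_max/sigma_x grows."""
    return 6.02 * b + 4.77 - load_db

def sqnr_mu_law_db(b, load_db, mu=255.0):
    """mu-law PCM: nearly flat SQNR over a wide range of X_max/sigma_x."""
    r = 10 ** (load_db / 20.0) / mu            # (X_max / sigma_x) / mu
    return (6.02 * b + 4.77 - 20 * math.log10(math.log(1.0 + mu))
            - 10 * math.log10(1.0 + r * r + math.sqrt(2.0) * r))

for load_db in (10, 20, 30, 40):               # X_max/sigma_x in dB
    print(load_db, round(sqnr_linear_db(8, load_db), 1),
          round(sqnr_mu_law_db(8, load_db), 1))
```

Over a 30 dB swing in X_max/σ_x, the linear quantizer loses 30 dB of SQNR while the μ-law quantizer loses only a couple of dB.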

A-Law Companding

y = C_A(x) = sign(x) · A|x| / (1 + ln A),              0 ≤ |x| ≤ 1/A

y = C_A(x) = sign(x) · (1 + ln(A|x|)) / (1 + ln A),    1/A ≤ |x| ≤ 1

The first segment is linear; the second is logarithmic.

A-Law Companding

[Figure: compression characteristic y = C_A(x)]

SQNR of A-Law Companding

SQNR_dB = 6.02b + 4.77 − 20 log10(1 + ln A)

Demonstration

PCM Demo

Speech Coding (Part I) Waveform Coding

Max-Lloyd Algorithm

How to design a nonuniform quantizer?

[Figure: staircase quantizer characteristic Q(x) with decision thresholds x_{k−1}, x_k, x_{k+1}, reconstruction levels q_{k−1}, q_k, and codewords c_{k−1}, c_k]

Q(x): quantization (reconstruction) level

x_k ≤ x < x_{k+1}  ⇒  Q(x) = q_k

Major tasks:
1. Determine the decision thresholds x_k
2. Determine the reconstruction levels q_k

Related task:
3. Determine the codewords c_k

Optimal Nonuniform Quantization

An optimal quantizer is one that minimizes the quantization-error variance

σ_e² = E[ (X − Q(X))² ] = ∫ (x − Q(x))² p(x) dx = Σ_{k=1}^{N} ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx

Major tasks:
1. Determine the decision thresholds x_k
2. Determine the reconstruction levels q_k

( x*_1, …, x*_{N+1}, q*_1, …, q*_N ) = argmin over {x_k}, {q_k} of Σ_{k=1}^{N} ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx

Necessary Conditions for an Optimum

ck ck+1 ck+2 ck+3ck 1ck 2 ck ck+1 ck+2 ck+3ck 1ck 2

xk xk+1 xk+2 xk+3xk 1xk 2 xk+4xk xk+1 xk+2 xk+3xk 1xk 2 xk+4

qk qk+1 qk+2 qk+3qk 1qk 2 qk qk+1 qk+2 qk+3qk 1qk 2

1

1

1

2* * * *1 1

1

( , , , , , ) arg min ( )k

kN

N

N x

N N kxx x kq q

x x q q x q p x dx

2e

2

0e

kq

2

0e

kx

leads to the “centroid” condition

leads to the “nearest neighborhood” condition

Necessary Conditions for an Optimum

ck ck+1 ck+2 ck+3ck 1ck 2 ck ck+1 ck+2 ck+3ck 1ck 2

xk xk+1 xk+2 xk+3xk 1xk 2 xk+4xk xk+1 xk+2 xk+3xk 1xk 2 xk+4

qk qk+1 qk+2 qk+3qk 1qk 2 qk qk+1 qk+2 qk+3qk 1qk 2

2

0e

kq

2

0e

kx

leads to the “centroid” condition

leads to the “nearest neighborhood” condition

1

1

( ), 1, ,

( )

k

k

k

k

x

xk x

x

xp x dxq k N

p x dx

1 , 1, ,2

k kk

q qx k N

Optimal Nonuniform Quantization

The centroid condition gives each q_k in terms of the thresholds, and the nearest-neighbor condition gives each x_k in terms of the levels; the two sets of equations are coupled. This suggests an iterative algorithm to reach the optimum.

The Max-Lloyd Algorithm

1. Initialize a set of decision levels {x_k} and set σ′_e² = ∞.
2. Calculate the reconstruction levels {q_k} by
   q_k = ∫_{x_k}^{x_{k+1}} x p(x) dx / ∫_{x_k}^{x_{k+1}} p(x) dx
3. Calculate the mse by
   σ_e² = Σ_k ∫_{x_k}^{x_{k+1}} (x − q_k)² p(x) dx
4. If σ′_e² − σ_e² < ε, exit.
5. Set σ′_e² = σ_e² and adjust the decision levels {x_k} by x_k = ( q_{k−1} + q_k ) / 2.
6. Go to 2.

This version assumes that the pdf of the signal is available.

The Max-Lloyd Algorithm (Practical Version)

Exercise
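One way to read the practical version: replace the pdf integrals with sample averages over a training set. The sketch below is my own (variable names are illustrative, and it assumes no quantizer cell goes empty, which holds for this quantile initialization); it alternates the centroid and nearest-neighbor conditions:

```python
import numpy as np

def lloyd_max(samples, n_levels, n_iter=100, tol=1e-10):
    """Lloyd-Max design on a training set: pdf expectations are
    replaced by sample averages."""
    # Initialize reconstruction levels from quantiles of the data.
    q = np.quantile(samples, (np.arange(n_levels) + 0.5) / n_levels)
    prev_mse = np.inf
    for _ in range(n_iter):
        x = (q[:-1] + q[1:]) / 2.0            # nearest-neighbor thresholds
        idx = np.searchsorted(x, samples)     # cell index for every sample
        # Centroid condition, assuming every cell is nonempty.
        q = np.array([samples[idx == k].mean() for k in range(n_levels)])
        mse = np.mean((samples - q[idx]) ** 2)
        if prev_mse - mse < tol:
            break
        prev_mse = mse
    return x, q

rng = np.random.default_rng(2)
s = rng.laplace(0.0, 1.0, 50_000)             # Laplacian "speech-like" training data
thresholds, levels = lloyd_max(s, n_levels=8)
```

Each iteration cannot increase the mse, so the loop converges to a (locally) optimal 8-level quantizer for the training data.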

Speech Coding (Part I) Waveform Coding

Differential PCM (DPCM)

Typical Audio Signals

[Figure: a segment of an audio signal (samples 0–2500) with a zoomed-in view of samples 1250–1750]

Do you find any correlation and/or redundancy among the samples?

The Basic Idea of DPCM

Adjacent samples exhibit a high degree of correlation. Removing this redundancy before encoding yields a more efficiently coded signal.

How?
– Apply prediction (e.g., linear prediction)
– Encode only the prediction error

Linear Prediction

Prediction:  s̃(n) = Σ_{k=1}^{p} a_k s(n−k)

Prediction error:  e(n) = s(n) − s̃(n)

Total squared error:  E = Σ_n e²(n)

a* = argmin_a E,   a = (a_1, …, a_p)

Linear Predictor

s(n) → Predictor → s̃(n)

DPCM Codec

Encoder (with an A/D converter in front):
  e(n) = s(n) − s̃(n)            prediction error
  ê(n) = Q[ e(n) ]               quantized error, sent over the channel
  ŝ(n) = s̃(n) + ê(n)            local reconstruction, fed to the predictor

Decoder:
  ŝ(n) = s̃(n) + ê(n), with the same predictor operating on ŝ(n)

The dynamic range of the prediction error is much smaller than the signal's, so fewer quantization levels are needed.
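A first-order sketch of such a codec (the predictor s̃(n) = a·ŝ(n−1) with a = 0.9 is a hypothetical choice, and the error quantizer is a simple uniform rounder). Because the encoder predicts from the reconstructed signal, the reconstruction error equals the quantization error of e(n) and is bounded by half the step size:

```python
import numpy as np

def dpcm_codec(s, step, a=0.9):
    """First-order DPCM: predict from the reconstructed signal,
    quantize only the prediction error."""
    s_hat_prev = 0.0
    e_hat = np.empty_like(s)
    s_hat = np.empty_like(s)
    for n in range(len(s)):
        pred = a * s_hat_prev                    # s_tilde(n)
        e = s[n] - pred                          # prediction error
        e_hat[n] = np.round(e / step) * step     # quantized error (transmitted)
        s_hat[n] = pred + e_hat[n]               # reconstruction (same at decoder)
        s_hat_prev = s_hat[n]
    return e_hat, s_hat

t = np.arange(500)
s = np.sin(2 * np.pi * t / 50)
e_hat, s_hat = dpcm_codec(s, step=0.05)
print(np.abs(s - s_hat).max() <= 0.05 / 2 + 1e-9)   # error bounded by step/2
```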

Performance of DPCM

By using a logarithmic compressor and a 4-bit quantizer for the error sequence e(n), DPCM results in high-quality speech at a rate of 32,000 bps, a factor of two lower than logarithmic PCM.

Speech Coding (Part I) Waveform Coding

Adaptive PCM (ADPCM)

Basic Concept

The power level in a speech signal varies slowly with time. Let the quantization step size dynamically adapt to the slowly time-varying power level:

Δ(n) ∝ σ(n)

Adaptive Quantization Schemes

Feed-forward-adaptive quantizers
– estimate σ(n) from x(n) itself
– the step size must be transmitted

Feedback-adaptive quantizers
– adapt the step size Δ on the basis of the quantized signal
– the step size need not be transmitted

Feed Forward Adaptation

Transmitter: x(n) → Quantizer → x̂(n) → Encoder → c(n), with a Step-Size Adaptation System deriving Δ(n) from x(n).
Receiver: c(n) → Decoder → x̂(n), using the transmitted Δ(n).

The source signal is not available at the receiver, so the receiver cannot evaluate Δ(n) by itself. Δ(n) has to be transmitted.

Quantization error:  e(n) = x̂(n) − x(n)

The Step-Size Adaptation System

Estimate the signal's short-time energy σ²(n), and make Δ(n) ∝ σ(n):

Δ(n) = Δ₀ σ(n)

The Step-Size Adaptation System: Low-Pass Filter Approach

σ²(n) = Σ_{m=−∞}^{n} x²(m) h(n−m),   h(n) = α^n for n ≥ 0,  0 < α < 1

σ²(n) = Σ_{m=−∞}^{n} α^{n−m} x²(m) = α σ²(n−1) + x²(n)

Δ(n) = Δ₀ σ(n)

[Figure: resulting step-size tracks for α = 0.99 and α = 0.9]
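The one-pole recursion above in code (a sketch; Δ₀ and the two-level test signal are illustrative choices):

```python
import numpy as np

def feedforward_step_size(x, alpha=0.99, delta0=0.1):
    """sigma^2(n) = alpha*sigma^2(n-1) + x^2(n);  Delta(n) = Delta0*sigma(n)."""
    var = 0.0
    delta = np.empty_like(x)
    for n in range(len(x)):
        var = alpha * var + x[n] ** 2
        delta[n] = delta0 * np.sqrt(var)
    return delta

# A quiet segment followed by a loud one.
x = np.concatenate([0.1 * np.ones(2000), 1.0 * np.ones(2000)])
delta = feedforward_step_size(x)
print(delta[1999] < delta[3999])   # step size grows with the signal's power level
```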

The Step-Size Adaptation System: Moving Average Approach

σ²(n) = Σ_{m=n−M+1}^{n} x²(m) h(n−m),   h(n) = 1/M for 0 ≤ n ≤ M−1

σ²(n) = (1/M) Σ_{m=n−M+1}^{n} x²(m)

Δ(n) = Δ₀ σ(n)

Feed-Forward Quantizer

σ²(n) = (1/M) Σ_{m=n−M+1}^{n} x²(m),   Δ(n) = Δ₀ σ(n)

– Δ(n) is evaluated every M samples
– Use M = 128 or 1024 for the estimates (M = 1024 is too long)
– Choose Δ_min and Δ_max suitably

Feedback Adaptation

The same as feed-forward adaptation, except that the input to the Step-Size Adaptation System changes: it now operates on the quantized signal x̂(n), which is available at both the transmitter and the receiver.

Δ(n) can be evaluated at both sides using the same algorithm. Hence, it need not be transmitted.

Alternative Approach to Adaptation

Δ(n) = P(n) Δ(n−1)

P(n) ∈ {P₁, P₂, …} depends on c(n−1).

Limits must be imposed: Δ_min ≤ Δ(n) ≤ Δ_max. The ratio Δ_max/Δ_min controls the dynamic range of the quantizer.

[Figure: step-size multipliers P₁, …, P₈ assigned to the quantizer's output levels]
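A sketch of the multiplier-table rule Δ(n) = P(n)Δ(n−1); the 2-bit table values below are hypothetical, chosen so that inner levels shrink the step and outer levels grow it:

```python
import numpy as np

def feedback_adapt(codes, multipliers, delta0=0.01,
                   delta_min=1e-4, delta_max=1.0):
    """Delta(n) = P(n) * Delta(n-1), with P(n) chosen from a table by the
    previous codeword and the result clamped to [delta_min, delta_max]."""
    delta = delta0
    out = []
    for c in codes:
        delta = float(np.clip(multipliers[c] * delta, delta_min, delta_max))
        out.append(delta)
    return out

# Hypothetical 2-bit table: codes 0/1 are inner levels, 2/3 are outer levels.
P = {0: 0.8, 1: 0.8, 2: 1.6, 3: 1.6}
deltas = feedback_adapt([3, 3, 3, 0, 0, 0], P)
print(deltas)   # step size grows on outer codes, then shrinks on inner codes
```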

Speech Coding (Part I) Waveform Coding

Delta Modulation (DM)

Delta Modulation

Simplest form of DPCM
– The prediction of the next sample is simply the current (reconstructed) sample

The sampling rate is chosen to be many times (e.g., 5×) the Nyquist rate, so adjacent samples are quite correlated, i.e., s(n) ≈ s(n−1).
– A 1-bit (2-level) quantizer is used
– Bit rate = sampling rate

Review DPCM

[Block diagram: DPCM encoder/decoder with predictor, quantizer, A/D converter, and channel]

DM Codec

DM is DPCM with the predictor replaced by a unit delay z⁻¹ and a 1-bit quantizer:

e(n) = s(n) − ŝ(n−1)
ê(n) = +Δ if e(n) ≥ 0, −Δ otherwise
ŝ(n) = ŝ(n−1) + ê(n)

Distortions of DM

[Figure: DM staircase tracking a waveform; sampling period T, step size Δ; output code words 0 1 1 1 1 1 0 0 0 0 1 0 0 1 0]

Code words:  c(n) = 1 if ê(n) = +Δ,  c(n) = 0 if ê(n) = −Δ

Granular noise occurs where the waveform is nearly flat; the slope overload condition occurs where the waveform changes faster than Δ per sample.

Choosing the Step Size

Nearly flat regions need a small step size (to limit granular noise); steep regions need a large step size (to avoid slope overload).

Adaptive DM (ADM)

Δ(n) = Δ(n−1) · K^{ e(n) e(n−1) },   e(n) ∈ {+1, −1},   K > 1 (e.g., K = 2)

The step size grows when successive errors have the same sign (slope overload) and shrinks when they alternate (granular region).
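A sketch of ADM with the multiplicative rule Δ(n) = Δ(n−1)·K^{e(n)e(n−1)}; K = 1.5 here is an illustrative choice, and the clamp to [Δ_min, Δ_max] follows the earlier step-size slides:

```python
import numpy as np

def adaptive_dm(s, delta0=0.05, K=1.5, delta_min=1e-3, delta_max=1.0):
    """1-bit adaptive delta modulation with a multiplicative step-size rule."""
    s_hat = 0.0
    delta = delta0
    e_prev = 1
    bits, recon = [], []
    for x in s:
        e = 1 if x >= s_hat else -1                              # 1-bit quantizer
        delta = float(np.clip(delta * K ** (e * e_prev),         # grow on same sign,
                              delta_min, delta_max))             # shrink on alternation
        s_hat += e * delta                                       # staircase update
        bits.append(1 if e > 0 else 0)
        recon.append(s_hat)
        e_prev = e
    return np.array(bits), np.array(recon)

t = np.arange(2000)
s = np.sin(2 * np.pi * t / 400)
bits, recon = adaptive_dm(s)
print(np.mean((s - recon) ** 2) < np.mean(s ** 2))   # staircase tracks the waveform
```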