Paulina Mataija Poticanje Jezicno Govornog Razvoja u Djece s Pjt
Kodiranje Govornog Signala Raznim Tehnikama
-
Upload
elvir-colak-ope-pastuh -
Category
Documents
-
view
238 -
download
6
Transcript of Kodiranje Govornog Signala Raznim Tehnikama
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
1/104
1
Digital Speech ProcessingLecture 17
Speech Coding Methods Basedon Speech Models
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
2/104
2
Waveform Coding versus Block
Processing
Waveform coding
sample-by-sample matching of waveforms
coding quality measured using SNR
Source modeling (block processing)
block processing of signal => vector of outputsevery block
overlapped blocks
Block 1
Block 2
Block 3
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
3/104
3
Model-Based Speech Coding weve carried waveform coding based on optimizing and maximizing
SNRabout as far as possible
achieved bit rate reductions on the order of 4:1 (i.e., from 128 KbpsPCM to 32 Kbps ADPCM) at the same time achieving toll quality SNRfor telephone-bandwidth speech
to lower bit rate further without reducing speech quality, we need toexploit features of the speech production model, including:
source modeling spectrum modeling
use of codebook methods for coding efficiency
we also need a new way of comparing performance of differentwaveform and model-based coding methods
an objective measure, like SNR, isnt an appropriate measure for model-based coders since they operate on blocks of speech and dont followthe waveform on a sample-by-sample basis
new subjective measures need to be used that measure user-perceivedquality, intelligibility, and robustness to multiple factors
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
4/104
4
Topics Covered in this Lecture Enhancements for ADPCM Coders
pitch prediction
noise shaping
Analysis-by-Synthesis Speech Coders multipulse linear prediction coder (MPLPC)
code-excited linear prediction (CELP)
Open-Loop Speech Coders two-state excitation model
LPC vocoder
residual-excited linear predictive coder
mixed excitation systems speech coding quality measures - MOS
speech coding standards
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
5/104
5
][nx ][nd ][ nd ][nc
][~ nx ][ nx
Differential Quantization
= =
p
k
k
zzP 1)(
][nc ][ nd ][ nx
][~
nx
P: simple predictor of
vocal tract response
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
6/104
6
Issues with Differential Quantization difference signal retains the character of
the excitation signal
switches back and forth between quasi-periodic and noise-like signals
prediction duration (even when using
p=20) is order of 2.5 msec (for samplingrate of 8 kHz) predictor is predicting vocal tract response
not the excitation period (for voiced sounds)
Solution incorporate two stages ofprediction, namely a short-time predictorfor the vocal tract response and a long-
time predictor for pitch period
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
7/104
7
Pitch Prediction
first stage pitch predictor:
second stage linear predictor (vocal tract predictor):
1( ) = MP z z
21( )
==
pk
kkP z z
residual
[ ]x n [ ]x n+ [ ]d n
+[ ]x n%
+
+ +[ ]d n
Transmitter Receiver
+ ++ +
+
++
++
)(1 zP
)(2 zP )(1 1 zP=
=
=
p
k
k
k
M
zzP
zzP
12
1
)(
)(
][Q
)(1 zP )(2 zP
[ ]x n
)(zPc
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
8/104
8
Pitch Prediction
1( )
first stage pitch predictor:
this predictor model assumes that the pitch period, , is an
integer number of samples and is a gain constant allowing
for variations in pitc
=
MP z z
M
1
1 1 2 3( )
h period over time (for unvoiced or background
frames, values of and are irrelevant)
an alternative (somewhat more complicated) pitch predictor
is of the form:
+ = + +
M M
M
P z z z z 1
1
1
this more advanced form provides a way to handle non-integer
pitch period through interpolation around the nearest integer
pitch period value,
=
=
M M k
k
k
z
M
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
9/104
9
Combined Prediction
[ ][ ]
1 2
1 2
1 1 1( )
1 ( ) 1 ( ) 1 ( )
1 ( ) 1 ( ) 1 ( ) 1 1
The combined inverse system is the cascade in the decoder system:
with 2-stage prediction error filter of the form:
c
c
c
H zP z P z P z
P z P z P z
= =
= =
[ ]
[ ]
[ ]
1 2 1
1 2 1
1 2 1
( ) ( ) ( )
( ) 1 ( ) ( ) ( )
1 ( ) ( ) ( )
[ ]
[ ]
which is implemented as a parallel combination of two predictors:
and
The prediction signal, can be expressed as
c
P z P z P z
P z P z P z P z
P z P z P z
x n
x n x
=
=
%
%
1
[ ] ( [ ] [ ])p
k
k
n M x n k x n k M =
+
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
10/104
10
Combined Prediction Error
1
[ ] [ ] [ ]
[ ] [ ]
[ ] [ ] [ ]
{
The combined prediction error can be defined as:
where
is the prediction error of the pitch predictor.
The optimal values of , and
c
p
k
k
d n x n x n
v n v n k
v n x n x n M
M
=
=
=
=
%
}
[ ].[ ]
[ ]
are obtained, in theory,
by minimizing the variance of In practice a sub-optimumsolution is obtained by first minimizing the variance of and
then minimizing the variance of subje
k
c
c
d n
v n
d n
ct to the chosen values
of and M
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
11/104
11
Solution for Combined Predictor
( )22
1 ( [ ]) [ ] [ ]
Mean-squared prediction error for pitch predictor is:
where denotes averaging over a finite frame of speech samples.
We use the covariance-type of averaging to elimin
E v n x n x n M= =
( )( )
( )
( ) ( )( ) ( )
2
1
22
1 2 2
[ ] [ ]
[ ]
( [ ] [ ] )[ ] 1
[ ] [ ]
ate windowing
effects, giving the solution:
Using this value of , we solve for as
with minimum normalized covar
opt
opt
opt
x n x n M
x n M
E
x n x n ME x n
x n x n M
=
=
( ) ( )( )1/2
2 2
[ ] [ ][ ]
[ ] [ ]
iance:
=
x n x n MM
x n x n M
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
12/104
12
Solution for Combined Predictor
Steps in solution:
first search forMthat maximizes [M]
compute opt
Solve for more accurate pitch predictor byminimizing the variance of the expanded
pitch predictor
Solve for optimum vocal tract predictorcoefficients, k, k=1,2,,p
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
13/104
13
Pitch Prediction
Vocal tract
prediction
Pitch and
vocal tract
prediction
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
14/104
14
Noise Shaping in DPCM
Systems
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
15/104
15
Noise Shaping Fundamentals
[ ] [ ] [ ]
[ ]
[ ]
The output of an ADPCN encoder/decoder is:
where is the quantization noise. It is easy to
show that generally has a flat spectrum and thus
is especially audible in spe
x n x n e n
e n
e n
= +
ctral regions of low intensity,
i.e., between formants.
This has led to methods of shaping the quantization noise
to match the speech spectrum and take advantage of spectral
masking concepts
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
16/104
16
Noise Shaping
Basic ADPCM encoder and decoder
Equivalent ADPCM encoder and decoder
Noise shaping ADPCM encoder and decoder
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
17/104
17
Noise Shaping[ ] [ ] [ ] ( ) ( ) ( )
( ) ( ) ( ) ( ) [1 ( )] ( ) ( ) ( )
( ) ( ) (
The equivalence of parts (b) and (a) is shown by the following:
From part (a) we see that:
with
x n x n e n X z X z E z
D z X z P z X z P z X z P z E z
E z D z D z
= + = +
= =
=
)
( ) ( ) ( ) [1 ( )] ( ) [1 ( )] ( )1 ( ) ( ) ( ) ( )
1 ( )
1([1 ( )] ( ) [1 ( )] ( ))
1 ( )
( ) ( )
showing the equivalence of parts (b) and (a). Further, since
Fee
D z D z E z P z X z P z E z
X z H z D z D zP z
P z X z P z E zP z
X z E z
= + = +
= =
= +
= +
( ) [ ]
[ ] [ ]
ding back the quantization error through the predictor,
ensures that the reconstructed signal, , differs from
, by the quantization error, , incurred in quantizing the
difference signal
P z x n
x n e n
[ ]., d n
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
18/104
18
Shaping the Quantization Noise
[ ]
( )
( ),
1 '( ) ( ) '( ) '( )
1 ( )
11 ( ) ( ) 1 (
1 ( )
To shape the quantization noise we simply replace by
a different system function, giving the reconstructed signal as:
P z
F z
X z H z D z D z
P z
P z X z F P z
= =
= +
[ ]( )) '( )
1 ( )( ) '( )1 ( )
[ ]
'( ) ( ) '( )
1 ( )'( ) '( ) ( ) '(
1 ( )
Thus if is coded by the encoder, the -transform of the
reconstructed signal at the receiver is:
z E z
F zX z E zP z
x n z
X z X z E z
F zE z E z z E z
P z
= +
= +
= =
%
% )
1 ( )
( ) 1 ( )where is the effective noise shaping filter
F z
z P z
=
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
19/104
19
Noise Shaping Filter Options
( ) 0
( ) ( )
Noise shaping filter options:
1. and we assume noise has a flat spectrum, then
the noise and speech spectrum have the same shape
2. then the equivalent system is the standard
F z
F z P z
=
=
1
1
'( ) '( ) ( )
3. ( ) ( )
DPCM
system where with flat noise spectrum
and we ''shape'' the noise spectrum
to ''hide'' the noise beneath the spectral peaks of the speech signal;
ea
pk k
k
k
E z E z E z
F z P z z
=
= =
= =
%
[1 ( )] [1 ( )]ch zero of is paired with a zero of where the paired
zeros have the same angles in the -plane, but with a radius that is
divided by .
P z F z
z
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
20/104
20
Noise Shaping Filter
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
21/104
21
Noise Shaping Filter
2
'
2 /2 / 2
' '2 /
,
1 ( )( )
1 ( )
If we assume that the quantization noise has a flat spectrum with noise
power of then the power spectrum of the shaped noise is of the form:
S
S
S
e
j F Fj F F
e ej F F
F eP e
P e
=
Noise
spectrum
above
speechspectrum
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
22/104
22
Fully Quantized Adaptive
Predictive Coder
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
23/104
23
Full ADPCM Coder
Input isx[n]
P2(z) is the short-term (vocal tract) predictor
Signal v[n] is the short-term prediction error
Goal of encoder is to obtain a quantized representation of this excitation
signal, from which the original signal can be reconstructed.
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
24/104
24
Quantized ADPCM CoderTotal bit rate for ADPCM coder:
where is the number of bits for the quantization of thedifference signal, is the number of bits for encoding the
step size at frame ra
ADPCM S P PI BF B F B F
B
B
= + +
,
8000 1 4
te and is the total number of bits
allocated to the predictor coefficients (both long and short-term)
with frame rate
Typically and even with bits, we need between
8000 and 3200
P
P
S
F B
F
F B
=
bps for quantization of difference signal
Typically we need about 3000-4000 bps for the side information
(step size and predictor coefficients)
Overall we need between 11,000 and 36,000 bps for a fu
lly quantized system
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
25/104
25
Bit Rate for LP Coding
speech and residual sampling rate: Fs=8 kHz
LP analysis frame rate: F
=FP
= 50-100 frames/sec
quantizer stepsize: 6 bits/frame
predictor parameters: M(pitch period): 7 bits/frame
pitch predictor coefficients: 13 bits/frame
vocal tract predictor coefficients: PARCORs 16-20, 46-50 bits/frame
prediction residual: 1-3 bits/sample total bit rate:
BR = 72*FP+ Fs (minimum)
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
26/104
26
Two-Level (B=1 bit) Quantizer
Prediction residual
Quantizer input
Quantizer output
Reconstructed pitch
Original pitch residual
Reconstructed speech
Original speech
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
27/104
27
Three-Level Center-Clipped Quantizer
Prediction residual
Quantizer input
Quantizer output
Reconstructed pitch
Original pitch residual
Reconstructed speech
Original speech
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
28/104
28
Summary of Using LP in Speech
Coding the predictor can be more sophisticated than a
vocal tract response predictorcan utilizeperiodicity (for voiced speech frames)
the quantization noise spectrum can be shapedby noise feedback key concept is to hide the quantization noise under
the formant peaks in the speech, thereby utilizing theperceptual masking power of the human auditorysystem
we now move on to more advanced LP coding ofspeech using Analysis-by-Synthesis methods
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
29/104
29
Analysis-by-SynthesisSpeech Coders
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
30/104
30
A-b-S Speech Coding The key to reducing the data rate of a
closed-loop adaptive predictive coder wasto force the coded difference signal (the
input/excitation to the vocal tract model) to
be more easily represented at low datarates while maintaining very high quality at
the output of the decoder synthesizer
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
31/104
31
A-b-S Speech Coding
Replace quantizer for generating excitation signal with an
optimization process (denoted as Error Minimization above)
whereby the excitation signal, d[n] is constructed based on
minimization of the mean-squared value of the synthesis error,
d[n]=x[n]-x[n]; utilizes Perceptual Weighting filter.~
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
32/104
32
A-b-S Speech Coding
[ ] thx n p
Basic operation of each loop of closed-loop A-b-S system:
1. at the beginning of each loop (and only once each loop), the
speech signal, , is used to generate an optimum order
1
1
1 1( )1 ( )
[ ] [ ] [ ]
[ ],
p
i
i
H zP z
z
d n x n x n
x n
=
= =
=
%
%
LPC filter of the form:
2. the difference signal, , based on an initial estimate
of the speech signal, is perceptually
1 ( )( )
1 ( )
P zW z
P z
=
weighted by a speech-adaptive
filter of the form:
(see next vugraph)
3. the error minimization box and the excitation generator create a sequence
[ ]d n
of error signals that iteratively (once per loop) improve the match to theweighted error signal
4. the resulting excitation signal, , which is an improved estimate of the
actual LPC prediction error signal for each loop iteration, is used to excite the
LPC filter and the loop processing is iterated until the resulting error signal
meets some criterion for stopping the closed-loop iterations.
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
33/104
33
Perceptual Weighting Function
1 ( )
( ) 1 ( )
P z
W z P z
=
As approaches 1, weighting is flat; as approaches 0, weightingbecomes inverse frequency response of vocal tract.
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
34/104
34
Perceptual Weighting
11 2
2
1 ( )( ) , 0 1
1 ( )
Perceptual weighting filter often modified to form:
so as to make the perceptual weighting be lesssensitive to the detailed frequency response of the
vocal trac
=
P zW z
P z
t filter
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
35/104
35
Implementation of A-B-S Speech
Coding
Goal: find a representation of the excitation for
the vocal tract filter that produces high qualitysynthetic output, while maintaining a structured
representation that makes it easy to code the
excitation at low data rates Solution: use a set of basis functions which
allow you to iteratively build up an optimal
excitation function is stages, by adding a newbasis function at each iteration in the A-b-S
process
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
36/104
36
Implementation of A-B-S Speech Coding
1 2{ [ ], [ ],..., [ ]}, 0 1Q
Q
f n f n f n n L =
Assume we are given a set of basis functions of the form:
and each basis function is 0 outside the defining interval.
At each iteration of the A-b-S loop, we
( )1
2
0
:
[ ] [ ] [ ] [ ]
[ ] [ ]
L
n
E
E x n d n h n w n
h n w n
=
=
select the basis function
from that maximally reduces the perceptually weighted mean-
squared error,
where and are the VT and perceptual weighting filter[ ],
[ ] [ ]
[ ].
k
k
k
th
k k k
k f n
d n f n
f n
=
s.We denote the optimal basis function at the iteration as
giving the excitation signal where is the optimal
weighting coefficient for basis function
The A-b-S
[ ]N d n
d
iteration continues until the perceptually weighted error
falls below some desired threshold, or until a maximum number of
iterations, , is reached, giving the final excitation signal, , as:
1
[ ] [ ]k
N
k
k
n f n
=
=
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
37/104
37
Implementation of A-B-S Speech Coding
Closed Loop Coder
Reformulated Closed Loop Coder
-
7/30/2019 Kodiranje Govornog Signala Raznim Tehnikama
38/104
38
Implementation of A-B-S Speech Coding
0
0
[ ] 0
0 [ ]
[ ] 0[ ]0 0 1
Assume that is known up to current frame ( for simplicity)
Initialize estimate of the excitation, as:
Form the initial estimate of the speech signal
th
d n n
d n
d n nd nn L
=