Basic Features of Audio Signals
Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan
Audio Features
Four commonly used audio features: volume, pitch, timbre, and zero-crossing rate.
Our goal: these features can be perceived (more or less) subjectively. We want to compute them quantitatively (and objectively) for further processing and recognition.
General Steps for Audio Analysis
1. Frame blocking: frame duration of 20-40 ms or so
2. Frame-based feature extraction: volume, zero-crossing rate, pitch, MFCC, etc.
3. Frame-based analysis: pitch vector for QBSH (query by singing/humming) comparison, MFCC for HMM evaluation, …
Frame Blocking
Sample rate = 16 kHz
Frame size = 512 samples
Frame duration = 512/16000 = 0.032 s = 32 ms
Overlap = 192 samples
Hop size = frame size - overlap = 512 - 192 = 320 samples
Frame rate = 16000/320 = 50 frames/sec
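The arithmetic above carries over to any sample rate, frame size, and overlap. The following is a minimal MATLAB sketch of it; the variable names are illustrative, and the 3-second signal length used for the frame count is an assumption made only for the example.

```matlab
% Frame-blocking arithmetic for the parameters listed above.
fs        = 16000;   % sample rate in Hz
frameSize = 512;     % samples per frame
overlap   = 192;     % samples shared by adjacent frames

frameDuration = frameSize / fs;        % seconds per frame
hopSize       = frameSize - overlap;   % samples between consecutive frame starts
frameRate     = fs / hopSize;          % frames per second

% Number of complete frames in a signal of N samples.
% N = 3 seconds of audio is an arbitrary choice for illustration.
N          = 3 * fs;
frameCount = floor((N - overlap) / hopSize);

fprintf('Frame duration: %g ms\n', 1000 * frameDuration);
fprintf('Hop size: %d samples\n', hopSize);
fprintf('Frame rate: %g frames/sec\n', frameRate);
fprintf('Complete frames in %d samples: %d\n', N, frameCount);
```

A larger overlap gives a higher frame rate (smoother feature curves) at the cost of more computation.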
[Figure: waveform with a zoomed-in view marking one frame and the overlap between adjacent frames]
Quiz!
[Figure: taiwan.wav, amplitude vs. sample index, and a single frame of it, amplitude vs. sample index within the frame]
Audio Features in Time Domain
Three of the most prominent time-domain audio features in a frame (a.k.a. analysis window):
Intensity
Fundamental period (FP)
Timbre: waveform within a fundamental period
Quiz!
Audio Features in Frequency Domain
Frequency-domain audio features in a frame:
Energy: sum of the power spectrum
Pitch: distance between harmonics
Timbre: smoothed spectrum
[Figure: spectrum of a frame, annotated with energy, pitch frequency, first formant F1, and second formant F2]
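As a rough illustration of the "sum of power spectrum" definition of energy, the sketch below computes the power spectrum of one synthetic frame with the base MATLAB function fft and checks that it sums to the time-domain energy. The test signal, its amplitudes, and the variable names are assumptions for the example; estimating pitch from harmonic spacing or timbre from a smoothed spectrum is not attempted here.

```matlab
% Synthetic frame: a 440 Hz component plus a weaker 880 Hz harmonic
% (values chosen only for illustration).
fs    = 16000;
n     = 512;
t     = (0:n-1)' / fs;
frame = 0.6 * cos(2*pi*440*t) + 0.3 * cos(2*pi*880*t);

X         = fft(frame);        % complex spectrum of the frame
powerSpec = abs(X).^2 / n;     % scaled so that it sums to the time-domain energy

energyFreq = sum(powerSpec);   % energy from the power spectrum
energyTime = sum(frame.^2);    % energy from the samples

% Strongest bin in the lower half of the spectrum and its frequency.
[~, k]   = max(powerSpec(1:n/2));
peakFreq = (k - 1) * fs / n;

fprintf('Energy (frequency domain): %g\n', energyFreq);
fprintf('Energy (time domain):      %g\n', energyTime);
fprintf('Strongest bin at about %g Hz\n', peakFreq);
```

The strongest bin can only land within one bin width (fs/n) of the true frequency, which is why the reported peak is close to, but not exactly, 440 Hz.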
Frame-based Manipulation
For simplicity, we usually pack frames into a matrix for easy manipulation in MATLAB:
[y, fs] = audioread('file.wav');
frameMat = enframe(y, frameSize, overlap);
[Figure: frameMat, a matrix holding Frame 1, Frame 2, …, Frame n]
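For readers without the enframe helper, frame packing can be sketched in base MATLAB as below. This is a hypothetical stand-in, not the toolbox implementation: it assumes one frame per column and drops trailing samples that do not fill a complete frame. A tiny toy signal is used so the result is easy to inspect.

```matlab
% Hypothetical stand-in for enframe(y, frameSize, overlap), base MATLAB only.
% Assumptions: one frame per column; incomplete trailing frame is dropped.
y         = (1:10)';   % toy signal
frameSize = 4;
overlap   = 2;

hopSize    = frameSize - overlap;
frameCount = floor((length(y) - overlap) / hopSize);

frameMat = zeros(frameSize, frameCount);
for i = 1:frameCount
    startIndex     = (i - 1) * hopSize + 1;
    frameMat(:, i) = y(startIndex : startIndex + frameSize - 1);
end

disp(frameMat);
```

With frames stored as columns, per-frame features reduce to column-wise operations, for example sum(abs(frameMat)) for a volume value per frame.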
Introduction to Volume
Loudness of audio signals
Visual cue: amplitude of vibration
Also known as energy or intensity
Two major ways of computing volume, where s_1, ..., s_n are the samples in a frame:
Volume:
$$vol = \sum_{i=1}^{n} |s_i|$$
Log energy (in decibels):
$$energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$$
Quiz!
Volume: Perceived and Computed
Perceived volume is influenced by:
Frequency (example shown later)
Timbre (example shown later)
Computed volume is influenced by:
Microphone types
Microphone setups
Volume Computation
To avoid DC bias (or DC drifting)
DC bias: the vibration is not centered around zero
Computation:
Volume:
$$vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$$
Log energy (in decibels):
$$energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2$$
Theoretical background (How to prove them?), with s = [s_1, s_2, ..., s_n]:
$$\mathrm{median}(s) = \arg\min_x \sum_{i=1}^{n} |s_i - x|$$
$$\mathrm{mean}(s) = \arg\min_x \sum_{i=1}^{n} (s_i - x)^2$$
Quiz!
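The two volume formulas, with and without removal of the DC bias, can be tried on a synthetic frame as in the sketch below. The sinusoid, the bias of 0.5, and the variable names are assumptions made for illustration only.

```matlab
% Toy frame: a 200 Hz sinusoid riding on a DC bias of 0.5 (assumed values).
fs    = 16000;
t     = (0:511)' / fs;
frame = 0.1 * sin(2*pi*200*t) + 0.5;

% Without zero-justification, the bias dominates both measures.
volRaw    = sum(abs(frame));
energyRaw = 10 * log10(sum(frame.^2));

% With zero-justification: subtract the median for the absolute-sum volume,
% and the mean for the log energy, as in the formulas above.
vol    = sum(abs(frame - median(frame)));
energy = 10 * log10(sum((frame - mean(frame)).^2));

fprintf('Volume:     %8.3f (raw)  %8.3f (bias removed)\n', volRaw, vol);
fprintf('Log energy: %8.3f (raw)  %8.3f (bias removed), in dB\n', energyRaw, energy);
```

The bias-removed values reflect only the vibration itself, which is what the volume is meant to measure.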
Examples of Volume
Functions for computing volume
Example: volume01
Example: volume02
Example: volume03
Volume depends on…
Frequency
  Equal loudness test
Timbre
  Example: volume04
Zero Crossing Rate
Zero crossing rate (ZCR): the number of zero crossings in a frame.
Characteristics:
ZCR is higher for noise and unvoiced sounds, lower for voiced sounds.
Zero-justification is required before computing ZCR.
Usage:
For endpoint detection, especially in detecting the start and end of unvoiced sounds.
To distinguish noise from unvoiced sound, we usually add a shift before computing ZCR.
Quiz!
ZCR Computations
Two types of ZCR definitions:
If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise it is lower.
The distinction diminishes when using a higher bit resolution.
Other considerations:
ZCR with shift can be used to distinguish between unvoiced sounds and silence.
But it is hard to set the right shift amount.
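The two ZCR definitions and the effect of a shift can be sketched as follows. This uses one common formulation based on the product of neighboring samples, which is an assumption here rather than necessarily the method behind the slide's examples; the toy frames and the shift value of 0.05 are likewise illustrative.

```matlab
% Toy frame containing some zero-valued samples (assumed values).
frame = [3 1 0 -2 -1 2 0 0 -1 -2]';
frame = frame - mean(frame);                 % zero-justification (the mean is already 0 here)

adjacent  = frame(1:end-1) .* frame(2:end);  % product of each pair of neighboring samples
zcrStrict = sum(adjacent <  0);              % zero-valued samples are not counted
zcrLoose  = sum(adjacent <= 0);              % zero-valued samples are counted

% ZCR with shift: low-level fluctuation around zero stops crossing the
% reference level once the signal is shifted. The shift amount is hypothetical.
quiet   = 0.01 * [1 -1 1 -1 1 -1 1 -1 1 -1]';
shift   = 0.05;
shifted = quiet - shift;

zcrQuiet        = sum(quiet(1:end-1)   .* quiet(2:end)   < 0);
zcrQuietShifted = sum(shifted(1:end-1) .* shifted(2:end) < 0);

fprintf('ZCR, zeros not counted: %d\n', zcrStrict);
fprintf('ZCR, zeros counted:     %d\n', zcrLoose);
fprintf('Quiet frame ZCR: %d without shift, %d with shift\n', zcrQuiet, zcrQuietShifted);
```

The gap between the two counts comes entirely from the zero-valued samples, which become rarer at higher bit resolution. The quiet frame shows why a shift helps, and also why its size matters: too small a shift leaves the low-level crossings in place, too large a shift removes crossings of genuine unvoiced sounds as well.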
Examples of ZCR
ZCR computing
Example: zcr01
Example: zcr02
To use ZCR to distinguish between unvoiced sounds and environmental noise
Example: zcrWithShift
Pitch
Definition:
Pitch is also known as fundamental frequency, which is equal to the number of fundamental periods within a second. The unit used here is Hertz (Hz).
Unit:
More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz:
$$semitone = 69 + 12 \log_2 \frac{Hz}{440}$$
Piano roll via HTML5
Quiz!
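The Hertz-to-semitone conversion above, together with its inverse, can be sketched in a few lines of MATLAB. The sample frequencies and variable names are chosen for illustration; on this scale 440 Hz maps to 69, and each doubling of frequency adds 12 semitones.

```matlab
% Frequencies to convert (illustrative values; 261.63 Hz is roughly middle C).
hz = [220 440 880 261.63];

semitone = 69 + 12 * log2(hz / 440);           % Hertz to semitone
hzBack   = 440 * 2 .^ ((semitone - 69) / 12);  % semitone back to Hertz

for i = 1:length(hz)
    fprintf('%8.2f Hz  ->  %6.2f semitones\n', hz(i), semitone(i));
end
```

The resulting values coincide with MIDI note numbers, which is convenient when pitch vectors are compared against symbolic music data.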
Pitch Computation for Tuning Forks
Pitch of tuning forks (code)
$$fp = (189 - 7)/5/16000 = 0.002275\ \text{sec}$$
$$ff = 1/fp = 439.56\ \text{Hz}$$
$$pitch = 69 + 12 \log_2 \frac{ff}{440} = 68.9827\ \text{semitone}$$
Quiz!
Pitch Computation for Speech
Pitch of speech (code)
$$fp = (477 - 75)/3/16000 = 0.008375\ \text{sec}$$
$$ff = 1/fp = 119.403\ \text{Hz}$$
$$pitch = 69 + 12 \log_2 \frac{ff}{440} = 46.42\ \text{semitone}$$
Quiz!
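The two worked examples above follow the same recipe: measure the distance in samples across several fundamental periods, divide by the number of periods and the sample rate, and convert the result to semitones. The sketch below reproduces both computations. It reads the numbers as sample indices (7 to 189 spanning 5 periods, 75 to 477 spanning 3 periods) at a 16 kHz sample rate, which is an interpretation of the formulas rather than something stated explicitly.

```matlab
fs = 16000;   % sample rate assumed for both examples

% Tuning fork: 5 fundamental periods between sample indices 7 and 189.
fpFork    = (189 - 7) / 5 / fs;             % fundamental period in seconds
ffFork    = 1 / fpFork;                     % fundamental frequency in Hz
pitchFork = 69 + 12 * log2(ffFork / 440);   % pitch in semitones

% Speech: 3 fundamental periods between sample indices 75 and 477.
fpSpeech    = (477 - 75) / 3 / fs;
ffSpeech    = 1 / fpSpeech;
pitchSpeech = 69 + 12 * log2(ffSpeech / 440);

fprintf('Tuning fork: fp = %.6f sec, ff = %.2f Hz, pitch = %.4f semitones\n', ...
        fpFork, ffFork, pitchFork);
fprintf('Speech:      fp = %.6f sec, ff = %.3f Hz, pitch = %.2f semitones\n', ...
        fpSpeech, ffSpeech, pitchSpeech);
```

Measuring across several periods rather than one reduces the error caused by reading the period boundaries off at whole-sample positions.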
Tones in Mandarin Chinese
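Some statistics about Mandarin Chinese:
5401 characters, each associated with at least one base syllable and one tone
411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
Syllables with 3 or fewer tones: 媽麻馬罵 (ma), 當檔蕩 (dang), 嗲 (dia)
More examples:
1234: 三民主義 (sān mín zhǔ yì), 三國演義 (sān guó yǎn yì), 優柔寡斷 (yōu róu guǎ duàn)
?????: 美麗大教堂 ("beautiful cathedral"), 滷蛋有夠鹹 ("the braised egg is really salty", Taiwanese)
Tone sandhi: 勇猛果敢 (yǒng měng guǒ gǎn, "brave and resolute")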
Features Related to Tones
Tone is characterized by the pitch curves:
Tone 1: high-high
Tone 2: low-high
Tone 3: high-low-high
Tone 4: high-low
(Put your hand on your throat and you can feel it…)
Tone recognition is mostly based on features obtained from pitch and volume.
Quiz!
Tones in Mandarin TTS
TTS: Text to speech (demo)
Tone sandhi: phonological change occurring in tonal languages
3+3 → 2+3
總統 (president), 總統府 (presidential office), 李總統 (President Li), 母老虎 (tigress), 膽小鬼 (coward)
不 ("not")
不好, 不難 vs. 不對, 不妙
一 ("one")
一個, 一次, 一半 vs. 一般, 一毛, 一會兒
Sentences of All Tone 3
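Tone sandhi of 3+3:
請老李給我買五把好雨傘 (Please have Old Li buy me five good umbrellas)
老李買好酒請馬小姐買幾百把小雨傘 (Old Li buys good wine and asks Miss Ma to buy several hundred small umbrellas)
總統府裏的李總統有點想請我買酒 (President Li in the presidential office rather wants to ask me to buy wine)
北海只有兩里遠,水也很淺 (Beihai is only two li away, and the water is quite shallow)
展覽館北館有好幾百種展覽品 (The north hall of the exhibition center has several hundred kinds of exhibits)
你早晚打掃,我啃水果咬水餃 (You sweep morning and night; I gnaw fruit and bite dumplings)
我很了解你,我倆永遠友好 (I understand you well; the two of us will always be friends)
水管可以點火,趕緊買保險 (The water pipe can catch fire; hurry and buy insurance)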
Quiz!
Pitch Change due to Fast Forward
If audio is played at a higher sample rate…
Pitch is higher
Duration is shorter
Pitch change due to sample rate change at playback:
Sample rate: fs → k*fs (at playback)
Duration: d → d/k
Fundamental frequency: ff → k*ff
Pitch: pitch → pitch + 12*log2(k)
Quiz!
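The relations above can be checked numerically with a short sketch. The recording parameters (a 2-second, 220 Hz tone at 16 kHz) and the speed-up factor k = 2 are assumptions for illustration; the playback line is left commented out because not every audio device accepts the higher rate.

```matlab
% Assumed recording: a 220 Hz tone, 2 seconds long, sampled at 16 kHz.
fs = 16000;
ff = 220;
d  = 2;
k  = 2;      % playback speed-up factor (hypothetical)

pitch = 69 + 12 * log2(ff / 440);          % pitch at the original rate, in semitones

fsPlay    = k * fs;                        % sample rate used at playback
dPlay     = d / k;                         % playback duration in seconds
ffPlay    = k * ff;                        % fundamental frequency heard at playback
pitchPlay = 69 + 12 * log2(ffPlay / 440);  % pitch heard at playback, in semitones

fprintf('Duration: %g s -> %g s\n', d, dPlay);
fprintf('Fundamental frequency: %g Hz -> %g Hz\n', ff, ffPlay);
fprintf('Pitch: %g -> %g semitones (change of %g)\n', pitch, pitchPlay, pitchPlay - pitch);

% To listen to the effect:
% t = (0:d*fs-1)' / fs;  y = 0.5 * cos(2*pi*ff*t);  sound(y, fsPlay);
```

Doubling the playback rate raises the pitch by exactly one octave (12 semitones) and halves the duration.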
Pitch Perception
Age-related hearing loss
As one grows older, the audible frequency bandwidth gets narrower
Mosquito ringtone
Low to high, high to low
Applications
[Figure: Frequencies vs. ages; x-axis: age (ticks at 18, 24, 40, 50, 100), y-axis: frequency in kHz (0 to 25); labeled values: 8k, 12k, 15k, 17.4k, 21k]
Other Things about Pitch
Some interesting phenomena about pitch:
Beat
Doppler effect
Shepard tone: an auditory illusion of a tone that continually ascends or descends in pitch
Overtone singing
Have you tried these?
Inhale helium to produce a high (squeaky) pitch
Resonance: break a glass with the right pitch (just like a swing)
Quiz!
Quiz!
How to create these effects in MATLAB
Beat
Beat: an interference between two sounds of slightly different frequencies…
$$\cos(2\pi f_1 t) + \cos(2\pi f_2 t) = 2 \cos\left(2\pi \frac{f_1 + f_2}{2} t\right) \cos\left(2\pi \frac{f_1 - f_2}{2} t\right)$$
Quiz!
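fs = 8000;                  % sample rate in Hz
duration = 5;               % length in seconds
t = (1:duration*fs)/fs;     % time axis in seconds
y1 = cos(2*pi*440*t)';      % 440 Hz tone
y2 = cos(2*pi*444*t)';      % 444 Hz tone, 4 Hz above y1
sound((y1+y2)/2, fs);       % halve the sum to keep it within [-1, 1] and avoid clipping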
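The sum-to-product identity behind the beat can also be verified numerically. The sketch below compares both sides of the identity for the same two frequencies over one second; the variable names are illustrative.

```matlab
fs = 8000;
f1 = 440;
f2 = 444;
t  = (1:fs)' / fs;    % one second of samples

lhs = cos(2*pi*f1*t) + cos(2*pi*f2*t);     % sum of the two tones

carrier  = cos(2*pi*(f1 + f2)/2 * t);      % fast oscillation at the average frequency
envelope = 2 * cos(2*pi*(f1 - f2)/2 * t);  % slow amplitude modulation
rhs      = envelope .* carrier;            % product form of the same signal

maxDiff  = max(abs(lhs - rhs));            % should be at rounding-error level
beatRate = abs(f1 - f2);                   % loudness peaks per second

fprintf('Largest difference between the two forms: %g\n', maxDiff);
fprintf('Beat rate: %d per second\n', beatRate);
```

The envelope term oscillates at |f1 - f2|/2 = 2 Hz, but loudness follows its magnitude, which peaks twice per cycle, so the ear hears |f1 - f2| = 4 beats per second.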
Timbre
Timbre is represented by:
Waveform within a fundamental period
Frame-based energy distribution over frequencies
  Power spectrum (over a single frame)
  Spectrogram (over many frames)
Frame-based MFCC (mel-frequency cepstral coefficients)
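A spectrogram is the power spectrum computed frame by frame and stacked side by side. The sketch below builds one with base MATLAB only, for a synthetic signal that switches from 500 Hz to 1500 Hz halfway through. The signal, frame size, hop size, and variable names are assumptions for illustration; no window function is applied and MFCC is not attempted.

```matlab
% Synthetic one-second signal: 500 Hz for the first half, 1500 Hz for the second.
fs = 8000;
t  = (0:fs/2-1)' / fs;
y  = [cos(2*pi*500*t); cos(2*pi*1500*t)];

frameSize = 256;    % samples per frame (assumed)
hopSize   = 128;    % samples between frame starts (assumed)
overlap   = frameSize - hopSize;
frameCount = floor((length(y) - overlap) / hopSize);

% One column per frame, one row per frequency bin from 0 Hz to fs/2.
spec = zeros(frameSize/2 + 1, frameCount);
for i = 1:frameCount
    idx        = (i - 1) * hopSize + (1:frameSize);
    X          = fft(y(idx));
    spec(:, i) = abs(X(1:frameSize/2 + 1)).^2;   % power spectrum of frame i
end

freq     = (0:frameSize/2)' * fs / frameSize;    % frequency of each bin in Hz
[~, k]   = max(spec);                            % strongest bin in every frame
peakFreq = freq(k);

fprintf('Strongest frequency: %g Hz in the first frame, %g Hz in the last frame\n', ...
        peakFreq(1), peakFreq(end));
```

To view the result, imagesc(1:frameCount, freq, 10*log10(spec + eps)) followed by axis xy gives the usual time-frequency picture.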
Timbre Demo: Real-time Spectrogram
Simulink model for real-time display of spectrogram:
dspstfft_audio (before MATLAB R2011a)
dspstfft_audioInput (R2012a or later)
[Figure panels: Spectrogram, Spectrum]