Basic Features of Audio Signals ( 音訊的基本特徵 )

Basic Features of Audio Signals ( 音訊的基本特徵 ) Jyh-Shing Roger Jang ( 張智星 ) http://mirlab.org/jang MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan


Page 1: Basic Features of Audio Signals ( 音訊的基本特徵 )

Basic Features of Audio Signals( 音訊的基本特徵 )

Jyh-Shing Roger Jang ( 張智星 )

http://mirlab.org/jang

MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

Page 2: Basic Features of Audio Signals ( 音訊的基本特徵 )

Audio Features

Four commonly used audio features: volume, pitch, timbre, zero crossing rate

Our goal: These features can be perceived (more or less) subjectively. Our goal is to compute them quantitatively (and objectively) for further processing and recognition.

Page 3: Basic Features of Audio Signals ( 音訊的基本特徵 )

General Steps for Audio Analysis

1. Frame blocking: frame duration of 20~40 ms or so

2. Frame-based feature extraction: volume, zero-crossing rate, pitch, MFCC, etc.

3. Frame-based analysis: pitch vector for QBSH (query by singing/humming) comparison, MFCC for HMM evaluation, …

Page 4: Basic Features of Audio Signals ( 音訊的基本特徵 )

Frame Blocking

Sample rate = 16 kHz

Frame size = 512 samples

Frame duration = 512/16000 = 0.032 s = 32 ms

Overlap = 192 samples

Hop size = frame size - overlap = 512 - 192 = 320 samples

Frame rate = 16000/320 = 50 frames/sec
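The arithmetic above can be checked with a few lines of MATLAB. This is only a sketch: the variable names are chosen here for illustration and are not from the slides.

```matlab
% Frame-blocking arithmetic for the example values on this slide.
fs = 16000;                     % sample rate (Hz)
frameSize = 512;                % samples per frame
overlap = 192;                  % samples shared by two adjacent frames

frameDuration = frameSize/fs;   % seconds per frame
hopSize = frameSize - overlap;  % samples between the starts of adjacent frames
frameRate = fs/hopSize;         % frames per second

fprintf('Frame duration = %g ms\n', 1000*frameDuration);
fprintf('Hop size = %d samples\n', hopSize);
fprintf('Frame rate = %g frames/sec\n', frameRate);
```

A larger overlap gives a higher frame rate (finer time resolution) at the cost of more frames to process.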

[Figure: waveform with one frame and the overlap between adjacent frames marked, plus a zoomed-in view]

Quiz!

Page 5: Basic Features of Audio Signals ( 音訊的基本特徵 )

Audio Features in Time Domain

[Figure: waveform of taiwan.wav (amplitude vs. sample index) and one frame of it (amplitude vs. sample index within the frame)]

Three of the most prominent time-domain audio features in a frame (aka analysis window):

Intensity

Fundamental period (FP)

Timbre: waveform within an FP

Quiz!

Page 6: Basic Features of Audio Signals ( 音訊的基本特徵 )

Audio Features in Frequency Domain

Frequency-domain audio features in a frame:

Energy: sum of power spectrum

Pitch: distance between harmonics

Timbre: smoothed spectrum

[Figure: spectrum of a frame annotated with energy, pitch frequency, first formant F1, and second formant F2]

Page 7: Basic Features of Audio Signals ( 音訊的基本特徵 )

Frame-based Manipulation

For simplicity, we usually pack frames into a matrix for easy manipulation in MATLAB:

[y, fs] = audioread('file.wav');
frameMat = enframe(y, frameSize, overlap);

frameMat = [ Frame 1 | Frame 2 | … | Frame n ] (one frame per column)
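enframe is not part of base MATLAB, so the following is a hypothetical stand-in that builds the same kind of matrix with a plain loop. It is a sketch under assumptions: one frame per column, trailing samples that do not fill a frame are dropped, and a synthetic tone replaces audioread so the code runs without a file.

```matlab
% Hypothetical stand-in for enframe(): one frame per column, base MATLAB only.
fs = 16000;                              % sample rate (Hz)
y = sin(2*pi*440*(0:fs-1)'/fs);          % 1 second of a synthetic 440 Hz tone
frameSize = 512;
overlap = 192;

hopSize = frameSize - overlap;
frameCount = floor((length(y) - overlap)/hopSize);   % full frames only
frameMat = zeros(frameSize, frameCount);
for k = 1:frameCount
    startIndex = (k-1)*hopSize + 1;      % first sample of frame k
    frameMat(:, k) = y(startIndex : startIndex + frameSize - 1);
end

disp(size(frameMat));                    % frameSize-by-frameCount
```

With the frames stored as columns, column-wise functions such as sum or mean compute a feature for every frame in one call.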

Page 8: Basic Features of Audio Signals ( 音訊的基本特徵 )

Introduction to Volume

Loudness of audio signals

Visual cue: amplitude of vibration

Also known as energy or intensity

Two major ways of computing volume:

Volume: $vol = \sum_{i=1}^{n} |s_i|$

Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$

Quiz!

Page 9: Basic Features of Audio Signals ( 音訊的基本特徵 )

Volume: Perceived and Computed

Perceived volume is influenced by: frequency (example shown later), timbre (example shown later)

Computed volume is influenced by: microphone types, microphone setups

Page 10: Basic Features of Audio Signals ( 音訊的基本特徵 )

Volume Computation

To avoid DC bias (or DC drifting)

DC bias: the vibration is not around zero

Computation:

Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$

Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2$

Theoretical background (How to prove them?)

$\mathrm{median}(s_1, s_2, \ldots, s_n) = \arg\min_x \sum_{i=1}^{n} |s_i - x|$

$\mathrm{mean}(s_1, s_2, \ldots, s_n) = \arg\min_x \sum_{i=1}^{n} (s_i - x)^2$
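The two volume measures can be sketched in base MATLAB as follows. The frame here is synthetic, with a DC bias of 0.1 added on purpose; the signal parameters and variable names are assumptions for illustration only.

```matlab
% Two volume measures for one frame, each after removing the DC bias.
fs = 16000;                              % sample rate (Hz), assumed
n = 512;                                 % frame size, assumed
t = (0:n-1)'/fs;
frame = 0.3*sin(2*pi*200*t) + 0.1;       % synthetic frame with a DC bias of 0.1

absSum = sum(abs(frame - median(frame)));      % vol: sum of absolute values
sumSquares = sum((frame - mean(frame)).^2);
logEnergy = 10*log10(sumSquares);              % energy in dB (-Inf for an all-zero frame)

% Without the correction, the bias inflates both sums.
absSumBiased = sum(abs(frame));
sumSquaresBiased = sum(frame.^2);

fprintf('vol = %g (biased: %g)\n', absSum, absSumBiased);
fprintf('energy = %g dB (biased: %g dB)\n', logEnergy, 10*log10(sumSquaresBiased));
```

The corrected sums can never exceed the uncorrected ones, because the median minimizes the sum of absolute deviations and the mean minimizes the sum of squared deviations.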

Quiz!

Page 12: Basic Features of Audio Signals ( 音訊的基本特徵 )

Zero Crossing Rate

Zero crossing rate (ZCR): the number of zero crossings in a frame.

Characteristics:

ZCR is higher for noise and unvoiced sounds, lower for voiced sounds.

Zero-justification (removing the DC bias so that the signal oscillates around zero) is required before computing ZCR.

Usage:

For endpoint detection, especially in detecting the start and end of unvoiced sounds.

To distinguish noise from unvoiced sound, usually we add a shift before computing ZCR.

Quiz!

Page 13: Basic Features of Audio Signals ( 音訊的基本特徵 )

ZCR Computations

Two types of ZCR definitions:

If a sample with zero value is counted as a zero crossing, then the value of ZCR is higher. Otherwise it is lower.

The distinction diminishes when using a higher bit resolution.

Other consideration:

ZCR with shift can be used to distinguish between unvoiced sounds and silence.

But it is hard to set up the right shift amount.
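A minimal sketch of the two definitions and of ZCR with shift, using small integer-valued frames so that exact zeros occur. The sample values, the shift amount, and all names are made up for illustration; counting sign changes through the product of adjacent samples is one common way to implement ZCR, not necessarily the one used in the course code.

```matlab
% ZCR of one frame under two definitions, plus ZCR with shift.
strictCount = @(x) sum(x(1:end-1).*x(2:end) < 0);    % zero-valued samples not counted
looseCount  = @(x) sum(x(1:end-1).*x(2:end) <= 0);   % zero-valued samples counted

frame = [3 1 0 -2 -4 0 0 2 5 -1]';       % integer samples, as in low-bit-resolution audio
frame = frame - round(mean(frame));      % zero-justification
zcr1 = strictCount(frame);
zcr2 = looseCount(frame);

% ZCR with shift: low-level background noise no longer crosses the shifted axis.
silence = [1 -1 0 1 -1 1 0 -1 1 -1]';    % hypothetical low-amplitude noise
shift = 2;                               % hypothetical shift amount
zcrSilence = strictCount(silence);
zcrSilenceShifted = strictCount(silence - shift);

fprintf('zcr1 = %d, zcr2 = %d\n', zcr1, zcr2);
fprintf('silence: %d without shift, %d with shift\n', zcrSilence, zcrSilenceShifted);
```

The shift must exceed the noise amplitude but stay below the amplitude of unvoiced sounds, which is why the right amount is hard to choose.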

Page 15: Basic Features of Audio Signals ( 音訊的基本特徵 )

Pitch

Definition: Pitch is also known as fundamental frequency, which is equal to the number of fundamental periods within a second. The unit used here is Hertz (Hz).

Unit: More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz:

$semitone = 69 + 12 \log_2 (Hz / 440)$

Piano roll via HTML5
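The conversion between Hertz and semitones, and its inverse, can be sketched as two anonymous functions. The function names are chosen here for illustration. On this scale 440 Hz (A4) maps to 69 and each octave spans 12 semitones.

```matlab
% Conversion between frequency in Hz and pitch in semitones.
hz2semitone = @(hz) 69 + 12*log2(hz/440);
semitone2hz = @(semitone) 440*2.^((semitone - 69)/12);

a4 = hz2semitone(440);        % reference pitch A4
a5 = hz2semitone(880);        % one octave above A4
c4 = semitone2hz(60);         % middle C in Hz

fprintf('440 Hz -> %g, 880 Hz -> %g, semitone 60 -> %.2f Hz\n', a4, a5, c4);
```

The semitone scale is logarithmic in frequency, so equal musical intervals correspond to equal differences in semitones.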

Quiz!

Page 16: Basic Features of Audio Signals ( 音訊的基本特徵 )

Pitch Computation for Tuning Forks

Pitch of tuning forks (code)

$fp = (189 - 7)/5/16000 = 0.002275$ sec

$ff = 1/fp = 439.56$ Hz

$pitch = 69 + 12 \log_2 (ff/440) = 68.9827$ semitone
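The same computation as a short MATLAB sketch. It reads the slide's numbers as two sample indices (7 and 189) that lie five fundamental periods apart in a recording sampled at 16 kHz; that reading and the variable names are assumptions, and this is not the code linked from the slide.

```matlab
% Pitch from two manually picked sample indices several periods apart.
fs = 16000;                   % sample rate (Hz)
index1 = 7;                   % first picked sample index
index2 = 189;                 % second picked sample index
periodCount = 5;              % fundamental periods between the two indices

fp = (index2 - index1)/periodCount/fs;   % fundamental period (sec)
ff = 1/fp;                               % fundamental frequency (Hz)
pitch = 69 + 12*log2(ff/440);            % pitch in semitones

fprintf('fp = %g sec, ff = %.2f Hz, pitch = %.4f semitones\n', fp, ff, pitch);
```

Measuring across several periods rather than one reduces the error caused by picking indices by eye.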

Quiz!

Page 17: Basic Features of Audio Signals ( 音訊的基本特徵 )

Pitch Computation for Speech

Pitch of speech (code)

$fp = (477 - 75)/3/16000 = 0.008375$ sec

$ff = 1/fp = 119.403$ Hz

$pitch = 69 + 12 \log_2 (ff/440) = 46.42$ semitone

Quiz!

Page 18: Basic Features of Audio Signals ( 音訊的基本特徵 )

Tones in Mandarin Chinese

Some statistics about Mandarin Chinese:

5401 characters, each associated with at least one base syllable and a tone

411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables

Syllables with 3 or fewer tones: 媽麻馬罵、當檔蕩、嗲

More examples:

1234: 三民主義、三國演義、優柔寡斷

?????: 美麗大教堂、滷蛋有夠鹹 (Taiwanese)

Tone sandhi: 勇猛果敢

Page 19: Basic Features of Audio Signals ( 音訊的基本特徵 )

Features Related to Tones

Tone is characterized by the pitch curves:

Tone 1: high-high

Tone 2: low-high

Tone 3: high-low-high

Tone 4: high-low

(Put your hand on your throat and you can feel it…)

Tone recognition is mostly based on features obtained from pitch and volume.

Quiz!

Page 20: Basic Features of Audio Signals ( 音訊的基本特徵 )

Tones in Mandarin TTS

TTS: Text to speech (demo)

Tone sandhi: phonological change occurring in tonal languages

3+3 → 2+3: 總統、總統府、李總統、母老虎、膽小鬼

不: 不好、不難 vs. 不對、不妙

一: 一個、一次、一半 vs. 一般、一毛、一會兒

Page 21: Basic Features of Audio Signals ( 音訊的基本特徵 )

Mandarin Tone Practice

Tone combinations in connected two-syllable words ( 雙音節詞連音組合 )

Page 22: Basic Features of Audio Signals ( 音訊的基本特徵 )

Sentences of All Tone 3

Tone sandhi of 3+3:

請老李給我買五把好雨傘

老李買好酒請馬小姐買幾百把小雨傘

總統府裏的李總統有點想請我買酒

北海只有兩里遠,水也很淺

展覽館北館有好幾百種展覽品

你早晚打掃,我啃水果咬水餃

我很了解你,我倆永遠友好

水管可以點火,趕緊買保險

Quiz!

Page 23: Basic Features of Audio Signals ( 音訊的基本特徵 )

Pitch Change due to Fast Forward

If audio is played at a higher sample rate…

Pitch is higher

Duration is shorter

Pitch change due to sample rate change at playback:

Sample rate: fs → k*fs (at playback)

Duration: d → d/k

Fundamental frequency: ff → k*ff

Pitch: pitch → pitch + 12*log2(k)
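The relations above can be checked numerically. The values below (k = 2, a 2-second clip at 16 kHz with a 220 Hz fundamental) are made up for illustration.

```matlab
% Effect of playing audio back at k times its original sample rate.
k = 2;                        % playback speed-up factor (assumed)
fs = 16000;                   % original sample rate (Hz)
d = 2;                        % original duration (sec)
ff = 220;                     % original fundamental frequency (Hz)

newFs = k*fs;                 % sample rate used at playback
newDuration = d/k;            % same samples, played k times faster
newFf = k*ff;                 % each period takes 1/k of the time

oldPitch = 69 + 12*log2(ff/440);
newPitch = 69 + 12*log2(newFf/440);
semitoneShift = 12*log2(k);   % predicted pitch change in semitones

fprintf('Pitch %g -> %g semitones (shift %g)\n', oldPitch, newPitch, semitoneShift);
% For a real signal y, sound(y, k*fs) plays it back at the faster rate.
```

Doubling the playback rate raises the pitch by exactly one octave (12 semitones) and halves the duration.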

Quiz!

Page 24: Basic Features of Audio Signals ( 音訊的基本特徵 )

Pitch Perception

Age-related hearing loss: as one grows older, the audible frequency bandwidth gets narrower.

Mosquito ringtone: low to high, high to low

Applications

[Figure: Frequencies vs. ages. Audible frequency (kHz) plotted against age, with ages 18, 24, 40, 50, and 100 marked and frequencies 8k, 12k, 15k, 17.4k, and 21k labeled]

Page 25: Basic Features of Audio Signals ( 音訊的基本特徵 )

Other Things about Pitch

Some interesting phenomena about pitch:

Beat

Doppler effect

Shepard tone: an auditory illusion of a tone that continually ascends or descends in pitch

Overtone singing

Have you tried these?

Inhale helium to produce high (squeaky) pitch

Resonance: break a glass with the right pitch (just like a swing)

Quiz!

Quiz!

How to create these effects in MATLAB

Page 26: Basic Features of Audio Signals ( 音訊的基本特徵 )

Beat

Beat: An interference between two sounds of slightly different frequencies…

$\cos(2\pi f_1 t) + \cos(2\pi f_2 t) = 2 \cos\left(2\pi \frac{f_1 - f_2}{2} t\right) \cos\left(2\pi \frac{f_1 + f_2}{2} t\right)$

Quiz!

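fs=8000;                  % sample rate (Hz)
duration=5;               % length of the tones (sec)
t=(1:duration*fs)/fs;     % time axis
y1=cos(2*pi*440*t)';      % 440 Hz tone
y2=cos(2*pi*444*t)';      % 444 Hz tone
% Halve the sum to stay within [-1, 1], since sound() clips values outside that range.
sound((y1+y2)/2, fs);     % loudness pulses |444-440| = 4 times per second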

Page 27: Basic Features of Audio Signals ( 音訊的基本特徵 )

Timbre

Timbre is represented by:

Waveform within a fundamental period

Frame-based energy distribution over frequencies

Power spectrum (over a single frame)

Spectrogram (over many frames)

Frame-based MFCC (mel-frequency cepstral coefficients)
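A power spectrum over a single frame can be sketched with base MATLAB's fft. The frame below is a synthetic 1 kHz tone standing in for real audio, and the frame size and sample rate are assumptions.

```matlab
% Power spectrum of one frame, with a check that it carries the frame's energy.
fs = 16000;                              % sample rate (Hz), assumed
n = 512;                                 % frame size, assumed
t = (0:n-1)'/fs;
frame = sin(2*pi*1000*t);                % synthetic 1 kHz tone

spectrum = fft(frame);
powerSpec = abs(spectrum(1:n/2+1)).^2;   % keep non-negative frequencies only
freq = (0:n/2)'*fs/n;                    % frequency of each bin (Hz)

[~, peakIndex] = max(powerSpec);
peakFreq = freq(peakIndex);              % strongest frequency component

% Parseval's relation: time-domain energy equals the full power spectrum sum divided by n.
energyTime = sum(frame.^2);
energyFreq = sum(abs(spectrum).^2)/n;

fprintf('Peak at %g Hz; energy %g (time) vs. %g (frequency)\n', peakFreq, energyTime, energyFreq);
```

Stacking such spectra for consecutive frames, one column per frame, gives the spectrogram.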

Page 28: Basic Features of Audio Signals ( 音訊的基本特徵 )

Timbre Demo: Real-time Spectrogram

Simulink model for real-time display of spectrogram:

dspstfft_audio (before MATLAB R2011a)

dspstfft_audioInput (R2012a or later)

[Figure: real-time display with two panels, Spectrogram and Spectrum]