MIR: Status and Trends (The Present and Future of Music Information Retrieval)
112/04/22 1
MIR: Status and Trends (The Present and Future of Music Information Retrieval)
J.-S. Roger Jang (張智星)
Multimedia Information Retrieval Lab
CS Dept., Tsing Hua Univ., Taiwan
http://www.cs.nthu.edu.tw/~jang
-2-
Outline
Intro. to music information retrieval (MIR)
Our work on MIR (with demos)
  Query by singing/humming (QBSH)
  Singing voice separation
Conclusions
-3-
Types of MIR Systems
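Text-based MIR
  Text input: song title, singer, lyrics, lyricist, composer
  Metadata: genre, mood, overplayed pop hits (口水歌)
Content-based MIR
  Symbolic input
    Music score info: notes, beats, chords, etc.
  Acoustic input
    By example: the original recording as input
    By humans: singing/humming, whistling, tapping, drum sounds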
-4-
Span of MIR Research
Content analysis
  Audio music
    Low-level feature extraction
    High-level feature representation
  Symbolic music
    High-level feature representation
Retrieval methods
  Text-based information retrieval
  Data clustering
  Pattern recognition
  Distance measures
-5-
MIR Methods for Audio Music
Audio features
  Low-level features: MFCC, spectral flux, rolloff frequency, …
  High-level features: Pitch, onset, beat, tempo, chord, key, …
  Vocal extraction
  Others: Collaborative filtering
Retrieval methods
  Clustering: K-means, VQ, hierarchical clustering
  Classification: SVM, GMM, LSA, HMM, ANN, …
  Distance measures: DTW, KL, cosine similarity, edit distance
  Others: Learning to rank
-6-
MIR Major Events
ISMIR/MIREX
  Int. Sym. on Music Information Retrieval, since 2000
  Music Information Retrieval Evaluation eXchange, since 2005
ICMC
  Int. Computer Music Conference, since 1974
ICASSP
  Int. Conf. on Acoustics, Speech, and Signal Processing, since 1976
-7-
ISMIR Growth: 2000-2009
YEAR    LOCATION          ITEMS  PAGES  UNIQUE AUTHORS
2000    Plymouth, MA         35    155    63
2001    Bloomington, IN      41    222    86
2002    Paris, FR            57    300   117
2003    Baltimore, MD        50    209   111
2004    Barcelona, ES       105    582   214
2005    London, UK          114    697   233
2006    Victoria, BC         95    397   198
2007    Vienna, AT          127    486   267
2008    Philadelphia, PA    105    630   253
2009    Kobe, JP            124    773   301
TOTALS  ----                853   4451   ----
-8-
ISMIR Locations
2000, Plymouth
2001, Bloomington
2002, Paris
2003, Baltimore
2004, Barcelona
2005, London
2006, Victoria
2007, Vienna
2008, Philadelphia
2009, Kobe
-9-
State-of-the-Art MIR: Tasks at MIREX
Audio music
  High-level feature identification
    Audio onset detection
    Audio beat tracking
    Audio tempo extraction
    Audio key detection
    Audio chord estimation
    Multiple fundamental frequency estimation & tracking
    Audio structural segmentation
  Classification
    Artist
    Genre
    Mood
  Retrieval
    Audio cover song identification
    Audio tag classification
    Audio music similarity and retrieval
  Alignment
    Real-time audio-to-score alignment (a.k.a. score following)
Symbolic music
  Symbolic melodic similarity
  Symbolic music similarity and retrieval
Hybrid
  Query by singing/humming
  Query by tapping
-10-
MIREX: 2005 - 2008
                                       2005  2006  2007  2008
Number of Task (and Subtask) "Sets"      10    13    12    18
Number of Individuals                    82    50    73    84
Number of Countries                      19    14    15    19
Number of Runs                           86    92   122   169
-11-
Our Work on MIR
QBSH: Query by Singing/Humming
Singing voice separation
Audio melody extraction
-12-
Introduction to QBSH
QBSH: Query by Singing/Humming
  Input: Singing or humming from a microphone
  Output: A ranked list retrieved from the song database
Overview
  First paper: Around 1994
  Extensive studies since 2001
  State of the art: QBSH tasks at ISMIR/MIREX
-13-
Challenges in QBSH Systems
Reliable pitch tracking for acoustic input
  Input from mobile devices or a noisy karaoke bar
Song database preparation
  MIDIs, singing clips, or audio music
Efficient/effective retrieval
  Karaoke machine: ~10,000 songs
  Internet music search engine: ~500,000,000 songs
-14-
-15-
QBSH: Goal and Approach
Goal: To retrieve songs effectively within a given response time, say 5 seconds or so
Our strategy
  Multi-stage progressive filtering
  Indexing for different comparison methods
  Repeating pattern identification
-16-
Flowchart of QBSH
Two steps
  Pitch tracking
  Comparison methods
-17-
Frame Blocking for Pitch Tracking
256 points/frame
84 points overlap
11025/(256-84) ≈ 64 pitch values/sec
[Figure: waveform of the input signal, with a zoomed-in view marking one frame and its overlap with the neighboring frame]
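The frame-blocking arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the SAP toolbox code: the function name `frame_blocking` and the placeholder input signal are assumptions, and only the numbers (256-sample frames, 84-sample overlap, 11025 Hz sampling rate) come from the slide.

```python
import numpy as np

def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Hypothetical helper: trailing samples that do not fill a whole
    frame are dropped.
    """
    hop = frame_size - overlap  # 172 samples between frame starts
    if len(signal) < frame_size:
        return np.zeros((0, frame_size))
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.zeros((n_frames, frame_size))
    for i in range(n_frames):
        start = i * hop
        frames[i] = signal[start:start + frame_size]
    return frames

fs = 11025                           # sampling rate from the slide
signal = np.arange(fs, dtype=float)  # one second of placeholder samples
frames = frame_blocking(signal)
frames_per_second = fs / (256 - 84)  # about 64.1, the slide's "64 pitch/sec"
```

Each frame later yields one pitch value, so the hop size of 172 samples, not the frame size, sets the pitch rate.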
-18-
ACF: Auto-correlation Function
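Frame: s(n)
Shifted frame: s(n-τ), shown for τ = 30
acf(30) = inner product of the overlapping part = dot(s(30:256), s(1:227)) (MATLAB 1-based indexing)
acf(τ): the lag of its highest peak away from zero shift gives the pitch period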
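As a rough sketch of how the ACF leads to a pitch estimate, the Python below computes the inner product of a frame with its shifted copy for every lag and picks the lag with the largest value. The function names, the 0-based lag convention, and the synthetic 220 Hz test frame are assumptions for illustration. The lag search range is derived from the 82 Hz to 1047 Hz pitch range quoted elsewhere in this talk.

```python
import numpy as np

def acf(frame):
    """acf[lag] = inner product of the frame with itself shifted by `lag` samples."""
    n = len(frame)
    return np.array([np.dot(frame[lag:], frame[:n - lag]) for lag in range(n)])

def acf_pitch(frame, fs, fmin=82.0, fmax=1047.0):
    """Estimate pitch in Hz from the highest ACF peak inside a plausible lag range."""
    values = acf(frame)
    lag_min = int(np.ceil(fs / fmax))                        # shortest period considered
    lag_max = min(int(np.floor(fs / fmin)), len(frame) - 1)  # longest period considered
    best_lag = lag_min + int(np.argmax(values[lag_min:lag_max + 1]))
    return fs / best_lag

fs = 11025
t = np.arange(256) / fs
frame = np.sin(2 * np.pi * 220.0 * t)  # synthetic 220 Hz tone, one 256-point frame
pitch_hz = acf_pitch(frame, fs)
```

Because the lag is an integer number of samples, the estimate is quantized; at this sampling rate a 220 Hz tone lands on a neighboring value rather than exactly 220 Hz.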
-19-
Frequency to Semitone Conversion
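Semitone: a musical pitch scale referenced to A440 (440 Hz maps to semitone 69)
Reasonable pitch range: E2 - C6, i.e. 82 Hz - 1047 Hz (semitone 40 - 84)
$$\text{semitone} = 69 + 12 \log_2\left(\frac{\text{freq}}{440}\right)$$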
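A small sketch of this conversion in Python, assuming the usual MIDI-style convention semitone = 69 + 12·log2(freq/440). The function names are illustrative only.

```python
import math

def freq_to_semitone(freq_hz):
    """Convert a frequency in Hz to a (possibly fractional) semitone number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def semitone_to_freq(semitone):
    """Inverse mapping: semitone number back to frequency in Hz."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)

low = freq_to_semitone(82.0)     # lower end of the slide's range (E2)
high = freq_to_semitone(1047.0)  # upper end of the slide's range (C6)
```

Working in semitones makes key transposition a simple additive shift, which is convenient when comparing pitch vectors.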
-20-
Example of Pitch Tracking
[Figure: waveform of soo.wav (amplitude) above its pitch contour (pitch in semitones), both plotted over the same horizontal axis]
PT using ptByDpOverPfMex, with pfWeight=1 and indexDiffWeight=22
pitch1: computed pitch
-21-
Typical Result of Pitch Tracking
Pitch tracking via autocorrelation for a recording of 茉莉花 (Jasmine Flower)
-22-
Comparison of Pitch Vectors
Yellow line: Target pitch vector
-23-
Linear Scaling (LS)
Scale the query linearly to match the candidate
A typical example of linear scaling
-24-
Linear Scaling (LS)
Characteristics
  One-shot for dealing with key transposition
  Efficient and effective
  Some indexing methods
  Cannot deal with large tempo variations
  #1 method for task 2 in QBSH/MIREX 2006
Typical mapping path
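Linear scaling can be sketched as follows: the query pitch vector is stretched or compressed by several candidate factors, shifted so that its mean matches the target segment (handling key transposition in one shot), and compared point by point. This is a minimal illustration, not the system's actual implementation. The scaling range 0.5 to 2.0, the number of factors, the mean-absolute-difference distance, and all names are assumptions.

```python
import numpy as np

def resample(pitch, new_len):
    """Linearly interpolate a pitch vector to a new length."""
    old_x = np.linspace(0.0, 1.0, len(pitch))
    new_x = np.linspace(0.0, 1.0, new_len)
    return np.interp(new_x, old_x, pitch)

def linear_scaling(query, target, min_factor=0.5, max_factor=2.0, n_factors=7):
    """Return (best distance, best scaling factor), matching from the target's start."""
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    best_dist, best_factor = np.inf, None
    for factor in np.linspace(min_factor, max_factor, n_factors):
        length = int(round(len(query) * factor))
        if length < 2 or length > len(target):
            continue  # scaled query would be degenerate or longer than the target
        scaled = resample(query, length)
        segment = target[:length]
        # One-shot key transposition: align the mean pitch of both vectors.
        scaled = scaled - scaled.mean() + segment.mean()
        dist = float(np.mean(np.abs(scaled - segment)))
        if dist < best_dist:
            best_dist, best_factor = dist, float(factor)
    return best_dist, best_factor

# Target melody in semitones, 20 pitch points per note.
target = np.repeat([60.0, 62.0, 64.0, 65.0, 67.0, 65.0, 64.0, 62.0], 20)
# Query: the first 100 points sung faster (80 points) and 3 semitones higher.
query = resample(target[:100], 80) + 3.0
best_dist, best_factor = linear_scaling(query, target)
```

In this toy case the query was compressed to 80% of its length, so the best match is expected at a stretch factor of 1.25. A uniform stretch like this cannot follow tempo changes inside the query, which is the limitation the slide notes.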
-25-
DTW Path of “Match Beginning”
-26-
DTW Path of “Match Anywhere”
-27-
DTW Path of “Match Anywhere”
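The difference between "match beginning" and "match anywhere" comes down to where the DTW path is allowed to start in the target. The sketch below uses the textbook local path constraints (diagonal, vertical, horizontal steps) and an absolute pitch difference as local cost. Both are assumptions for illustration and may differ from the constraints used in the actual system, and key transposition is not handled here.

```python
import numpy as np

def dtw_distance(query, target, match_anywhere=False):
    """DTW distance between a query and a target pitch vector.

    match_anywhere=False: the path must start at the target's first point.
    match_anywhere=True:  the path may start at any target point.
    In both modes the path may end anywhere in the target.
    """
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    n, m = len(query), len(target)
    cost = np.full((n + 1, m + 1), np.inf)
    if match_anywhere:
        cost[0, :] = 0.0  # free starting point along the target
    else:
        cost[0, 0] = 0.0  # anchored at the beginning of the target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - target[j - 1])
            cost[i, j] = local + min(cost[i - 1, j - 1],  # diagonal step
                                     cost[i - 1, j],      # query advances alone
                                     cost[i, j - 1])      # target advances alone
    return float(cost[n, 1:].min())

target = [60, 60, 62, 62, 64, 64, 65, 65, 67, 67]
query = [64, 64, 64, 65, 67, 67]  # starts in the middle of the song
d_beginning = dtw_distance(query, target, match_anywhere=False)
d_anywhere = dtw_distance(query, target, match_anywhere=True)
```

A query that starts mid-song aligns perfectly under "match anywhere" but is penalized under "match beginning". The price of the extra freedom is that every target position becomes a candidate start, which matters for large databases.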
-28-
QBSH at MIREX 2006
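Format: the organizers run each participating team's code and measure its recognition performance. Teams came from around the world, including Australia, Germany, France, Finland, Taiwan, Uruguay, the Netherlands, and China.
Corpus:
  The sung/hummed test data consists of 2797 wav files (8 seconds each, 8 kHz/8-bit), recorded by 118 people and covering 48 children's songs; it is freely downloadable.
  The song database contains 2048 monophonic MIDI files; apart from the 48 children's songs above, the remaining songs were supplied by the organizers and are not public.
Evaluation tasks:
  Use the 2797 wav files as queries to retrieve from the 2048 MIDI files: the metric is mean reciprocal rank, and we reached 0.883 (3rd place; 13 teams took part worldwide)
  Use the 2797 wav files as queries to retrieve from the other 2797 wav files: the metric is mean precision, and we reached 0.926 (1st place; 10 teams took part worldwide)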
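For reference, mean reciprocal rank (MRR) averages 1/rank of the correct song over all queries, so a score near 1 means the correct song is usually at or near the top of the list. The sketch below uses made-up ranks, not MIREX data, and the convention that a query whose correct song is not retrieved contributes 0 is an assumption.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries.

    `ranks` holds the 1-based rank of the correct song for each query,
    or None when the correct song was not retrieved (counted as 0 here).
    """
    total = sum(1.0 / r for r in ranks if r is not None)
    return total / len(ranks)

example_ranks = [1, 1, 2, 4]  # illustrative ranks only
mrr = mean_reciprocal_rank(example_ranks)
```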
-30-
Demos of QBSH
Real-time pitch tracking demo
  SAP toolbox (http://mirlab.org/jang/matlab/toolbox/sap): goPtbyAcf.mdl
Demo of QBSH
  http://mirlab.org/new/mir_products.asp#miracle
Most successful QBSH application
  http://www.midomi.com
-31-
Singing Voice Separation
Characteristics
  Easier on karaoke stereo songs
  Harder for monaural polyphonic songs
  Important step for a number of MIR applications
Demo clips
  http://sites.google.com/site/unvoicedsoundseparation/
-32-
On-going Research at AIST, Japan
Systems for listening to singing voices
  LyricSynchronizer: Automatic sync. of lyrics with polyphonic music recordings
  Singer ID: Singer identification
  MiruSinger: Singing skill visualization/training
  Hyperlinking Lyrics: Creating hyperlinks between phrases in song lyrics
  Breath Detection: Automatic detection of breath sounds in unaccompanied singing voice
-33-
On-going Research at AIST, Japan (II)
Systems for music information retrieval based on singing voices
  VocalFinder: Music information retrieval based on singing voice timbre
  Voice Drummer: Music notation of drums using vocal percussion input
Systems for singing synthesis
  SingBySpeaking: Speech-to-singing synthesis
  VocaListener: Singing-to-singing synthesis
-34-
The Grand Challenges of MIR
Polyphonic audio music transcription
  Analogy to the problem of image understanding over semitranslucent overlaid images
  As difficult as inferring whether a turtle or a frog swam past just from watching the ripples on the water
-35-
Conclusions
MIR research is on the rise!
  MIR research over audio music (which accounts for 86% of MIREX tasks from 2005 to 2008)
    High-level feature identification
    Applications to genre/mood/tag classification/retrieval
Preexisting approaches shed light on MIR.
  Speech recognition/synthesis
  Text information retrieval
  Music theory