Blind source separation based on statistical independence and low-rank matrix decomposition theory: Independent low-rank matrix analysis (ILRMA)

September 26, 2016


Outline: BSS, ILRMA, FDICA, IVA, ISNMF, NMF

Audio source separation, etc.

FDICA, IVA, ILRMA, NMF

CD: L-ch, R-ch (2-ch), 1-ch

Nonnegative matrix factorization (NMF)

[Figure: NMF of a spectrogram, X ≈ TV; axes: Time, Frequency, Amplitude]

[Lee, 1999], [Lee, 2000], etc.
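A minimal sketch of NMF, assuming the squared Euclidean cost and the multiplicative update rules of [Lee, 2000]. It factorizes a nonnegative matrix X into a basis matrix T and an activation matrix V. The function name `nmf_euclid`, the random initialization, and the small constant `eps` are illustrative assumptions and are not taken from the slides.

```python
import numpy as np

def nmf_euclid(X, n_bases, n_iter=100, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix X (frequency x time) as X ~= T @ V."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    T = rng.random((n_freq, n_bases)) + eps  # basis matrix (hypothetical init)
    V = rng.random((n_bases, n_time)) + eps  # activation matrix (hypothetical init)
    cost = []
    for _ in range(n_iter):
        # Multiplicative updates keep T and V nonnegative.
        T *= (X @ V.T) / (T @ V @ V.T + eps)
        V *= (T.T @ X) / (T.T @ T @ V + eps)
        cost.append(np.sum((X - T @ V) ** 2))  # squared Euclidean cost
    return T, V, cost
```

For a magnitude spectrogram `X`, a call such as `T, V, cost = nmf_euclid(X, n_bases=4)` returns the two factors, and `cost` can be inspected to check that the fit improves over the iterations.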


Blind source separation (BSS)

ICA [Comon, 1994]; IVA [Hiroe, 2006], [Kim, 2006]

State-of-the-art BSS

Frequency-domain ICA (FDICA) [Smaragdis, 1998], [Sawada, 2004], [Saruwatari, 2006], etc.

[Figure: ICA in each frequency bin followed by a permutation solver; axes: Freq., Time]

FDICA + DOA [Saruwatari, 2006]

[Figure: DOA of Source 1 and Source 2]

FDICA and ABF [Araki, 2003]

[Figure: BSS and ABF at TR = 0 ms and TR = 300 ms]

Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006], [Kim, 2007]

Score functions of FDICA and IVA [Kim, 2007]

[Figure: x1 versus x2; higher-order correlation, higher-order dependency]

Independent low-rank matrix analysis (ILRMA): IVA and NMF [Kitamura, 2015], [Kitamura, 2016]

[Figure: IVA and ILRMA; axes: Frequency, Time, Basis]

Itakura-Saito NMF (ISNMF) [Févotte, 2009]
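A minimal sketch of ISNMF on a power spectrogram, assuming the square-root form of the multiplicative updates for the Itakura-Saito divergence (one commonly used variant; the exact rule shown on the slides is not recoverable). The names `is_divergence` and `isnmf`, the random initialization, and the floor `eps` are illustrative assumptions.

```python
import numpy as np

def is_divergence(P, Y):
    """Itakura-Saito divergence between positive matrices P and Y."""
    R = P / Y
    return np.sum(R - np.log(R) - 1.0)

def isnmf(P, n_bases, n_iter=100, eps=1e-12, seed=0):
    """Fit P (strictly positive, frequency x time) as P ~= T @ V."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = P.shape
    T = rng.random((n_freq, n_bases)) + eps  # basis matrix (hypothetical init)
    V = rng.random((n_bases, n_time)) + eps  # activation matrix (hypothetical init)
    cost = []
    for _ in range(n_iter):
        Y = T @ V
        T *= np.sqrt(((P / Y**2) @ V.T) / ((1.0 / Y) @ V.T))
        T = np.maximum(T, eps)  # floor to avoid division by zero
        Y = T @ V
        V *= np.sqrt((T.T @ (P / Y**2)) / (T.T @ (1.0 / Y)))
        V = np.maximum(V, eps)
        cost.append(is_divergence(P, T @ V))
    return T, V, cost
```

The input is assumed to be a power spectrogram with no exact zeros, since the Itakura-Saito divergence is undefined at zero.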

[Figure: STFT coefficients in the complex plane (Real, Imaginary), i.i.d. with mean 0]

[Figure: axes: Frequency bin, Time frame]


ILRMA, IVA, and ISNMF

ILRMA [Kitamura, 2016]


Ozerov and Févotte, 2010: NMF, EM

Arberet et al., 2010: NMF, EM

Ozerov et al., 2011: NMF, EM

Sawada et al., 2013: NMF

Kitamura et al., 2016: rank-1, NMF

MNMF [Sawada, 2013]

[Figure: panels with axes Time, Frequency]

Duong model [Duong, 2010]: source image, Wiener filter


NMF and ILRMA

IVA, NMF, and ILRMA

FDICA, ICA, IVA, NMF, ILRMA

ILRMA, NMF, IVA

Experimental conditions: SiSEC, RWCP; FFT length 512 ms, shift length 128 ms (1/4); number of bases 30 (ILRMA1), 60 (ILRMA2); evaluation: SDR

[Figure: recording conditions; labels: Source 1, Source 2, 2 m, 5.66 cm, 50, 60]

Impulse response E2A (reverberation time: 300 ms)

Impulse response JR2 (reverberation time: 470 ms)
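A simplified sketch of an SDR-style score, assuming a single-channel reference and estimate of equal length. It projects the estimate onto the reference and reports the target-to-residual energy ratio in dB. This is a scale-invariant simplification and not the full BSS Eval SDR, which also tolerates short filtering distortions; the function name `sdr_db` is hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified signal-to-distortion ratio in dB (not the full BSS Eval SDR)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Optimal scaling of the reference that best explains the estimate.
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference
    distortion = estimate - target
    return 10.0 * np.log10(np.sum(target**2) / np.sum(distortion**2))
```

A higher value means the estimate contains less energy that is unrelated to the reference; rescaling the estimate does not change the score.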

[Figure: fort_minor-remember_the_name (Violin synth., Vocals), E2A (300 ms) and JR2 (470 ms). Methods: Sawada's MNMF, IVA, Ozerov's MNMF, Ozerov's MNMF with random initialization, Sawada's MNMF initialized by proposed method, Proposed method w/o partitioning function, Proposed method with partitioning function, Directional clustering]

[Figure: ultimate_nz_tour (Guitar, Synth.), E2A (300 ms) and JR2 (470 ms); same methods as above]

[Figure: ultimate_nz_tour (Guitar, Synth.): IVA, Ozerov's MNMF, Proposed method w/o partitioning function, Proposed method with partitioning function, Sawada's MNMF initialized by proposed method, Sawada's MNMF]

Experimental conditions: SiSEC; FFT length 256 ms, shift length 128 ms (1/4); number of bases 2 (ILRMA1), 4 (ILRMA2); evaluation: SDR

[Figure: axis: Number of bases for each source; Speaker 1, Speaker 2]

[Figure: female3_liverec_1m (Speaker 1, Speaker 2), 130 ms and 250 ms. Methods: Sawada's MNMF, IVA, Ozerov's MNMF, Ozerov's MNMF with random initialization, Sawada's MNMF initialized by proposed method, Proposed method w/o partitioning function, Proposed method with partitioning function, Directional clustering]

[Figure: male3_liverec_1m (Speaker 1, Speaker 2), 130 ms and 250 ms; same methods as above]

Conditions: SiSEC bearlin-roads__snip_85_99 (14 s, 16 kHz); sources: acoustic_guit_main, bass, vocals; environment: MATLAB 8.3, Intel Core i7-4790 (3.6 GHz); iterations: 200

Time [s]: IVA 91.6; MNMF 4498.4; ILRMA 121.0; ILRMA 173.4

BSS with PCA and NMF [Kitamura, 2015]

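Experimental conditions: SiSEC, RWCP; methods: PCA + IVA, PCA + NMF, NMF; FFT length 128 ms, shift length 64 ms (1/2); number of bases 30; evaluation: SDR

[Figure: recording conditions, JR2 (reverberation time: 470 ms); labels: 2 m, 80, 60, 2.83 cm]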

Signal: ultimate nz tour, guitar and vocal

PCA + 2ch IVA: 53.8 s

PCA + 2ch proposed method: 67.6 s

4ch multichannel NMF: 8307.1 s

4ch proposed method with basis sharing: 330.97 s

Iterations: 200
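A minimal sketch of the PCA step in a "PCA + 2ch" pipeline, assuming the reduction is applied independently in each frequency bin of a multichannel STFT (the slides do not say in which domain PCA is applied). The function name `pca_reduce` and the array layout (frequency, time, channel) are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_out):
    """Reduce channels of a complex STFT X (n_freq, n_time, n_ch) to n_out per bin."""
    n_freq, n_time, n_ch = X.shape
    Y = np.empty((n_freq, n_time, n_out), dtype=complex)
    for i in range(n_freq):
        Xi = X[i]                              # (n_time, n_ch)
        R = Xi.T @ Xi.conj() / n_time          # spatial covariance E[x x^H]
        eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
        U = eigvecs[:, ::-1][:, :n_out]        # principal directions, largest first
        Y[i] = Xi @ U.conj()                   # y = U^H x for every time frame
    return Y
```

With a 4-channel input and `n_out=2`, the output can be passed to a determined 2-channel separation method; the reduced channels are mutually uncorrelated within each frequency bin.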

IVA, NMF, ILRMA, FDICA

References
[Lee, 1999]: D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788–791, 1999.
[Lee, 2000]: D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proc. Adv. Neural Inform. Process. Syst., 2000, vol. 13, pp. 556–562.
[Smaragdis, 1998]: P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol. 22, pp. 21–34, 1998.
[Sawada, 2004]: H. Sawada, R. Mukai, S. Araki, and S. Makino, "Convolutive blind source separation for more than two sources in the frequency domain," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2004, pp. III-885–III-888.
[Saruwatari, 2006]: H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano, "Blind source separation based on a fast-convergence algorithm combining ICA and beamforming," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 2, pp. 666–678, Mar. 2006.
[Araki, 2003]: S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, and H. Saruwatari, "Equivalence between frequency-domain blind source separation and frequency-domain adaptive beamforming for convolutive mixtures," EURASIP Journal on Advances in Signal Process., vol. 2003, no. 11, pp. 1–10, 2003.
[Hiroe, 2006]: A. Hiroe, "Solution of permutation problem in frequency domain ICA using multivariate probability density functions," in Proc. Int. Conf. Independent Compon. Anal. Blind Source Separation, 2006, pp. 601–608.
[Kim, 2006]: T. Kim, T. Eltoft, and T.-W. Lee, "Independent vector analysis: An extension of ICA to multivariate components," in Proc. Int. Conf. Independent Compon. Anal. Blind Source Separation, 2006, pp. 165–172.
[Kim, 2007]: T. Kim, H. T. Attias, S.-Y. Lee, and T.-W. Lee, "Blind source separation exploiting higher-order frequency dependencies," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 70–79, 2007.
[Kitamura, 2015]: D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, "Efficient multichannel nonnegative matrix factorization exploiting rank-1 spatial model," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 276–280.
[Kitamura, 2016]: D. Kitamura, H. Saruwatari, H. Kameoka, Y. Takahashi, K. Kondo, and S. Nakamura, "Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 9, pp. 1626–1641, Sep. 2016.
[Févotte, 2009]: C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis," Neural Comput., vol. 21, no. 3, pp. 793–830, 2009.
[Sawada, 2013]: H. Sawada, H. Kameoka, S. Araki, and N. Ueda, "Multichannel extensions of non-negative matrix factorization with complex-valued data," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 971–982, May 2013.
[Duong, 2010]: N. Q. K. Duong, E. Vincent, and R. Gribonval, "Under-determined reverberant audio source separation using a full-rank spatial covariance model," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1830–1840, Sep. 2010.
[Kitamura, 2015]: D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, "Relaxation of rank-1 spatial constraint in overdetermined blind source separation," in Proc. Eur. Signal Process. Conf., 2015, pp. 1271–1275.
