Audio samples - ConsistencyVC


  • The code: https://github.com/ConsistencyVC/ConsistencyVC-voive-conversion
  • Cross-lingual Voice Conversion:

    Target English reference utterance: 87_121553_000254_000000 in LibriTTS-100

    No.   Source   ConsistencyXVC (proposed)   ConsistencyXVC-w/o loss (proposed)   BNE-PPG-VC
    1 English 911_128684_000004_000001 in LibriTTS-100
    2 English 730_359_000004_000001 in LibriTTS-100
    3 Chinese SSB18630001 in Aishell3
    4 Chinese SSB02460001 in Aishell3
    5 Japanese jvs003\nonpara30\wav24kHz16bit\BASIC5000_0440 in JVS
    6 Japanese jvs014\nonpara30\wav24kHz16bit\BASIC5000_0318 in JVS

    Target Chinese reference utterance: SSB19350001 in Aishell3

    No.   Source   ConsistencyXVC (proposed)   ConsistencyXVC-w/o loss (proposed)   BNE-PPG-VC
    1 English 911_128684_000004_000001 in LibriTTS-100
    2 English 730_359_000004_000001 in LibriTTS-100
    3 Chinese SSB18630001 in Aishell3
    4 Chinese SSB02460001 in Aishell3
    5 Japanese jvs003\nonpara30\wav24kHz16bit\BASIC5000_0440 in JVS
    6 Japanese jvs014\nonpara30\wav24kHz16bit\BASIC5000_0318 in JVS

    Target Japanese reference utterance: jvs010\nonpara30\BASIC5000_0113 in JVS

    No.   Source   ConsistencyXVC (proposed)   ConsistencyXVC-w/o loss (proposed)   BNE-PPG-VC
    1 English 911_128684_000004_000001 in LibriTTS-100
    2 English 730_359_000004_000001 in LibriTTS-100
    3 Chinese SSB18630001 in Aishell3
    4 Chinese SSB02460001 in Aishell3
    5 Japanese jvs003\nonpara30\wav24kHz16bit\BASIC5000_0440 in JVS
    6 Japanese jvs014\nonpara30\wav24kHz16bit\BASIC5000_0318 in JVS

    Target English reference utterance: 27_123349_000003_000002 in LibriTTS-100

    No.   Source   ConsistencyXVC (proposed)   ConsistencyXVC-w/o loss (proposed)   BNE-PPG-VC
    1 English 911_128684_000004_000001 in LibriTTS-100
    2 English 730_359_000004_000001 in LibriTTS-100
    3 Chinese SSB18630001 in Aishell3
    4 Chinese SSB02460001 in Aishell3
    5 Japanese jvs003\nonpara30\wav24kHz16bit\BASIC5000_0440 in JVS
    6 Japanese jvs014\nonpara30\wav24kHz16bit\BASIC5000_0318 in JVS

    Target Chinese reference utterance: SSB17590008 in Aishell3

    No.   Source   ConsistencyXVC (proposed)   ConsistencyXVC-w/o loss (proposed)   BNE-PPG-VC
    1 English 911_128684_000004_000001 in LibriTTS-100
    2 English 730_359_000004_000001 in LibriTTS-100
    3 Chinese SSB18630001 in Aishell3
    4 Chinese SSB02460001 in Aishell3
    5 Japanese jvs003\nonpara30\wav24kHz16bit\BASIC5000_0440 in JVS
    6 Japanese jvs014\nonpara30\wav24kHz16bit\BASIC5000_0318 in JVS

    Target Japanese reference utterance: jvs009\nonpara30\BASIC5000_0155 in JVS

    No.   Source   ConsistencyXVC (proposed)   ConsistencyXVC-w/o loss (proposed)   BNE-PPG-VC
    1 English 911_128684_000004_000001 in LibriTTS-100
    2 English 730_359_000004_000001 in LibriTTS-100
    3 Chinese SSB18630001 in Aishell3
    4 Chinese SSB02460001 in Aishell3
    5 Japanese jvs003\nonpara30\wav24kHz16bit\BASIC5000_0440 in JVS
    6 Japanese jvs014\nonpara30\wav24kHz16bit\BASIC5000_0318 in JVS

  • Expressive Voice Conversion:

    Source utterance: 0011_000004 in ESD

    No.   Reference   ConsistencyEVC   ConsistencyEVC-w/o loss   ConsistencyEVC-whisper
    1 0012_000374 in ESD
    2 0012_000897 in ESD
    3 0012_001188 in ESD
    4 0012_001504 in ESD
    5 0015_000619 in ESD
    6 0015_000875 in ESD
    7 0015_001233 in ESD
    8 0015_001656 in ESD

    Source utterance: 0016_000031 in ESD

    No.   Reference   ConsistencyEVC   ConsistencyEVC-w/o loss   ConsistencyEVC-whisper
    1 0012_000374 in ESD
    2 0012_000897 in ESD
    3 0012_001188 in ESD
    4 0012_001504 in ESD
    5 0015_000619 in ESD
    6 0015_000875 in ESD
    7 0015_001233 in ESD
    8 0015_001656 in ESD