The timing problem comes from the whisperX module. It works best when the vocals are isolated from the background, but even then it still produces some strange timings. For the background isolation we use Demucs + ffmpeg denoise, and there is still room for improvement there. So we need to improve two parts: the background separation and whisperX. The Whisper model (large-v3/v2) is only used for word detection; for the timing, whisperX uses wav2vec2.
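For anyone who wants to experiment with the separation step outside the tool, here is a minimal Python sketch that builds a Demucs + ffmpeg denoise command chain. It is not UltraSinger's actual code: the Demucs v4 CLI with its default `htdemucs` model, its `<out>/<model>/<track>/vocals.wav` output layout, and ffmpeg's FFT-based `afftdn` filter with `nf=-25` are all assumptions chosen for illustration.

```python
from pathlib import Path

# Assumptions (not taken from UltraSinger's source): Demucs v4 CLI with its
# default "htdemucs" model, and ffmpeg's FFT-based "afftdn" denoise filter.
DEMUCS_MODEL = "htdemucs"


def separation_commands(song: Path, out_dir: Path) -> tuple[list[str], list[str], Path]:
    """Build the Demucs and ffmpeg commands for a vocals-only, denoised stem."""
    # Assumed Demucs layout: <out_dir>/<model>/<track name>/<stem>.wav
    vocals = out_dir / DEMUCS_MODEL / song.stem / "vocals.wav"
    denoised = vocals.with_name("vocals_denoised.wav")
    demucs_cmd = [
        "demucs", "--two-stems=vocals",  # split into vocals / everything else
        "-n", DEMUCS_MODEL,
        "-o", str(out_dir), str(song),
    ]
    ffmpeg_cmd = [
        "ffmpeg", "-y", "-i", str(vocals),
        "-af", "afftdn=nf=-25",  # noise floor in dB; tune per recording
        str(denoised),
    ]
    return demucs_cmd, ffmpeg_cmd, denoised


if __name__ == "__main__":
    demucs_cmd, ffmpeg_cmd, denoised = separation_commands(
        Path("input/xxx.mp3"), Path("separated")
    )
    print(" ".join(demucs_cmd))
    print(" ".join(ffmpeg_cmd))
```

To actually run it, pass the Demucs command and then the ffmpeg command to `subprocess.run(cmd, check=True)`, and feed the denoised file to the tool. Comparing the whisperX timings on the raw Demucs stem versus the denoised one should show which of the two parts is responsible for a given timing error.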
Hello,
I was trying to convert a song, and the lyrics seem to be captured perfectly fine,
but for some words the timing is off from the vocals by several seconds.
I have tried changing all kinds of settings, but the same errors keep coming back.
I got better voice recognition, with fewer errors, after removing
the background music myself with vocalremover.org before passing the audio to the tool.
But that doesn't fix the big timing errors.
Has anybody had similar problems and maybe found a workaround or some way to improve the accuracy?
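So far I have tried these combinations, without much change:
py UltraSinger.py -i "input/xxx.mp3" --crepe full --crepe_step_size <step> --whisper <model> --whisper_batch_size <batch>
with --crepe_step_size 1, 5 and 10, --whisper large-v3 and large-v2, --whisper_batch_size 16 and 32, and --create_audio_chunks both on and off.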
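The settings sweep described above can be written out as a small Python script, which makes it easy to rerun every combination on another song. It is only a sketch: it builds the command lines from the values in the post and does not run them. Spelling the on/off state of `--create_audio_chunks` as a `True`/`False` value is an assumption, so check `py UltraSinger.py -h` before using it.

```python
import itertools

# Values taken from the post. Passing True/False to --create_audio_chunks
# is an assumption; verify the flag's syntax with `py UltraSinger.py -h`.
STEP_SIZES = [1, 5, 10]  # --crepe_step_size, in milliseconds
WHISPER_MODELS = ["large-v3", "large-v2"]
BATCH_SIZES = [16, 32]
AUDIO_CHUNKS = [True, False]


def build_commands(input_path: str) -> list[list[str]]:
    """Return one UltraSinger argument list per settings combination."""
    commands = []
    for step, model, batch, chunks in itertools.product(
        STEP_SIZES, WHISPER_MODELS, BATCH_SIZES, AUDIO_CHUNKS
    ):
        commands.append([
            "py", "UltraSinger.py",  # "py" is the Windows Python launcher
            "-i", input_path,
            "--crepe", "full",
            "--crepe_step_size", str(step),
            "--whisper", model,
            "--whisper_batch_size", str(batch),
            "--create_audio_chunks", str(chunks),
        ])
    return commands


if __name__ == "__main__":
    # 3 step sizes x 2 models x 2 batch sizes x 2 chunk settings = 24 runs
    for cmd in build_commands("input/xxx.mp3"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Keeping every output in its own folder makes it easier to check whether the same words are mistimed in every run, which would point at the alignment step rather than at the settings.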