How to improve quality of result #228

hanz-s · 2025-02-09T16:18:43Z

Hello,
I was trying to convert a song, and the text seems to be captured totally fine,
but for some words the timing is off by several seconds from the vocals.

I tried changing all kinds of settings, but the same errors keep coming back.
I got better results, with fewer voice recognition errors, after removing
the background music myself with vocalremover.org before passing the file to the tool.
But that won't fix those big timing errors.

Has anybody had similar problems and found a workaround or some way to improve accuracy?

I have used these combinations so far without much change (values separated by commas or slashes were tried as alternatives):
py UltraSinger.py -i "input/xxx.mp3" --crepe full --crepe_step_size 1,5,10 --whisper large-v3/v2 --whisper_batch_size 16/32 --create_audio_chunks(on/off)
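
For anyone who wants to repeat this sweep systematically, here is a minimal sketch that expands the shorthand above into one command line per combination. The flag names and values are copied from the command in this post; treating `--create_audio_chunks` as a plain on/off switch is an assumption, and `build_commands` is a hypothetical helper, not part of UltraSinger.

```python
from itertools import product

# Values copied from the post; the input path is the placeholder used there.
INPUT = "input/xxx.mp3"
STEP_SIZES = [1, 5, 10]
WHISPER_MODELS = ["large-v3", "large-v2"]
BATCH_SIZES = [16, 32]
AUDIO_CHUNKS = [True, False]


def build_commands():
    """Return one argument list per settings combination."""
    commands = []
    for step, model, batch, chunks in product(
        STEP_SIZES, WHISPER_MODELS, BATCH_SIZES, AUDIO_CHUNKS
    ):
        cmd = [
            "py", "UltraSinger.py", "-i", INPUT,
            "--crepe", "full",
            "--crepe_step_size", str(step),
            "--whisper", model,
            "--whisper_batch_size", str(batch),
        ]
        if chunks:
            # Assumption: the flag is a switch that is either present or absent.
            cmd.append("--create_audio_chunks")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` as is, which makes it easier to compare the outputs side by side and confirm that none of these settings affects the timing errors.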

rakuri255 · 2025-02-10T09:27:51Z

Maintainer

The timing problem comes from the whisperX module. It works best when the vocals are isolated from the background, but even then it still produces some strange timings. For the background isolation we use Demucs + ffmpeg denoise, and there is still room for improvement.
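
To experiment with the isolation step outside the tool, a rough sketch of the two commands could look like the following. It only builds the argument lists and does not run anything. The exact options UltraSinger passes are not stated in this thread, so the `htdemucs` model name, the `afftdn` denoise filter, the output file name, and the `separation_commands` helper are all assumptions for illustration.

```python
from pathlib import Path


def separation_commands(song, out_dir="separated", model="htdemucs"):
    """Build (demucs_cmd, ffmpeg_cmd, denoised_path) for one input song."""
    song = Path(song)

    # Demucs: split the track into vocals and accompaniment.
    demucs_cmd = [
        "demucs", "--two-stems", "vocals",
        "-n", model, "-o", out_dir, str(song),
    ]

    # Demucs writes its stems to <out_dir>/<model>/<track name>/.
    vocals = Path(out_dir) / model / song.stem / "vocals.wav"
    denoised = vocals.with_name("vocals_denoised.wav")

    # ffmpeg: afftdn is one of ffmpeg's denoise filters (assumed choice here).
    ffmpeg_cmd = [
        "ffmpeg", "-y", "-i", str(vocals),
        "-af", "afftdn", str(denoised),
    ]
    return demucs_cmd, ffmpeg_cmd, denoised
```

Running both commands with `subprocess.run(cmd, check=True)` and listening to the denoised vocals is a quick way to judge how much background is left before whisperX sees the audio.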

So we need to improve two parts: the background separation and whisperX.

The Whisper model (large-v3/v2) is only used for the word detection. For the timing, whisperX uses wav2vec2.
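
The two stages can be seen in a minimal sketch based on the usage shown in the whisperX README: Whisper produces the text, then a separate wav2vec2 alignment model assigns the word timings. This is not UltraSinger's own code; the function name, the CPU device, and the `int8` compute type are assumptions chosen so the sketch runs without a GPU.

```python
def transcribe_then_align(audio_path, whisper_model="large-v2",
                          device="cpu", batch_size=16):
    """Return whisperX segments with word-level start/end times."""
    import whisperx  # imported lazily so the module loads without whisperX

    audio = whisperx.load_audio(audio_path)

    # Stage 1: Whisper decides WHICH words were sung (text, coarse segments).
    model = whisperx.load_model(whisper_model, device, compute_type="int8")
    result = model.transcribe(audio, batch_size=batch_size)

    # Stage 2: a wav2vec2 model decides WHEN each word was sung.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    aligned = whisperx.align(
        result["segments"], align_model, metadata, audio, device
    )
    return aligned["segments"]
```

Since the timing comes from the second stage, switching between large-v3 and large-v2 changes the recognized text but would not be expected to fix words that are placed seconds away from the vocals.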
