Rhasspy (2.4) 'no' word recognition problems especially with female voices

Hello

I am having trouble getting Rhasspy to recognize the word `no`, especially when it is spoken by a female voice (sometimes a 0% recognition rate over 10 utterances).

My sentences.ini file:

> [yesAnswer]
> yes
> yep
> yeah
> yes please
> 
> [noAnswer]
> no
> no thanks
> no thank you

`no thanks` and `no thank you` work a lot better, even with female voices.

My setup is as follows: Rhasspy runs as a service on my Raspberry Pi 4. A separate Python 3 script controls Rhasspy via the HTTP API, receives the transcript, and forwards it to another machine. That script also runs at startup as a service.

I send messages to that script and ask it to query Rhasspy over HTTP.
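A minimal sketch of that kind of control script, for reference. It is not the actual script: the base URL assumes Rhasspy's default web port (12101), the function names are hypothetical, and it assumes `/api/stop-recording` answers with the intent JSON, which is what the log below suggests.

```python
import json
import time
import urllib.request

# Assumption: Rhasspy's web server listens on its default port, 12101.
RHASSPY_URL = "http://localhost:12101"


def http_post(url, data=b""):
    """POST raw bytes to url and return the response body as bytes."""
    request = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read()


def record_and_recognize(seconds, post=http_post, sleep=time.sleep):
    """Record for `seconds`, then return the parsed intent JSON as a dict.

    `post` and `sleep` are injectable so the flow can be exercised
    without a running Rhasspy instance.
    """
    post(RHASSPY_URL + "/api/start-recording")
    sleep(seconds)
    # Assumed: stop-recording responds with the recognized intent as JSON.
    body = post(RHASSPY_URL + "/api/stop-recording")
    return json.loads(body)
```

Calling `record_and_recognize(3)` against a live instance would record for about three seconds and return a dictionary with keys such as `text` and `intent`.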

This is the log of one of my requests (newest entries first):

> [INFO:304795] quart.serving: 127.0.0.1:39554 POST /api/stop-recording 1.1 200 292 215383
> [DEBUG:304792] InboxActor:  -> stopped
> [DEBUG:304789] __main__: {"intent": {"name": "noAnswer", "confidence": 1.0}, "entities": [], "text": "no", "raw_text": "no", "recognize_seconds": 0.00034929599996758043, "tokens": ["no"], "raw_tokens": ["no"], "wav_seconds": 0.0, "transcribe_seconds": 0.0, "speech_confidence": 0.1044066508421162, "slots": {}, "wakeId": "", "siteId": "default"}
> [DEBUG:304788] InboxActor:  -> stopped
> [DEBUG:304785] __main__: no
> [DEBUG:304784] InboxActor:  -> stopped
> [DEBUG:304782] PocketsphinxDecoder: no
> [DEBUG:304781] PocketsphinxDecoder: Transcription confidence: 0.1044066508421162
> [DEBUG:304780] PocketsphinxDecoder: Decoded WAV in 0.18922734260559082 second(s)
> [DEBUG:304589] PocketsphinxDecoder: rate=16000, width=2, channels=1.
> [DEBUG:304585] __main__: Recorded 137324 byte(s) of audio data
> [DEBUG:304584] InboxActor:  -> stopped
> [INFO:300307] quart.serving: 127.0.0.1:39550 POST /api/start-recording 1.1 200 2 7190
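One detail in this log: the intent `confidence` is 1.0 while `speech_confidence` is only about 0.10, so the two values measure different things. A small sketch of how the receiving script could check the speech confidence before trusting a transcript. The helper name and the 0.5 threshold are hypothetical; only the field names and values come from the logged JSON.

```python
# Excerpt of the JSON result logged above.
logged_result = {
    "intent": {"name": "noAnswer", "confidence": 1.0},
    "text": "no",
    "speech_confidence": 0.1044066508421162,
}


def is_reliable(result, min_speech_confidence=0.5):
    """Return True if a transcript exists and its speech confidence
    meets the (hypothetical) threshold."""
    has_text = bool(result.get("text"))
    confidence = result.get("speech_confidence", 0.0)
    return has_text and confidence >= min_speech_confidence
```

With the default threshold, `is_reliable(logged_result)` is false even though the intent confidence is 1.0.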

What can I do to solve this issue?

At the moment I am using **Pocketsphinx**. I noticed that Rhasspy 2.5 now also supports **DeepSpeech**. Would I get better results by switching to a different recogniser such as Kaldi or DeepSpeech?
