Programmiererforum

Posted: **03 Jan 2025, 18:16**

I put together some simple code that asks the user to choose between option 1, oranges, and option 2, pears:
```
options = {
    (1, "1", "one", "number one", "oranges", "orange", "orange's", "oranges'"): 1,
    (2, "2", "two", "number two", "pears", "pear", "pear's", "pears'", "pier"): 2
}
```
No matter what I put in the keys above, the speech recognizer does not recognize the numbers "one" and "two". Only the non-numeric options such as "number one", "pears", etc. are recognized correctly.
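The lookup `choice in key` only succeeds when the transcript is exactly one of the listed strings, so a transcript such as "One." or "one please" would fall through. Whether that is what happens here is not confirmed by anything in this post. The following is a minimal sketch of a more tolerant match. The names `normalize`, `match_option`, `OPTION_ALIASES` and `ALIAS_TO_OPTION` are hypothetical and not part of the original script.

```python
# Aliases copied from the post; the flat lookup table is an illustrative choice.
OPTION_ALIASES = {
    1: ("1", "one", "number one", "oranges", "orange", "orange's", "oranges'"),
    2: ("2", "two", "number two", "pears", "pear", "pear's", "pears'", "pier"),
}

# One flat {alias: option} dict, so each lookup is a single dict access.
ALIAS_TO_OPTION = {
    alias: option
    for option, aliases in OPTION_ALIASES.items()
    for alias in aliases
}

# Punctuation a transcript might carry at word boundaries (an assumption).
PUNCT = ".,!?"


def normalize(text):
    """Lowercase the text, trim whitespace and drop trailing punctuation."""
    return text.strip().lower().rstrip(PUNCT).strip()


def match_option(text):
    """Return the option number for a transcript, or None if nothing matches."""
    cleaned = normalize(text)
    # First try the whole phrase, e.g. "number one".
    if cleaned in ALIAS_TO_OPTION:
        return ALIAS_TO_OPTION[cleaned]
    # Then try word by word, so "two, please" still matches "two".
    for word in cleaned.split():
        word = word.strip(PUNCT)
        if word in ALIAS_TO_OPTION:
            return ALIAS_TO_OPTION[word]
    return None


if __name__ == "__main__":
    for phrase in ("One.", "two, please", "number one", "apples"):
        print(repr(phrase), "->", match_option(phrase))
```

If `get_choice` called `match_option(choice)` instead of scanning the tuples, printing the raw transcript next to the result would show whether the failure is in the matching or in the recognition itself.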

The second problem is the strangest. The speech recognizer does not understand any number below 11 unless you say it followed by ".0" ("three point zero"). It understands "10" if you say "one zero". From 11 upwards it understands the numbers as you say them:

```
def convert_to_number(text):
    number_words = {
        "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
        "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
        "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
        "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
        "eighteen": 18, "nineteen": 19, "twenty": 20
    }
```

Output:

[Image: Output window, with notes on what I say]
[Image: Output window, testing what the speech recognition understands]
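For the quantity step, a more forgiving converter might help narrow things down. The sketch below is not the original `convert_to_number`. It is a hypothetical variant that also accepts trailing punctuation, mixed case and digit-by-digit phrases such as "one zero", on the assumption (not confirmed in this thread) that the recognizer sometimes returns such strings. It handles single number words up to twenty only, so "twenty one" still returns `None`.

```python
import re

NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
    "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
    "eighteen": 18, "nineteen": 19, "twenty": 20,
}


def convert_to_number(text):
    """Convert a transcript to a float, or return None if it cannot be parsed."""
    cleaned = text.strip().lower().rstrip(".,!?").strip()

    # Plain digits or decimals: "3", "3.0", "11".
    if re.fullmatch(r"\d+(\.\d+)?", cleaned):
        return float(cleaned)

    # A single number word: "three", "ten".
    if cleaned in NUMBER_WORDS:
        return float(NUMBER_WORDS[cleaned])

    # Digit-by-digit speech: "one zero" or "1 0" are both read as 10.
    digits = []
    for word in cleaned.split():
        if len(word) == 1 and word.isdigit():
            digits.append(word)
        elif word in NUMBER_WORDS and NUMBER_WORDS[word] < 10:
            digits.append(str(NUMBER_WORDS[word]))
        else:
            return None
    return float("".join(digits)) if digits else None


if __name__ == "__main__":
    for phrase in ("three", "3", "3.0", "one zero", "Ten.", "12", "banana"):
        print(repr(phrase), "->", convert_to_number(phrase))
```

Unlike the original, this version always returns a float or `None`, which keeps the later `{buyer_quant:,.0f}` formatting consistent.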

These are the steps I took to try to fix the errors described:
a. I first used the pyttsx3 and speech_recognition libraries, then replaced pyttsx3 with gTTS and pydub. The faulty behaviour was the same with both sets of libraries.
b. I added an en-GB locale; that had no effect either.
c. I asked a native British English speaker to say the options; that made no difference either.
d. Everything is configured correctly: ffmpeg, microphone, etc.
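One further check that could complement the steps above is to look at everything the recognizer returns, not only the best transcript. `recognize_google` accepts `show_all=True`, which returns the raw response instead of a single string. The sketch below assumes that response is a dict with an `"alternative"` list of `{"transcript": ..., "confidence": ...}` entries, and an empty list when nothing was recognized. The helper name `list_transcripts` and the sample data are hypothetical, not captured from a real run.

```python
def list_transcripts(raw):
    """Return (transcript, confidence) pairs from a raw recognizer response.

    `raw` is assumed to be a dict with an "alternative" list. Anything else,
    such as an empty list for "nothing recognized", gives an empty result.
    """
    if not isinstance(raw, dict):
        return []
    pairs = []
    for alt in raw.get("alternative", []):
        if "transcript" in alt:
            # Confidence is often present only on the first alternative.
            pairs.append((alt["transcript"], alt.get("confidence")))
    return pairs


# Illustrative response only; the values are made up.
sample = {
    "alternative": [
        {"transcript": "one", "confidence": 0.9},
        {"transcript": "1"},
        {"transcript": "won"},
    ],
    "final": True,
}

if __name__ == "__main__":
    # In the real script the input would come from something like:
    #   raw = recognizer.recognize_google(audio, language="en-GB", show_all=True)
    for transcript, confidence in list_transcripts(sample):
        print(repr(transcript), confidence)
```

Printing all alternatives for a spoken "one" or "two" would show whether the recognizer hears the word at all, or hears it and returns it in a form the lookup does not expect.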

This is the complete code:

```
from gtts import gTTS
import speech_recognition as sr
from pydub import AudioSegment
from pydub.playback import play
import os

# Set the path to the ffmpeg executable
os.environ["PATH"] += os.pathsep + "C:/ffmpeg/bin"

# Initialize STT recognizer
recognizer = sr.Recognizer()

def speak(text):
    tts = gTTS(text=text, lang='en-GB')
    tts.save("temp.mp3")
    sound = AudioSegment.from_mp3("temp.mp3")
    play(sound)
    os.remove("temp.mp3")

def listen():
    with sr.Microphone() as source:
        print("Listening...")
        audio = recognizer.listen(source)
        try:
            text = recognizer.recognize_google(audio, language='en-GB')
            print(f"You said: {text}")
            return text.lower()
        except sr.UnknownValueError:
            print("Sorry, I did not understand that.")
            speak("Sorry, I did not understand that.")
            return None
        except sr.RequestError:
            print("Sorry, my speech service is down.")
            speak("Sorry, my speech service is down.")
            return None

def convert_to_number(text):
    number_words = {
        # BUG: #3 All numbers below 11 are not recognized by the speech recognizer
        # BUG: #4 10 is only recognized if one says "one zero" instead of "ten"
        "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
        "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
        "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
        "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
        "eighteen": 18, "nineteen": 19, "twenty": 20
    }
    try:
        return float(text)
    except ValueError:
        return number_words.get(text, None)

def get_choice(options):
    while True:
        choice = listen()
        if choice is not None:
            for key, value in options.items():
                if choice in key:
                    print(f"Recognized choice: {value}")
                    return value
            print("Invalid input. Please say a valid option.")
            speak("Invalid input. Please say a valid option.")

def get_quantity():
    while True:
        quantity = listen()
        if quantity is not None:
            quantity = convert_to_number(quantity)
            if quantity is not None and quantity > 0:
                print(f"Recognized quantity: {quantity}")
                return quantity
            else:
                print("Please enter a positive number.")
                speak("Please enter a positive number.")

item1 = "Oranges"
item1_price = 0.75
item2 = "Pears"
item2_price = 1.25
vat_tax = 0.20

options = {
    # BUG: #2 No matter what I write here the numbers "one" and "two" are not recognized by the speech recognizer
    # BUG: #1 Only non-numeric options are recognized correctly
    (1, "1", "one", "number one", "oranges", "orange", "orange's", "oranges'"): 1,
    (2, "2", "two", "number two", "pears", "pear", "pear's", "pears'", "pier"): 2
}

while True:
    speak("- What would you like to taste today, guvnor?\n"
          f"  1. Our fresh {item1}, for £{item1_price} each?\n"
          f"  2. Or, our delicious {item2}, for £{item2_price} each?\n")
    print("- What would you like to taste today, guvnor?\n"
          f"  1. Our fresh {item1}, for £{item1_price} each?\n"
          f"  2. Or, our delicious {item2}, for £{item2_price} each?\n")

    buyer_choice = get_choice(options)

    if buyer_choice == 1:
        speak(f"\n- And how many {item1} for the lady?\n")
        print(f"\n- And how many {item1} for the lady?\n")
        buyer_quant = get_quantity()
        sub_total = (item1_price * buyer_quant)
        vat_total = (sub_total * vat_tax)
        total = sub_total + vat_total
        speak(f"\n- That will be {buyer_quant:,.0f} {item1} for only £{sub_total:,.2f}.\n"
              f"  Plus £{vat_total:,.2f} of V.A.T., total is £{total:,.2f}.\n"
              "  Thanks for your custom!\n")
        print(f"\n- That will be {buyer_quant:,.0f} {item1} for only £{sub_total:,.2f}.\n"
              f"  Plus £{vat_total:,.2f} of V.A.T., total is £{total:,.2f}.\n"
              "  Thanks for your custom!\n")
        break
    elif buyer_choice == 2:
        speak(f"\n- And how many {item2} for the lady?\n")
        print(f"\n- And how many {item2} for the lady?\n")
        buyer_quant = get_quantity()
        sub_total = (item2_price * buyer_quant)
        vat_total = (sub_total * vat_tax)
        total = sub_total + vat_total
        speak(f"\n- That will be {buyer_quant:,.0f} {item2} for only £{sub_total:,.2f}.\n"
              f"  Plus £{vat_total:,.2f} of V.A.T., total is £{total:,.2f}.\n"
              "  Thanks for your custom!\n")
        print(f"\n- That will be {buyer_quant:,.0f} {item2} for only £{sub_total:,.2f}.\n"
              f"  Plus £{vat_total:,.2f} of V.A.T., total is £{total:,.2f}.\n"
              "  Thanks for your custom!\n")
        break
    else:
        speak("\n- We just ran out of that, sorry. Please choose a valid option.\n")
        print("\n- We just ran out of that, sorry. Please choose a valid option.\n")
```
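The two branches of the main loop differ only in the item name and price, so the arithmetic and the message could be factored into one helper. This is unrelated to the recognition problem itself; it is only a sketch of how the script might be shortened, and `order_summary` is a hypothetical name that does not appear in the original code.

```python
def order_summary(item, unit_price, quantity, vat_rate):
    """Return (sub_total, vat_total, total, message) for one order line."""
    sub_total = unit_price * quantity
    vat_total = sub_total * vat_rate
    total = sub_total + vat_total
    # Same wording as the messages in the original script.
    message = (
        f"\n- That will be {quantity:,.0f} {item} for only £{sub_total:,.2f}.\n"
        f"  Plus £{vat_total:,.2f} of V.A.T., total is £{total:,.2f}.\n"
        "  Thanks for your custom!\n"
    )
    return sub_total, vat_total, total, message


if __name__ == "__main__":
    # Example values taken from the script: oranges at £0.75 with 20% VAT.
    _, _, _, text = order_summary("Oranges", 0.75, 4, 0.20)
    print(text)
```

Each branch would then reduce to one call to `order_summary`, followed by `speak(message)` and `print(message)`.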


speech_recognition and gTTS do not understand numbers below 11
