Real-time Speech-to-Text in Python with the Azure Speech SDK


Post by Anonymous »

Hello, I'm trying to build a real-time speech-to-text app with Streamlit and the Azure Speech SDK. It works when I use files, but passing the audio stream and transcribing it asynchronously does not, and following the guided path did not help either: https://microsoft.github.io/techexcel-i ... /0402.html. Here is my edited code:

Code: Select all

import logging

import streamlit as st
from azure.cognitiveservices.speech.transcription import ConversationTranscriptionEventArgs

logger = logging.getLogger(__name__)

def addsentence(evt: ConversationTranscriptionEventArgs):
    if evt.result.speaker_id == "Unknown":
        logger.debug("Unknown speaker: " + str(evt))
        return
    logger.info(f"Detected **{evt.result.speaker_id}**: {evt.result.text}")
    st.session_state.r.append(f"**{evt.result.speaker_id}**: {evt.result.text}")

Code: Select all

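One thing worth noting about the callback above: the Speech SDK fires `transcribed` on its own worker thread, where Streamlit's `st.session_state` is not reliably usable. A minimal sketch of a thread-safe alternative, assuming a plain `queue.Queue` that the main Streamlit loop drains (the names `add_sentence_threadsafe` and `drain` are illustrative, not from the SDK):

```python
import queue

# Thread-safe buffer filled by the SDK callback thread,
# emptied by the Streamlit script thread.
results: "queue.Queue[str]" = queue.Queue()

def add_sentence_threadsafe(speaker_id: str, text: str) -> None:
    # Same filtering as addsentence, but without touching st.session_state
    # from the SDK worker thread.
    if speaker_id == "Unknown":
        return
    results.put(f"**{speaker_id}**: {text}")

def drain(lines: list) -> list:
    # Called from the Streamlit loop: move queued sentences into the
    # list that feeds the markdown placeholder.
    while True:
        try:
            lines.append(results.get_nowait())
        except queue.Empty:
            return lines
```

The callback would then call `add_sentence_threadsafe(evt.result.speaker_id, evt.result.text)`, and the main loop would call `drain(st.session_state.r)` before rendering.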
import queue
import tempfile
import time

import pydub
import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import PushAudioInputStream
from azure.cognitiveservices.speech.transcription import ConversationTranscriber
from streamlit_webrtc import WebRtcMode, webrtc_streamer

# env: dict with SPEECH_KEY / SPEECH_REGION (loaded elsewhere)

webrtc_ctx = webrtc_streamer(key="speech-to-text", mode=WebRtcMode.SENDONLY,
                             media_stream_constraints={"video": False, "audio": True},
                             audio_receiver_size=256)

while webrtc_ctx.state.playing:
    if not st.session_state["recording"]:
        st.session_state.r = []

        st.session_state.stream = PushAudioInputStream()
        ###
        audio_input = speechsdk.AudioConfig(stream=st.session_state.stream)
        speech_config = speechsdk.SpeechConfig(env["SPEECH_KEY"], env["SPEECH_REGION"])
        if "proxy_host" in env and "proxy_port" in env:
            speech_config.set_proxy(env["proxy_host"], int(env["proxy_port"]))
        conversation_transcriber = ConversationTranscriber(speech_config, audio_input, language="it-IT")

        conversation_transcriber.transcribed.connect(addsentence)
        ###

        st.session_state.fullwav = pydub.AudioSegment.empty()
        with st.chat_message("assistant"):
            with st.spinner("Trascrizione in corso..."):
                stream_placeholder = st.expander("Trascrizione", icon="📝").empty()

        conversation_transcriber.start_transcribing_async().get()  # block until the session has started
        logger.info("Transcribing started!")
        st.session_state["recording"] = True

    try:
        audio_frames = webrtc_ctx.audio_receiver.get_frames(timeout=1)
    except queue.Empty:
        time.sleep(0.1)
        logger.debug("No frame arrived.")
        continue

    stream_placeholder.markdown("## Trascrizione:\n\n" + "\\\n".join(st.session_state.r))

    for audio_frame in audio_frames:
        st.session_state.stream.write(audio_frame.to_ndarray().tobytes())
        sound = pydub.AudioSegment(
            data=audio_frame.to_ndarray().tobytes(),
            sample_width=audio_frame.format.bytes,
            frame_rate=audio_frame.sample_rate,
            channels=len(audio_frame.layout.channels),
        )
        st.session_state.fullwav += sound

if st.session_state["recording"]:
    logger.info("stopped listening")
    wav_file_path = tempfile.NamedTemporaryFile(suffix='.wav', delete=False).name
    st.session_state.fullwav.export(wav_file_path, format="wav")
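A possible cause of the silence I suspect in my own setup: `PushAudioInputStream()` with no arguments assumes 16 kHz, 16-bit, mono PCM, while streamlit-webrtc typically delivers 48 kHz frames, often stereo. A minimal sketch of declaring the actual format to the push stream, assuming the frame properties match the `pydub.AudioSegment` arguments above (the helper names are mine; the `channels` keyword of `AudioStreamFormat` requires a reasonably recent SDK version):

```python
def webrtc_stream_format(frame_rate: int, sample_width: int, n_channels: int) -> dict:
    """PCM parameters to declare on the Azure push stream.

    Mirrors the pydub.AudioSegment arguments used for fullwav:
    sample_width is in bytes, so bits_per_sample = sample_width * 8.
    """
    return {
        "samples_per_second": frame_rate,
        "bits_per_sample": sample_width * 8,
        "channels": n_channels,
    }

def make_push_stream(frame_rate: int, sample_width: int, n_channels: int):
    # Assumption: the default 16 kHz mono format is the mismatch;
    # declare the real WebRTC format instead.
    import azure.cognitiveservices.speech as speechsdk
    fmt = speechsdk.audio.AudioStreamFormat(
        **webrtc_stream_format(frame_rate, sample_width, n_channels))
    return speechsdk.audio.PushAudioInputStream(stream_format=fmt)
```

With a typical 48 kHz 16-bit stereo WebRTC track this would be `make_push_stream(48000, 2, 2)` in place of the bare `PushAudioInputStream()` call.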
