Why does audio generated by the ElevenLabs API in Python differ from audio generated on the website?

Guest · Post by **Guest** » 07 Jan 2025, 02:22

The code below produces very poor audio (output.mp3) compared with what I get when I test the same thing on the ElevenLabs website. What do you think the reason is? Is there a problem with the settings?

Code:

import absl.flags
import absl.app
import absl.logging
import google.generativeai as genai
import requests
import os
import pygame

# Disable unnecessary logs
absl.flags.FLAGS.stderrthreshold = "FATAL"

# Configure your API key
genai.configure(api_key="GEMİNİ_APİ") # Gemini API

# Initialize Pygame Mixer
pygame.mixer.init()

class Response:
    def text(self, prompt, question):
        """Send the prompt and question to the Gemini API and store the response."""
        self.prompt = prompt
        self.question = question

        # Combine prompt and question
        full_question = f"{self.prompt}\n{self.question}"

        # Send a request to the Gemini API
        model = genai.GenerativeModel("gemini-1.5-flash")
        self.response = model.generate_content(full_question)

    def text_response(self):
        # Display the response on the screen
        print(self.response.text)
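One thing that may affect the spoken result, though it is not confirmed as the cause here, is that Gemini responses can contain Markdown markers (asterisks, bullets, headings) that a text-to-speech engine may read oddly. A minimal sketch of a cleanup step, where `clean_for_tts` is a hypothetical helper name and not part of the original code:

```python
import re


def clean_for_tts(text: str) -> str:
    """Remove common Markdown markers before sending text to a TTS engine."""
    # Drop emphasis, inline-code, and heading markers such as **, *, _, ` and #.
    text = re.sub(r"[*_`#]+", "", text)
    # Drop list bullets at the start of a line ("- item").
    text = re.sub(r"^\s*[-•]\s+", "", text, flags=re.MULTILINE)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_for_tts("**Hello**,\n\n- go to *door A1*."))  # Hello, go to door A1.
```

In the class above, this would be applied to `self.response.text` before it is placed in the request payload.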

Code:

    def voice_response(self):
        url = "https://api.elevenlabs.io/v1/text-to-speech/68gbrBPLYTEZzIIJ0apU"  # Voice model API
        querystring = {"optimize_streaming_latency":"2"}

        payload = {
            "text": self.response.text,
            "voice_settings": {
                "stability": 0.35,
                "similarity_boost": 0.85,
                "style": 0.55
            }
        }
        headers = {
            "xi-api-key": "ELEVENLABS_APİ_KEY",
            "Content-Type": "application/json"
        }
        response_voice = requests.request("POST", url, json=payload, headers=headers, params=querystring)

        if response_voice.status_code == 200:
            with open("output.mp3", "wb") as file:
                file.write(response_voice.content)
            print("Audio successfully created and saved to 'output.mp3'.")

            # Play the audio file
            pygame.mixer.music.load("output.mp3")
            pygame.mixer.music.play()

            # Wait until the audio playback is complete
            while pygame.mixer.music.get_busy():
                pygame.time.Clock().tick(10)

        else:
            print(f"Error: {response_voice.status_code} - {response_voice.text}")
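Two details of this request might explain a difference from the website, although neither is confirmed here: the payload does not set `model_id`, so the API falls back to a default model that may differ from the one selected on the website, and `optimize_streaming_latency` trades some quality for speed. The sketch below builds the same request with an explicit model and output format. It assumes that `eleven_multilingual_v2` and `mp3_44100_128` are available on the account; `build_tts_request` is a hypothetical helper name, and nothing is sent over the network.

```python
def build_tts_request(text, voice_id, api_key,
                      model_id="eleven_multilingual_v2",
                      output_format="mp3_44100_128"):
    """Return (url, params, payload, headers) for a text-to-speech call.

    Nothing is sent here; the caller passes the pieces to requests.post().
    """
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    # Assumption: leaving out optimize_streaming_latency favors quality over speed.
    params = {"output_format": output_format}
    payload = {
        "text": text,
        # Assumption: this multilingual model handles Turkish text.
        "model_id": model_id,
        "voice_settings": {
            "stability": 0.35,
            "similarity_boost": 0.85,
            "style": 0.55,
        },
    }
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return url, params, payload, headers


url, params, payload, headers = build_tts_request(
    "Merhaba", "68gbrBPLYTEZzIIJ0apU", "ELEVENLABS_API_KEY"
)
print(payload["model_id"])  # eleven_multilingual_v2
```

The result can be sent with `requests.post(url, json=payload, headers=headers, params=params)`, and the voice settings can then be matched to whatever the website shows for the same voice.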

Code:

# Main function
def main(argv):
    prompt = """You are a friendly, polite, and respectful male employee responsible for guiding patients to the correct department and floor in a hospital.
You don't talk about things you don't know.
I will give you information about the departments and floors in the hospital. You will answer the questions asked to you based on this information!
If a patient has a problem, help them, approach them with good intentions, share your feelings with them, and give them moral support.

Departments on the 1st floor: Anesthesiology and Reanimation, Appointment making, Brain and Neurosurgery, and Pediatric Surgery.
Directions to the departments on the 1st floor:

1. Anesthesiology and Reanimation: Go straight through door A1, it is the last door on the right.
2. Appointment making: You will see it immediately to the right of the entrance.
3. Brain and Neurosurgery: It is the 2nd door on the left from door A2.
4. Neurosurgery: Go straight through door C1, it is the last right door on the 1st left.
5.  Pediatric Surgery: Go through C1 and it is the first door on the right.

Based on this information, guide the people who come to you and always remember to not ask for anything more after your answer!
Your answers should not be too short, at least 3 lines.

"""
    question = input("Your question: ")
    response = Response()
    response.text(prompt, question)
    response.voice_response()

# Main program
if __name__ == '__main__':
    absl.app.run(main)

This is an application that uses artificial intelligence to guide people around a hospital, based on the floor, area, and similar information it is given.
My only problem with the application at the moment is that, as I mentioned, the audio sounds worse than what I get when I try the same thing with ElevenLabs on the website.
Feel free to suggest new models and new settings as well.
But please, the model must be Turkish or at least support Turkish characters.
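To narrow down which models are candidates for Turkish, one option is to filter the model list by language. The sketch below assumes the list has already been fetched (for example from the ElevenLabs models endpoint) and parsed from JSON, and that each entry has a `model_id` and a `languages` list whose items carry a `language_id`. That shape is an assumption, `models_supporting` is a hypothetical helper name, and the sample data is illustrative only, not real API output.

```python
def models_supporting(models, language_id):
    """Return the model_id of every model whose language list includes language_id."""
    matches = []
    for model in models:
        # Collect the language codes declared by this model (assumed field names).
        langs = {lang.get("language_id") for lang in model.get("languages", [])}
        if language_id in langs:
            matches.append(model["model_id"])
    return matches


# Illustrative data only, shaped like the assumed response.
sample = [
    {"model_id": "model_a", "languages": [{"language_id": "en"}]},
    {"model_id": "model_b", "languages": [{"language_id": "en"}, {"language_id": "tr"}]},
]
print(models_supporting(sample, "tr"))  # ['model_b']
```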
