
Chatterbox Speech To Speech


Chatterbox Speech to Speech is a fast AI voice conversion model that converts source audio into a target voice style, with optional reference audio for guidance. It is available as a ready-to-use REST inference API for voice conversion, speech style transfer, dubbing, character voices, creator content, audio localization, and professional speech-to-speech workflows, with simple integration, no cold starts, and affordable pricing.

Features

Chatterbox Speech-to-Speech

Chatterbox Speech-to-Speech transforms a source audio clip into a target voice style using optional reference audio. It is suitable for voice conversion, style transfer, creator dubbing, character voice prototyping, and other speech-to-speech workflows where you want to preserve spoken content while changing vocal identity or delivery style.


Why Choose This?

  • Speech-to-speech conversion
    Transform an existing speech recording into a different voice style.

  • Optional reference voice guidance
    Add reference_audio when you want the output to follow a particular vocal tone or character.

  • Simple workflow
    Upload source audio, optionally upload a reference voice sample, and generate the converted result.

  • Useful for creator and dubbing workflows
    Suitable for voice restyling, character voice tests, demo production, and spoken-content transformation.

  • Production-ready API
    Call the model through a REST API with no cold starts for narration replacement, voice experiments, content localization, and creative audio workflows.


Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| audio | Yes | Source audio to convert. |
| reference_audio | No | Optional reference audio used to guide the target voice style. |

How to Use

  1. Upload your source audio — provide the speech recording you want to transform.
  2. Upload reference audio (optional) — add a target voice sample if you want stronger style guidance.
  3. Submit — run the model and download the converted speech audio.

Example Use Case

Convert a spoken voice clip into a different vocal style for creator content, dubbing, or character voice testing.


Pricing

Just $0.02 per started minute.

Billing Rules

  • Pricing is $0.02 per started minute
  • Audio duration is billed in started 60-second units
  • Audio shorter than 60 seconds is billed as 1 minute
  • reference_audio does not affect pricing

Example Costs

| Audio Duration | Cost |
| --- | --- |
| 1s–60s | $0.02 |
| 61s–120s | $0.04 |
| 121s–180s | $0.06 |
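The billing rules reduce to rounding the source duration up to whole minutes, with a one-minute minimum, at $0.02 per minute. The shell function below is a minimal sketch for estimating cost before submitting; `estimate_cost` is a hypothetical helper name, and it assumes you already know the source duration in seconds. reference_audio is not an input, matching the rule that it does not affect pricing.

```bash
# Hypothetical helper: estimate the charge in USD for a source clip.
# Usage: estimate_cost <duration_in_seconds>   (fractions allowed, e.g. 61.5)
estimate_cost() {
  awk -v s="$1" 'BEGIN {
    m = int(s / 60)          # whole minutes fully covered by the clip
    if (m * 60 < s) m++      # a started minute counts as a full minute
    if (m < 1) m = 1         # clips under 60 s are billed as 1 minute
    printf "%.2f\n", m * 0.02
  }'
}

estimate_cost 45     # prints 0.02 (1 started minute)
estimate_cost 150    # prints 0.06 (3 started minutes)
```

Rounding is done on the minute count before multiplying by the rate, so the result always lands on an exact multiple of $0.02.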

Best Use Cases

  • Voice style transfer — Convert speech into a different vocal tone or identity.
  • Character voice prototyping — Test alternative voice styles for characters or avatars.
  • Creator dubbing — Rework spoken audio for short-form content or promos.
  • Narration restyling — Preserve content while changing delivery feel.
  • Speech workflow experiments — Compare different voice directions from the same recording.

Pro Tips

  • Use clean source audio for better intelligibility.
  • Add reference_audio only when you want stronger target voice guidance.
  • Use a clear reference sample with stable tone for more consistent conversion.
  • Short clips are useful for testing before processing longer audio.

Notes

  • audio is required.
  • reference_audio is optional.
  • Pricing is based on source audio duration and billed per started minute.
  • Better source audio and cleaner reference audio generally improve output quality.

Related Models

  • Chatterbox Text-to-Speech — Generate speech directly from text.
  • Voice cloning workflows — Useful when you need a reusable custom voice identity instead of per-request voice guidance.
  • Audio generation workflows — Useful when you need music or sound generation instead of speech conversion.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


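```bash
# Submit the task (audio is required; reference_audio is optional).
# The URLs below are placeholders: replace them with your own audio files.
curl --location --request POST "https://api.wavespeed.ai/api/v3/chatterbox/speech-to-speech" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
  "audio": "https://example.com/source.wav",
  "reference_audio": "https://example.com/reference.wav"
}'

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
```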
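The two calls above are usually chained: submit, read data.id from the response, then poll the result endpoint until data.status becomes completed or failed. The sketch below wraps that loop in two shell functions. It assumes `curl` and `jq` are installed and `WAVESPEED_API_KEY` is exported; the function names, the 60-attempt limit, and the 2-second interval are illustrative choices, not API requirements.

```bash
# Hypothetical helpers; the endpoint paths are the ones shown above. Requires jq.
API_BASE="https://api.wavespeed.ai/api/v3"

# Submit a task with a JSON body and print the task id (data.id).
submit_s2s() {
  curl --silent --show-error --location --request POST \
    "${API_BASE}/chatterbox/speech-to-speech" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-raw "$1" | jq -r '.data.id'
}

# Poll the result endpoint until data.status is completed or failed.
# Prints the output URL(s) on success; returns 1 on failure or timeout.
wait_for_result() {
  local request_id="$1" tries="${2:-60}" interval="${3:-2}" response status
  while [ "$tries" -gt 0 ]; do
    response=$(curl --silent --show-error --location \
      "${API_BASE}/predictions/${request_id}/result" \
      --header "Authorization: Bearer ${WAVESPEED_API_KEY}")
    status=$(printf '%s' "$response" | jq -r '.data.status')
    case "$status" in
      completed)          printf '%s' "$response" | jq -r '.data.outputs[]'; return 0 ;;
      created|processing) sleep "$interval" ;;   # still running: wait, then poll again
      failed)             printf '%s' "$response" | jq -r '.data.error' >&2; return 1 ;;
      *)                  printf '%s\n' "$response" >&2; return 1 ;;   # unexpected reply
    esac
    tries=$((tries - 1))
  done
  echo "timed out waiting for ${request_id}" >&2
  return 1
}

# Usage (placeholder URL):
#   id=$(submit_s2s '{"audio":"https://example.com/source.wav"}')
#   wait_for_result "$id"
```

Capping the number of polls keeps a malformed or stuck request from looping forever; raise the limit for long source clips.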

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| audio | string | Yes | - | - | Source audio to convert. |
| reference_audio | string | No | - | - | Optional target voice reference audio. |
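Because reference_audio is optional, a client typically includes that key only when a reference clip is available. Below is a minimal sketch of building the request body, assuming both fields are passed as publicly reachable URLs (the table only specifies `string`) and that those URLs contain no quotes or backslashes needing JSON escaping. `build_payload` is a hypothetical helper name.

```bash
# Hypothetical helper: print the JSON body for the submit call.
# Usage: build_payload <audio_url> [reference_audio_url]
# Values are inserted as-is (no JSON escaping), so pass plain URLs only.
build_payload() {
  local audio="$1" reference="${2:-}"
  if [ -z "$audio" ]; then
    echo "build_payload: audio is required" >&2
    return 1
  fi
  if [ -n "$reference" ]; then
    printf '{"audio":"%s","reference_audio":"%s"}\n' "$audio" "$reference"
  else
    # reference_audio is optional, so the key is omitted when no reference is given
    printf '{"audio":"%s"}\n' "$audio"
  fi
}

build_payload "https://example.com/source.wav"
build_payload "https://example.com/source.wav" "https://example.com/reference.wav"
```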

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction, Task Id |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
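The submit call returns before the audio has been processed, so the fields you usually need from this response are data.id (the requestId for the result endpoint) or the ready-made data.urls.get. A sketch using `jq` (assumed installed); the sample JSON is illustrative only: its values are made up, and only the field names come from the table above.

```bash
# Illustrative response: made-up values, field names from the table above.
response='{"code":200,"message":"success","data":{"id":"example-task-id","model":"example-model-id","outputs":[],"urls":{"get":"https://api.wavespeed.ai/api/v3/predictions/example-task-id/result"},"status":"created","created_at":"2023-04-01T12:34:56.789Z","error":"","timings":{"inference":0}}}'

# Extract the task id and the polling URL (requires jq).
request_id=$(printf '%s' "$response" | jq -r '.data.id')
result_url=$(printf '%s' "$response" | jq -r '.data.urls.get')

echo "task id:    ${request_id}"
echo "result URL: ${result_url}"
```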

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction, the ID of the prediction to get |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
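When reading a result response, data.status decides whether data.outputs can be used yet. The sketch below prints the output URLs once the task is completed. It assumes `jq` is installed; `print_outputs` and its return codes (2 while the task is still running, 1 on failure) are hypothetical conventions, not part of the API.

```bash
# Hypothetical helper: print output URL(s) from a result response string.
# Returns 0 when completed, 2 while created/processing, 1 when failed or unknown.
print_outputs() {
  local status
  status=$(printf '%s' "$1" | jq -r '.data.status')
  case "$status" in
    completed)          printf '%s' "$1" | jq -r '.data.outputs[]' ;;   # one URL per line
    created|processing) return 2 ;;                                     # not ready: poll again later
    *)                  printf '%s' "$1" | jq -r '.data.error' >&2; return 1 ;;
  esac
}

# Usage with a response fetched from the result endpoint:
#   print_outputs "$response" > output_urls.txt
```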
© 2025 WaveSpeedAI. All rights reserved.