Inputs:
- Text prompt
- Speaker tags (optional)
- Reference audio (optional; voice clone endpoint only)

Limits:
- Audio output only
- English only
- Pricing is calculated per 1,000 characters
- Voice cloning requires the voice clone endpoint
- Multi-speaker dialogue uses speaker tags like [S1] and [S2]
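Because billing is per 1,000 characters, the cost of a request can be estimated from the prompt length alone. The sketch below assumes two things this card does not state: the price per 1,000 characters (`PRICE_PER_1K_CHARS` is a hypothetical placeholder), and that every prompt character, including `[S1]`/`[S2]` tags and nonverbal cues, is billable. Whether partial blocks are prorated or rounded up is also unknown, so both are offered behind a flag.

```python
import math

# Hypothetical placeholder rate; the real price per 1,000 characters is not given here.
PRICE_PER_1K_CHARS = 0.04


def estimate_cost(prompt: str, price_per_1k: float = PRICE_PER_1K_CHARS,
                  round_up: bool = False) -> float:
    """Estimate the cost of one request billed per 1,000 characters.

    Assumes every character in the prompt (speaker tags and cues included)
    counts toward billing. With round_up=True, a partial block of 1,000
    characters is billed as a full block; that rounding rule is an assumption.
    """
    chars = len(prompt)
    blocks = math.ceil(chars / 1000) if round_up else chars / 1000
    return blocks * price_per_1k


prompt = "[S1] Hi there. [S2] Hello!"
print(estimate_cost(prompt))                 # prorated estimate
print(estimate_cost(prompt, round_up=True))  # billed as one full block
```

Counting the tags as billable keeps the estimate conservative; if only spoken text is billed, strip the tags before calling `len`.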

Tips:
- Use it for dialogue, character speech, conversational scenes, and voice experiments
- Use [S1] and [S2] tags for two-speaker dialogue
- Add nonverbal cues such as laughter, coughs, or throat clearing where the scene calls for them
- Keep dialogue formatted like a transcript
- Use reference audio when voice consistency matters
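The tips above amount to writing the prompt as a tagged transcript. A minimal sketch of building one from a list of turns follows. The `build_dialogue` helper and its `(speaker, text, cue)` turn shape are hypothetical, and writing cues in parentheses (e.g. `(laughs)`) is an assumption for illustration; check the model's documentation for the exact cue syntax it recognizes.

```python
from typing import Iterable, Optional, Tuple

# One turn: (speaker number, spoken text, optional nonverbal cue).
Turn = Tuple[int, str, Optional[str]]


def build_dialogue(turns: Iterable[Turn]) -> str:
    """Join two-speaker turns into a single transcript-style prompt.

    Each turn is prefixed with [S1] or [S2]. A nonverbal cue, if present,
    is appended in parentheses (assumed syntax, e.g. "(laughs)").
    """
    parts = []
    for speaker, text, cue in turns:
        # Only the two documented speaker tags are allowed.
        if speaker not in (1, 2):
            raise ValueError(f"only speakers 1 and 2 are supported, got {speaker}")
        line = f"[S{speaker}] {text.strip()}"
        if cue:
            line += f" ({cue})"
        parts.append(line)
    return " ".join(parts)


print(build_dialogue([
    (1, "Did you hear that?", None),
    (2, "Hear what?", "laughs"),
]))
```

Keeping turns as structured data until the last step makes it easy to reject an unsupported speaker tag before any characters are billed.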

Dia TTS