Introduction
utayomi is a text-to-speech API that converts Japanese text into natural-sounding speech using the VOICEVOX engine. All endpoints are served under the base URL https://utayomi.com/api
Authentication
Requests to protected endpoints must be authenticated with an API key. Send the key as a Bearer token in the Authorization header. Requests are rate limited to 1 request per second (managed by Unkey).
Authorization: Bearer YOUR_API_KEY
Quick Start
Generate speech from text with a simple POST request to https://utayomi.com/api/v1/synthesis:
curl -X POST https://utayomi.com/api/v1/synthesis \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"text": "こんにちは", "speaker": 1}'
Text Synthesis
POST /v1/synthesis
Convert text to speech audio.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to synthesize (max 1000 characters) |
| speaker | integer | No | Speaker ID (default: 1) |
| speed | number | No | Speech speed, 0.5 to 2.0 (default: 1.0) |
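The constraints in the table can be checked on the client before a request is sent, which avoids spending a rate-limited call on a 400 response. The sketch below is one way to do that in Python. `build_synthesis_body` is a hypothetical helper, not part of the utayomi API, and it assumes that the 1000-character limit counts Unicode code points and that an empty `text` is rejected; neither detail is stated in this reference.

```python
MAX_TEXT_LENGTH = 1000            # "max 1000 characters" from the parameter table
MIN_SPEED, MAX_SPEED = 0.5, 2.0   # documented speed range


def build_synthesis_body(text, speaker=1, speed=1.0):
    """Validate the documented constraints and return a JSON-serializable dict.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    if not isinstance(text, str) or not text:
        # Assumption: an empty string is treated like a missing required field.
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        # Assumption: the limit is measured in code points, which is what len() counts.
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if isinstance(speaker, bool) or not isinstance(speaker, int):
        raise TypeError("speaker must be an integer")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {"text": text, "speaker": speaker, "speed": speed}
```

Calling `build_synthesis_body("こんにちは")` fills in the documented defaults for `speaker` and `speed`.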
Response
Returns audio data in WAV format (audio/wav).
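Because the response body is raw WAV data rather than JSON, a client usually writes the bytes straight to a file. The following is a minimal sketch using only the Python standard library. It assumes the base URL given in the introduction; `make_synthesis_request` and `synthesize_to_file` are illustrative names, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://utayomi.com/api"  # base URL from the introduction


def make_synthesis_request(api_key, text, speaker=1):
    """Build (but do not send) the POST /v1/synthesis request."""
    body = json.dumps({"text": text, "speaker": speaker}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/synthesis",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(api_key, text, path, speaker=1):
    """Send the request and save the WAV response; returns the bytes written."""
    request = make_synthesis_request(api_key, text, speaker)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(path, "wb") as wav_file:
        return wav_file.write(audio)
```

For example, `synthesize_to_file("YOUR_API_KEY", "こんにちは", "hello.wav")` would save the synthesized greeting as `hello.wav`.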
Speakers
GET /v1/speakers
List all available voice speakers.
Response
{
  "speakers": [
    { "id": 1, "name": "ずんだもん", "styles": [...] },
    { "id": 2, "name": "四国めたん", "styles": [...] }
  ]
}
Error Handling
The API returns standard HTTP status codes:
| Status | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 429 | Too Many Requests - Rate limit exceeded (1 request/second, managed by Unkey) |
| 500 | Internal Server Error |
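Of the statuses above, 429 is the one a client can usually recover from by waiting, since the limit is 1 request per second. The sketch below retries only on 429 with exponential backoff starting at one second and re-raises every other error. `send_with_retry` and `backoff_delay` are hypothetical helpers, and the retry policy (which statuses to retry, how long to wait, how many attempts) is an assumption rather than documented server behavior.

```python
import time
import urllib.error
import urllib.request

RETRYABLE_STATUSES = {429}  # assumption: only rate-limit errors are worth retrying


def backoff_delay(attempt, base=1.0, cap=8.0):
    """Seconds to wait after failed attempt number `attempt` (0-based): 1, 2, 4, 8, 8, ..."""
    return min(cap, base * (2 ** attempt))


def send_with_retry(request, max_attempts=4,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Send `request`, retrying on 429; returns the response body as bytes.

    `opener` and `sleep` are injectable so the logic can be exercised offline.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            is_last_attempt = attempt == max_attempts - 1
            # 400 and 401 need a fix on the caller's side, so they are re-raised.
            if error.code not in RETRYABLE_STATUSES or is_last_attempt:
                raise
            sleep(backoff_delay(attempt))
```

Starting the backoff at one second matches the documented rate limit, so the first retry should already fall outside the window that triggered the 429.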