Introduction

utayomi is a text-to-speech API that converts Japanese text to natural-sounding speech using the VoiceVox engine. The API base URL is https://utayomi.com/api. Endpoint paths in this document, such as /v1/synthesis, are relative to that base URL.

Authentication

Requests to protected endpoints require an API key. Requests are rate limited to 1 request per second; the limit is managed by Unkey. Send your API key as a Bearer token in the Authorization header:

Authorization: Bearer YOUR_API_KEY
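The 1 request/second limit described above can also be respected on the client side, so that most calls never reach a 429 response. The following is a minimal sketch, not part of the utayomi API: the class name `MinIntervalThrottle` and its parameters are hypothetical, and it assumes the limit applies to the spacing between consecutive requests from one client.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between consecutive calls."""

    def __init__(
        self,
        min_interval: float = 1.0,  # seconds; matches the documented 1 request/second
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep until min_interval has passed since the previous call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            elapsed = now - self._last_call
            delay = max(0.0, self._min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        # Record the moment the caller is released, not the moment it arrived.
        self._last_call = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each request is enough for a single-threaded client. The clock and sleep functions are injectable only so the behavior can be checked without real delays.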

Quick Start

Generate speech from text with a simple POST request to https://utayomi.com/api/v1/synthesis:

curl -X POST https://utayomi.com/api/v1/synthesis \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "こんにちは", "speaker": 1}' \
  --output speech.wav
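The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the function names `build_synthesis_request` and `synthesize_to_file` are hypothetical, the 30-second timeout is an arbitrary choice, and the URL is built from the base URL given in the introduction.

```python
import json
import urllib.request

# Base URL as stated in the introduction of this document.
BASE_URL = "https://utayomi.com/api"


def build_synthesis_request(
    api_key: str, text: str, speaker: int = 1
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/synthesis."""
    body = json.dumps({"text": text, "speaker": speaker}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/synthesis",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(api_key: str, text: str, path: str, speaker: int = 1) -> None:
    """Send the request and write the returned WAV bytes to `path`.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = build_synthesis_request(api_key, text, speaker)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)
```

With a valid key, `synthesize_to_file("YOUR_API_KEY", "こんにちは", "speech.wav")` would write the audio to `speech.wav`. Building the request separately from sending it makes the headers and body easy to inspect before any network call is made.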

Text Synthesis

POST /v1/synthesis

Convert text to speech audio.

Request Body

Parameter	Type	Required	Description
`text`	string	Yes	The text to synthesize (max 1000 characters)
`speaker`	integer	No	Speaker ID, as listed by GET /v1/speakers (default: 1)
`speed`	number	No	Speech speed, from 0.5 to 2.0 (default: 1.0)
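The constraints in the table above can be checked before a request is sent, which avoids spending a rate-limited call on a 400 response. The sketch below is illustrative only: `build_synthesis_payload` is a hypothetical helper, and it assumes that the 1000-character limit counts Unicode code points and that the speed bounds are inclusive, neither of which this document confirms.

```python
from typing import Any, Dict, Optional

MAX_TEXT_LENGTH = 1000          # documented maximum for `text`
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # documented range for `speed`


def build_synthesis_payload(
    text: str,
    speaker: Optional[int] = None,
    speed: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate parameters and return the JSON-serializable request body.

    Optional parameters are omitted when not given, so the server
    applies its own defaults (speaker 1, speed 1.0).
    """
    if not text:
        raise ValueError("text is required")
    # Assumption: the limit is measured in code points, as len() does.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")

    payload: Dict[str, Any] = {"text": text}
    if speaker is not None:
        payload["speaker"] = speaker
    if speed is not None:
        # Assumption: both bounds are inclusive.
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        payload["speed"] = speed
    return payload
```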

Response

On success, returns the synthesized audio as WAV data (Content-Type: audio/wav).
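Because the response body is a WAV file, it can be inspected in memory with Python's standard `wave` module before being saved or played. This is a sketch that assumes the returned audio is uncompressed PCM, which is what `wave` reads; the document does not state the sample rate or bit depth, so the code only reports whatever the header contains. The name `describe_wav` is hypothetical.

```python
import io
import wave
from typing import Any, Dict


def describe_wav(audio: bytes) -> Dict[str, Any]:
    """Read the WAV header from raw response bytes and summarize it."""
    with wave.open(io.BytesIO(audio), "rb") as wav:
        frame_count = wav.getnframes()
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": sample_rate,
            "duration_seconds": frame_count / sample_rate,
        }
```

If the bytes are not a valid WAV file, for example because an error body was returned instead of audio, `wave.open` raises `wave.Error`, which makes this a cheap sanity check on the response.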

Speakers

GET /v1/speakers

List all available voice speakers.

Response

{
  "speakers": [
    { "id": 1, "name": "ずんだもん", "styles": [...] },
    { "id": 2, "name": "四国めたん", "styles": [...] }
  ]
}
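A response of the shape shown above can be turned into a name-to-ID lookup, which is convenient when choosing a value for the `speaker` parameter. This is a minimal sketch: `speaker_ids_by_name` is a hypothetical helper, and it reads only the `id` and `name` fields because the contents of `styles` are elided in this document.

```python
import json
from typing import Dict


def speaker_ids_by_name(response_body: str) -> Dict[str, int]:
    """Map each speaker's name to its ID from a /v1/speakers JSON body."""
    data = json.loads(response_body)
    # Only `id` and `name` are used; `styles` is ignored here.
    return {speaker["name"]: speaker["id"] for speaker in data["speakers"]}
```

For the two speakers in the sample response, the result would map ずんだもん to 1 and 四国めたん to 2.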

Error Handling

The API returns standard HTTP status codes:

Status	Description
200	Success
400	Bad Request - Invalid parameters
401	Unauthorized - Invalid or missing API key
429	Too Many Requests - Rate limit exceeded (1 request/second, managed by Unkey)
500	Internal Server Error
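One way a client might act on these status codes is to retry after a short wait on 429 and raise immediately on everything else. The sketch below is an illustration, not prescribed behavior: `ApiError`, `call_with_retry`, and the `send` callable (which returns a status code and body) are hypothetical, and the one-second wait is an assumption derived from the documented rate limit, since this document does not say whether the server sends a Retry-After header.

```python
import time
from typing import Callable, Tuple

# Descriptions taken from the status table in this document.
STATUS_MESSAGES = {
    400: "Bad Request - Invalid parameters",
    401: "Unauthorized - Invalid or missing API key",
    429: "Too Many Requests - Rate limit exceeded",
    500: "Internal Server Error",
}


class ApiError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 3,
    wait_seconds: float = 1.0,  # assumption: one rate-limit window
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send` until it succeeds, retrying only on 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts:
            sleep(wait_seconds)
            continue
        # 400 and 401 will not succeed on retry; 500 is not retried here.
        raise ApiError(status, STATUS_MESSAGES.get(status, "Unexpected status"))
    raise AssertionError("unreachable")
```

Retrying only on 429 is a deliberately conservative choice: a 400 or 401 indicates a problem with the request itself, and whether a 500 is safe to retry depends on the cause.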