Introduction

utayomi is a text-to-speech API that converts Japanese text to natural-sounding speech using the VoiceVox engine. The API base URL is https://utayomi.com/api, and the endpoint paths below (for example, /v1/synthesis) are relative to it.

Authentication

Requests to protected endpoints must be authenticated with an API key, sent as a Bearer token in the Authorization header. Requests are rate limited to 1 per second (managed by Unkey).

Authorization: Bearer YOUR_API_KEY
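
The header above can be built in Python roughly as follows. This is a minimal sketch that assumes the key is stored in an environment variable; the variable name UTAYOMI_API_KEY and the helper auth_headers are hypothetical and not part of the API.

```python
import os


def auth_headers(env=os.environ):
    """Build the Authorization header from an API key held in the environment.

    UTAYOMI_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = env.get("UTAYOMI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("UTAYOMI_API_KEY is not set")
    # Same shape as the documented header: "Authorization: Bearer YOUR_API_KEY"
    return {"Authorization": f"Bearer {api_key}"}
```

Keeping the key out of source code makes it harder to leak through version control. The returned dictionary can be passed as the headers of any HTTP client.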

Quick Start

Generate speech from text with a simple POST request to https://utayomi.com/api/v1/synthesis:

curl -X POST https://utayomi.com/api/v1/synthesis \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "こんにちは", "speaker": 1}' \
  --output speech.wav
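
The same request can be made from Python using only the standard library. This is a sketch, not an official client: the function name synthesize and its urlopen parameter (which allows the network call to be swapped out in tests) are assumptions made for this example.

```python
import json
import urllib.request

SYNTHESIS_URL = "https://utayomi.com/api/v1/synthesis"


def synthesize(text, api_key, speaker=1, urlopen=urllib.request.urlopen):
    """POST text to the synthesis endpoint and return the raw response bytes."""
    body = json.dumps({"text": text, "speaker": speaker}).encode("utf-8")
    request = urllib.request.Request(
        SYNTHESIS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urlopen(request) as response:
        return response.read()
```

A typical call would be audio = synthesize("こんにちは", "YOUR_API_KEY"), followed by writing audio to a file opened in binary mode.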

Text Synthesis

POST /v1/synthesis

Convert text to speech audio.

Request Body

Parameter  Type     Required  Description
text       string   Yes       The text to synthesize (max 1000 characters)
speaker    integer  No        Speaker ID (default: 1)
speed      number   No        Speech speed, 0.5 to 2.0 (default: 1.0)
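
The limits above can be checked on the client before a request is sent, which avoids spending a rate-limited call on a 400 response. The helper build_synthesis_payload is hypothetical. It assumes the 1000-character limit counts Unicode code points, which the documentation does not specify.

```python
MAX_TEXT_LENGTH = 1000
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_synthesis_payload(text, speaker=1, speed=1.0):
    """Validate the documented request parameters and return the JSON body as a dict."""
    if not text:
        raise ValueError("text is required")
    # Assumption: the limit is measured in code points, as len() does for str.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {"text": text, "speaker": speaker, "speed": speed}
```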

Response

Returns audio data in WAV format (audio/wav).
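
Because the response body is a WAV file, it can be inspected with Python's standard wave module before it is saved or played. The helper describe_wav below is a hypothetical name used for illustration. It assumes the body is uncompressed PCM, which is the only encoding the wave module reads.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the WAV header from response bytes and report basic properties."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frame_count = wav.getnframes()
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": sample_rate,
            "duration_seconds": frame_count / sample_rate,
        }
```

If the bytes are not a valid WAV file (for example, a JSON error body), wave.open raises wave.Error, which gives a cheap sanity check on the response.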

Speakers

GET /v1/speakers

List all available voice speakers.

Response

{
  "speakers": [
    { "id": 1, "name": "ずんだもん", "styles": [...] },
    { "id": 2, "name": "四国めたん", "styles": [...] }
  ]
}
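
A response of this shape can be turned into a name-to-ID lookup, which is convenient when choosing the speaker value for a synthesis request. This is a sketch: speaker_ids_by_name is a hypothetical helper, and the sample leaves styles empty as a placeholder because the contents of that field are not shown above.

```python
import json

# Sample body mirroring the documented response; "styles" is an empty placeholder.
SAMPLE_RESPONSE = """
{
  "speakers": [
    { "id": 1, "name": "ずんだもん", "styles": [] },
    { "id": 2, "name": "四国めたん", "styles": [] }
  ]
}
"""


def speaker_ids_by_name(response_text: str) -> dict:
    """Map each speaker name to its ID from a /v1/speakers response body."""
    payload = json.loads(response_text)
    return {speaker["name"]: speaker["id"] for speaker in payload["speakers"]}


lookup = speaker_ids_by_name(SAMPLE_RESPONSE)
print(lookup["四国めたん"])  # prints 2
```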

Error Handling

The API returns standard HTTP status codes:

Status  Description
200     Success
400     Bad Request - Invalid parameters
401     Unauthorized - Invalid or missing API key
429     Too Many Requests - Rate limit exceeded (1 request/second, managed by Unkey)
500     Internal Server Error
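
One way a client might act on these codes is to retry 429 and 500 after a pause and treat 400 and 401 as fatal, since repeating an invalid or unauthenticated request cannot succeed. The sketch below shows that policy. The helper call_with_retry, the choice of which codes to retry, and the linearly growing wait are assumptions made for illustration, not behavior the API prescribes.

```python
import time
import urllib.error

# Assumption: only these statuses are worth retrying; 400 and 401 need a fix first.
RETRYABLE_STATUSES = {429, 500}


def call_with_retry(send, max_attempts=3, wait_seconds=1.0, sleep=time.sleep):
    """Call send() and retry on retryable HTTP errors, waiting longer each time.

    send is any zero-argument callable that raises urllib.error.HTTPError on
    failure, such as a function wrapping urllib.request.urlopen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as error:
            if error.code not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            # The base wait of 1 second matches the documented rate limit window.
            sleep(wait_seconds * attempt)
```

The sleep parameter exists so the wait can be replaced in tests. In normal use the default time.sleep applies.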