ElevenLabs API Guide: Add AI Voice to Your App in Under an Hour

By VoiceToolsReview Editorial Team

Affiliate link — we may earn a small commission.

Get Your ElevenLabs API Key

API access requires the Starter plan ($5/month) or above; the free plan does not include it. Sign up, generate your key, and have your first audio file in minutes.

The ElevenLabs API turns any application into a text-to-speech engine with near-human voice quality. This guide covers everything you need to make your first API call, handle streaming audio, select voices programmatically, and think about what the API costs at production scale.

Prerequisites

  • An ElevenLabs account (free to create)
  • API access requires the Starter plan or above ($5/month). The free tier does not include API access.
  • Basic familiarity with HTTP requests or a Python/Node.js environment

Step 1: Get Your API Key

Once logged into ElevenLabs on a Starter plan or above:

  1. Click your profile icon in the top-right corner
  2. Select Profile + API Key
  3. Your API key is displayed in the API Key section — click Copy to grab it

Store this key in an environment variable — never hardcode it in source files you'll commit to version control.

export ELEVENLABS_API_KEY="your_key_here"
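
If the variable is missing, os.environ["ELEVENLABS_API_KEY"] fails with a bare KeyError. A minimal sketch of a friendlier loader follows; load_api_key is a hypothetical helper name, not part of any ElevenLabs library.

```python
import os


def load_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key from the environment, or raise a readable error.

    load_api_key is an illustrative helper, not an ElevenLabs function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running this script."
        )
    return key
```

Calling load_api_key() once at startup surfaces a missing key before any request is sent.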

Step 2: Make Your First API Call

The core endpoint is /v1/text-to-speech/{voice_id}. It accepts a JSON body with your text and voice settings, and returns an audio file.

Python

import os
import requests

api_key = os.environ["ELEVENLABS_API_KEY"]
voice_id = "21m00Tcm4TlvDq8ikWAM"  # Rachel — a reliable default

url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

headers = {
    "xi-api-key": api_key,
    "Content-Type": "application/json",
}

payload = {
    "text": "Hello from ElevenLabs. This is your first API-generated audio file.",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
    },
}

response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # fail loudly instead of saving an error body as MP3

with open("output.mp3", "wb") as f:
    f.write(response.content)

print("Audio saved to output.mp3")

Node.js / TypeScript

import fs from "fs";

const API_KEY = process.env.ELEVENLABS_API_KEY!;
const VOICE_ID = "21m00Tcm4TlvDq8ikWAM"; // Rachel

const response = await fetch(
  `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
  {
    method: "POST",
    headers: {
      "xi-api-key": API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text: "Hello from ElevenLabs. This is your first API-generated audio file.",
      model_id: "eleven_multilingual_v2",
      voice_settings: { stability: 0.5, similarity_boost: 0.75 },
    }),
  }
);

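if (!response.ok) {
  // Surface API errors instead of writing an error body to output.mp3
  throw new Error(`ElevenLabs request failed: ${response.status} ${await response.text()}`);
}

const buffer = await response.arrayBuffer();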
fs.writeFileSync("output.mp3", Buffer.from(buffer));
console.log("Audio saved to output.mp3");

Using the Official SDK

ElevenLabs also provides official SDKs that wrap the REST API:

pip install elevenlabs          # Python
npm install @elevenlabs/elevenlabs-js     # Node.js

The SDK takes care of error handling and retries for you, which makes it a better choice for production integrations than raw HTTP requests.

import os

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

audio = client.text_to_speech.convert(
    text="This is generated via the ElevenLabs SDK.",
    voice_id="21m00Tcm4TlvDq8ikWAM",
    model_id="eleven_multilingual_v2",
)

with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

Step 3: Find the Right Voice ID

The voice_id parameter determines which voice generates your audio. There are two categories:

Pre-made voices — ElevenLabs' library of 1,000+ voices you can use without any setup. Common options:

Voice name | Voice ID | Character
Rachel | 21m00Tcm4TlvDq8ikWAM | Warm, natural American English
Josh | TxGEqnHWrfWFTfGW9XjX | Authoritative, documentary-style
Charlotte | XB0fDUnXU5powFXDhCwa | Professional British English

Custom voices — cloned voices and voices you've added to your account. Retrieve these IDs dynamically:

response = requests.get(
    "https://api.elevenlabs.io/v1/voices",
    headers={"xi-api-key": api_key},
)
voices = response.json()["voices"]
for v in voices:
    print(v["name"], "—", v["voice_id"])

Always retrieve voice IDs dynamically in production

Hardcoded voice IDs can break if ElevenLabs retires a voice. Fetch the voice list at startup or cache it with a TTL, and look up the voice by name rather than hardcoding the ID.
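
One way to sketch the "cache with a TTL, look up by name" advice is shown below. VoiceCache and its fetch_voices argument are hypothetical names: fetch_voices is assumed to be any function that returns the "voices" list from GET /v1/voices, such as a thin wrapper around the requests.get call above. Names are matched case-insensitively, and if two voices share a name the later one wins, which may not be what you want for custom voices.

```python
import time
from typing import Callable, Dict, List


class VoiceCache:
    """Caches a name -> voice_id map and refreshes it after ttl_seconds.

    Illustrative helper, not part of the ElevenLabs SDK.
    """

    def __init__(
        self,
        fetch_voices: Callable[[], List[dict]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_voices = fetch_voices
        self._ttl = ttl_seconds
        self._clock = clock
        self._by_name: Dict[str, str] = {}
        self._expires_at = float("-inf")  # forces a fetch on first lookup

    def voice_id(self, name: str) -> str:
        now = self._clock()
        if now >= self._expires_at:
            # Cache is empty or stale: rebuild the lookup table.
            voices = self._fetch_voices()
            self._by_name = {v["name"].lower(): v["voice_id"] for v in voices}
            self._expires_at = now + self._ttl
        try:
            return self._by_name[name.lower()]
        except KeyError:
            raise LookupError(f"No voice named {name!r} in this account") from None
```

A failed lookup raises LookupError, so a retired voice shows up as an explicit error at lookup time rather than as a failed generation request.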

Step 4: Choose the Right Model

The model_id parameter selects which ElevenLabs generation model runs your request. The main options:

Model | Speed | Quality | Best for
eleven_multilingual_v2 | Standard | Highest | Content pipelines, narration
eleven_turbo_v2_5 | Fast | High | Real-time, low-latency use cases
eleven_flash_v2_5 | Fastest | Good | Interactive conversational apps

For content creation pipelines where you are pre-generating files and latency is not critical, eleven_multilingual_v2 gives the best output quality. For anything that must respond in real time — chatbots, voice assistants — eleven_turbo_v2_5 or eleven_flash_v2_5 is the right choice.
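
The selection rule above can be written down as a small function so the choice lives in one place. This is only a sketch of the guidance in this section; choose_model and its two flags are hypothetical names.

```python
def choose_model(real_time: bool, prioritize_latency: bool = False) -> str:
    """Map a latency requirement to one of the model IDs listed above.

    Illustrative helper: the mapping mirrors this guide's recommendation,
    not an official ElevenLabs rule.
    """
    if not real_time:
        # Pre-generated content: take the highest quality model.
        return "eleven_multilingual_v2"
    # Real-time use: trade some quality for speed.
    return "eleven_flash_v2_5" if prioritize_latency else "eleven_turbo_v2_5"
```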

Step 5: Streaming Audio for Real-Time Applications

The standard endpoint generates the complete audio file before returning. For interactive applications, this delay is noticeable. The streaming endpoint delivers audio chunks as they are generated, allowing your application to begin playback within 1–2 seconds of sending the request.

url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream"

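with requests.post(url, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()  # surface 4xx/5xx before writing anything
    with open("output_streamed.mp3", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)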

In a real-time application, instead of writing chunks to a file, you would send them to an audio playback buffer — the exact implementation depends on your framework and platform.
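
As a rough illustration of the playback-buffer idea, the sketch below reads chunks on a background thread and hands them to the consumer through a bounded queue. buffer_chunks is a hypothetical helper; the audio player that consumes the chunks is left to your platform.

```python
import queue
import threading
from typing import Iterable, Iterator, Optional


def buffer_chunks(chunks: Iterable[bytes], max_buffered: int = 32) -> Iterator[bytes]:
    """Read chunks on a background thread and yield them to the consumer.

    Illustrative helper. Errors raised while reading end the stream early;
    a production version should pass them on to the consumer.
    """
    q: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=max_buffered)

    def producer() -> None:
        try:
            for chunk in chunks:
                if chunk:  # skip empty keep-alive chunks
                    q.put(chunk)
        finally:
            q.put(None)  # sentinel: the stream has ended

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = q.get()
        if item is None:
            return
        yield item
```

With the streaming request above, you would iterate over buffer_chunks(response.iter_content(chunk_size=1024)) and pass each chunk to your player. The bounded queue keeps the network reader from running far ahead of playback.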

Streaming and Turbo models pair well

Combine eleven_turbo_v2_5 with the streaming endpoint for the lowest time-to-first-audio. In internal testing, this combination typically produces audible output within 800–1,200ms of the API call — acceptable for most conversational applications.
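
Latency varies with network and region, so it is worth measuring on your own setup. A minimal sketch: time_to_first_chunk is a hypothetical helper, and open_stream is assumed to be a function that sends the request and returns an iterable of byte chunks, so the timer covers the request itself.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_chunk(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (milliseconds until the first non-empty chunk, total bytes).

    Illustrative helper. The first value is None if no audio arrived.
    """
    start = clock()
    first_ms: Optional[float] = None
    total_bytes = 0
    for chunk in open_stream():
        if chunk and first_ms is None:
            first_ms = (clock() - start) * 1000.0
        total_bytes += len(chunk)
    return first_ms, total_bytes
```

For the requests example above, open_stream could be a lambda that calls requests.post(..., stream=True) and returns response.iter_content(chunk_size=1024).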

Step 6: Voice Settings Reference

Three parameters control how the selected voice behaves:

stability (0.0–1.0): controls output consistency across regenerations.

  • 0.3–0.5 — more expressive, more variation between generations
  • 0.6–0.8 — more consistent, more predictable output
  • For batch content pipelines, higher stability reduces variation between audio segments

similarity_boost (0.0–1.0): how closely the output adheres to the source voice characteristics.

  • Keep above 0.7 for pre-made voices to avoid quality degradation
  • Higher values help maintain cloned voice consistency

style (0.0–1.0): amplifies the expressive character of the voice. Defaults to 0.0; leave it there unless your content specifically benefits from heightened expression.
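
Since all three settings share the 0.0 to 1.0 range, a small builder can reject out-of-range values before they reach the API. The sketch below uses a hypothetical helper name, build_voice_settings, with the example values from this guide as defaults.

```python
def build_voice_settings(
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
) -> dict:
    """Return a voice_settings dict after checking each value is in [0.0, 1.0].

    Illustrative helper; the defaults are this guide's example values.
    """
    settings = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return settings
```

The result can be passed directly as the "voice_settings" value in the request payload.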


Output Formats

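By default the API returns MP3 at 128kbps (mp3_44100_128). You can request a different output quality with the output_format query parameter, which goes in the URL query string rather than the JSON body:

payload = {
    "text": "Your script here.",
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
}

# output_format is a query parameter, so pass it via params, not in the payload
response = requests.post(
    url, headers=headers, json=payload,
    params={"output_format": "mp3_44100_192"},  # 192kbps, 44.1kHz
)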

Available formats include mp3_22050_32, mp3_44100_128, mp3_44100_192, and pcm_16000 / pcm_44100 for raw audio suitable for real-time pipelines.
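
The format names listed here appear to follow a codec_samplerate[_bitrate] pattern. Assuming that pattern holds (it is inferred from these names, not taken from a specification), a small parser can tell a playback pipeline what it is about to receive. parse_output_format and AudioFormat are hypothetical names.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None for raw PCM


def parse_output_format(value: str) -> AudioFormat:
    """Split names like 'mp3_44100_192' or 'pcm_16000' into their parts.

    Illustrative helper; handles only the two patterns shown in this guide.
    """
    parts = value.split("_")
    if len(parts) == 3 and parts[0] == "mp3":
        return AudioFormat("mp3", int(parts[1]), int(parts[2]))
    if len(parts) == 2 and parts[0] == "pcm":
        return AudioFormat("pcm", int(parts[1]), None)
    raise ValueError(f"Unrecognized output format: {value!r}")
```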

Rate Limits and Production Considerations

Rate limits apply per API key and vary by plan:

Plan | Concurrent requests (approx.)
Starter | ~2
Creator | ~5
Pro | ~15
Scale | ~50

Characters are shared across web and API usage

API requests draw from the same monthly character allowance as the web interface. If you generate audio through the web UI and via API in the same month, they consume from the same pool. Size your plan based on total character consumption across both sources.
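
A back-of-the-envelope estimate helps here. The sketch below adds up API and web UI usage and compares it with a plan allowance; the function names and the example volumes are hypothetical, and the 2M figure is the Scale allowance quoted in this guide.

```python
def monthly_characters(
    items_per_month: int, avg_chars_per_item: int, web_ui_chars: int = 0
) -> int:
    """Estimate total characters used in a month across API and web UI."""
    return items_per_month * avg_chars_per_item + web_ui_chars


def remaining_allowance(used_chars: int, plan_allowance: int) -> int:
    """Characters left in the plan; a negative value means the plan is too small."""
    return plan_allowance - used_chars


# Example with made-up volumes: 300 narrations of ~6,000 characters each,
# plus 100,000 characters generated by hand in the web UI.
used = monthly_characters(300, 6_000, web_ui_chars=100_000)
left = remaining_allowance(used, plan_allowance=2_000_000)
```

In this example, used is 1,900,000 characters, which leaves 100,000 of a 2M allowance.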

For production content pipelines generating large volumes of audio:

  • Build retry logic with exponential backoff for 429 (rate limit) responses
  • Batch requests with a delay between them rather than firing concurrently
  • Cache generated audio files where content is static — don't regenerate unchanged text
  • The Scale plan at $330/month (2M characters) is where serious API builders typically land
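
The retry advice in the list above can be sketched as follows. call_with_backoff is a hypothetical helper: send is assumed to be any zero-argument function that performs the request and returns an object with a status_code attribute, such as a lambda wrapping requests.post. The delay values are illustrative defaults, not ElevenLabs recommendations.

```python
import random
import time
from typing import Any, Callable


def call_with_backoff(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry send() while it returns HTTP 429, doubling the wait each time.

    Illustrative helper. Returns the last response, which is still a 429
    if every attempt was rate limited.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    return response
```

Usage with the earlier example would look like call_with_backoff(lambda: requests.post(url, headers=headers, json=payload)).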

Common API Use Cases

Content automation pipeline — Generate article narrations nightly using eleven_multilingual_v2. Store MP3s in S3 or equivalent. Serve from CDN. Regenerate only when source text changes.
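
"Regenerate only when source text changes" can be implemented by keying each file on a hash of its inputs. A minimal sketch with a local directory standing in for S3: get_or_generate and cache_key are hypothetical helpers, and generate is assumed to be a function that takes text and returns MP3 bytes, for example a wrapper around the API call from Step 2.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice_id: str, model_id: str) -> str:
    """Hash everything that affects the audio, so any change yields a new key."""
    joined = "\x1f".join([voice_id, model_id, text])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def get_or_generate(
    text: str,
    voice_id: str,
    model_id: str,
    cache_dir: Path,
    generate: Callable[[str], bytes],
) -> Path:
    """Return the cached MP3 path, calling generate() only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(text, voice_id, model_id)}.mp3"
    if not path.exists():
        path.write_bytes(generate(text))
    return path
```

If you also vary voice settings or output format, those values belong in the key as well.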

Chatbot voice responses — Use the streaming endpoint with eleven_turbo_v2_5. Send response text as it arrives from your LLM, stream audio back to the client. Build in a short text buffer before sending to the TTS API to avoid very short audio chunks.
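
The "short text buffer" mentioned above can be as simple as accumulating LLM tokens until a sentence ends and a minimum length is reached. This is a rough sketch under that assumption; buffer_sentences and the 40-character threshold are hypothetical, and real text needs more careful sentence detection (abbreviations, decimals, and so on).

```python
from typing import Iterable, Iterator

SENTENCE_ENDINGS = (".", "!", "?")


def buffer_sentences(tokens: Iterable[str], min_chars: int = 40) -> Iterator[str]:
    """Group streamed text tokens into sentence-sized pieces for TTS.

    Illustrative helper: emits a piece once it is at least min_chars long
    and ends with sentence punctuation, then flushes any remainder.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        candidate = buffer.strip()
        if len(candidate) >= min_chars and candidate.endswith(SENTENCE_ENDINGS):
            yield candidate
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends
```

Each yielded piece would then be sent to the streaming endpoint as its own request.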

Audiobook production tool — Use the /v1/text-to-speech/{voice_id} endpoint for each chapter section, stitch output files together. ElevenLabs Projects (web UI) may be more practical for human-in-the-loop production workflows.

Accessibility reader — Generate on-demand audio for page content. Cache aggressively. The Starter plan ($5/month) is sufficient for small-scale deployments; size up based on measured character volume.

Summary

The ElevenLabs API is well-designed and easy to integrate: getting from sign-up to a first working call takes under an hour. The main production considerations are character budget planning, appropriate model selection for your latency requirements, and caching to avoid regenerating unchanged content.

For voice quality in API output, ElevenLabs leads the field. For raw speed in conversational applications, compare it against PlayHT's real-time endpoint before committing — PlayHT has invested specifically in first-chunk latency and can outperform ElevenLabs' turbo models in some low-latency benchmarks.

Tested April 2026 on a Creator account. API endpoints and rate limits correct at time of writing — check the ElevenLabs developer documentation for the current specification.
