Using TTS in Node.js with Speechify

A practical guide to synthesizing speech in Node.js using the Speechify TTS API. Covers installation, a basic synthesis call, streaming audio to disk, and what to reach for next.

The Speechify TTS API turns text into lifelike speech with a single API call. This walkthrough sticks to the code: install the SDK, make a first synthesis call that writes audio to disk, then stream longer audio as it arrives.

Install

npm install @speechify/api

You will need an API key from console.speechify.ai/api-keys. The SDK reads SPEECHIFY_API_KEY from the environment automatically, so the cleanest way to handle it locally is a .env file:

SPEECHIFY_API_KEY=your_key_here

Install dotenv to load it:

npm install dotenv
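
A missing or empty key is easier to diagnose before the first request than after it. As a minimal sketch, a small guard can check the variable up front. requireEnv is a hypothetical helper written for this example, not part of the Speechify SDK or dotenv, and the usage line passes a stand-in environment object so the sketch runs without a real key.

```javascript
// Hypothetical helper (not part of the SDK): fail fast if a variable is missing.
// `env` defaults to process.env; a plain object can be passed in for testing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing ${name}. Add it to .env or export it in your shell.`);
  }
  // Trim stray whitespace that sometimes sneaks into .env files.
  return value.trim();
}

// Usage with a stand-in environment object instead of the real process.env.
const apiKey = requireEnv("SPEECHIFY_API_KEY", { SPEECHIFY_API_KEY: "your_key_here" });
console.log(apiKey.length > 0);
```

In a real script you would call requireEnv("SPEECHIFY_API_KEY") after importing "dotenv/config" and pass the result to the client constructor.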

Synthesize to a file

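This is the smallest useful program. It sends a string to the API and writes the result to output.mp3. Because it uses import and top-level await, run it as an ES module: save it with a .mjs extension or set "type": "module" in package.json.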

// Load SPEECHIFY_API_KEY from .env into process.env.
import "dotenv/config";
import fs from "node:fs";
import { SpeechifyClient } from "@speechify/api";

const client = new SpeechifyClient({ apiKey: process.env.SPEECHIFY_API_KEY });

// Request the whole clip in a single call.
const response = await client.audio.speech({
  input: "Hello! This is the Speechify text-to-speech API.",
  voice_id: "george",
  audio_format: "mp3",
  model: "simba-english",
});

// audio_data is base64 text; decode it to raw bytes before writing.
fs.writeFileSync("output.mp3", Buffer.from(response.audio_data, "base64"));

The response comes back with audio_data as a base64-encoded string. Buffer.from(..., "base64") decodes it before writing. That is the whole thing.
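
One thing worth guarding against: Buffer.from does not throw on malformed base64, so an empty or garbled payload can end up as a silent zero-byte file. The sketch below wraps the decode step in a small check. decodeAudioData is a hypothetical helper for illustration, and the sample payload is a three-byte stand-in, not real API output.

```javascript
// Hypothetical helper: decode a base64 audio payload and reject empty results.
function decodeAudioData(audioData) {
  if (typeof audioData !== "string" || audioData.length === 0) {
    throw new TypeError("Expected audio_data to be a non-empty base64 string.");
  }
  const bytes = Buffer.from(audioData, "base64");
  if (bytes.length === 0) {
    throw new Error("audio_data decoded to zero bytes.");
  }
  return bytes;
}

// Stand-in payload: the three bytes "ID3", base64-encoded the same way
// the response field is encoded.
const sample = Buffer.from([0x49, 0x44, 0x33]).toString("base64");
const decoded = decodeAudioData(sample);
console.log(decoded.length, decoded.toString("latin1")); // prints "3 ID3"
```

With a real response you would pass response.audio_data to the helper and hand the returned Buffer to fs.writeFileSync.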

Stream audio to disk

For longer inputs, you do not want to wait for the full payload before you can use the audio. client.audio.stream gives you a BinaryResponse whose body you can consume as it arrives, so a write to disk (or playback) can start with the first chunk instead of after the whole clip has been synthesized.

// Load SPEECHIFY_API_KEY from .env into process.env.
import "dotenv/config";
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { SpeechifyClient } from "@speechify/api";

const client = new SpeechifyClient({ apiKey: process.env.SPEECHIFY_API_KEY });

// Ask for MP3 audio delivered as a stream rather than a single payload.
const response = await client.audio.stream({
  Accept: "audio/mpeg",
  input: "Streaming lets you start playing audio before the whole clip is ready.",
  voice_id: "george",
  model: "simba-english",
});

// The body is a Web ReadableStream; bail out if the response has none.
const body = response.stream();
if (!body) throw new Error("Streaming response has no body.");

// Convert to a Node stream and write chunks to disk as they arrive.
await pipeline(Readable.fromWeb(body), createWriteStream("output.mp3"));

response.stream() returns a Web ReadableStream<Uint8Array>. Readable.fromWeb converts it to a Node stream so you can pipe it with pipeline, which handles backpressure and errors cleanly.
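
To see the Readable.fromWeb and pipeline combination in isolation, here is a sketch that runs without any network call. It assumes Node.js 18 or later, where ReadableStream is a global. saveWebStream is a hypothetical helper, fakeBody is a stand-in for what response.stream() returns, and the temp-file name is arbitrary.

```javascript
import { createWriteStream, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: write any Web ReadableStream<Uint8Array> to a file.
// pipeline resolves once the file stream has finished, and rejects if
// either side errors.
async function saveWebStream(webStream, filePath) {
  await pipeline(Readable.fromWeb(webStream), createWriteStream(filePath));
}

// Stand-in for response.stream(): a Web ReadableStream that emits two chunks.
const fakeBody = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([1, 2, 3]));
    controller.enqueue(new Uint8Array([4, 5]));
    controller.close();
  },
});

const target = join(tmpdir(), "fromweb-sketch.bin");
await saveWebStream(fakeBody, target);

// Read the file back to confirm both chunks landed in order.
const written = readFileSync(target);
console.log(written.length); // prints 5
```

Swapping fakeBody for the real streaming body leaves the helper unchanged, which is the main appeal of converting to a Node stream at the boundary.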

What to reach for next

The Speechify API Cookbook has runnable TypeScript recipes for everything covered here and more: SSML controls for pitch, rate, and emotion; word-level timestamps for caption sync; and voice cloning. Each recipe is a folder you copy, a .env you populate, and a pnpm start. The full reference is at docs.speechify.ai.