OpenAI API: The Complete Developer Guide (GPT-5.5, Images & Audio)

From your first API key to production agents — everything you need to ship with OpenAI's current models.

Jan 8, 2025

The OpenAI API is still the fastest path from idea to shipped AI feature. This guide takes you from your first key to production-grade usage of GPT-5.5, the Responses API, image generation and transcription — with copy-paste code that works today.

If you want the shortest path from "we should add AI" to "it's live," the OpenAI API is still it. The ecosystem, the docs, and the model quality mean you spend your time building features — not fighting infrastructure.

This guide takes you from your first API key to production usage of the current model lineup: GPT-5.5, the Responses API, function calling, streaming, image generation, and transcription.

Getting started

Step 1: Get your API key

Sign up at OpenAI's platform, open the API keys section, and create a secret key. Store it as an environment variable right away, because the full key is shown only once, and keep it out of source control.

export OPENAI_API_KEY="sk-your-api-key-here"

Step 2: Install the SDK

pip install openai      # Python
npm install openai      # Node.js

Step 3: Your first request (Responses API)

The Responses API is OpenAI's current recommended interface — it unifies chat, tools, and state. The classic Chat Completions API still works, but new projects should start here.

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    input="Explain the difference between embeddings and fine-tuning in two sentences."
)

print(response.output_text)

The current model lineup

GPT-5.5 — the flagship

GPT-5.5 is OpenAI's most capable model: top-tier reasoning, a 1M-token context window, and strong multimodal (text + image) understanding. Pricing is roughly $5 / $30 per million input/output tokens, with cached input far cheaper.

response = client.responses.create(
    model="gpt-5.5",
    input=[
        {"role": "system", "content": "You are a precise technical assistant."},
        {"role": "user", "content": "Review this SQL for injection risks: ..."}
    ]
)

GPT-5.4 and GPT-5.4-nano — speed and cost

For high-volume or latency-sensitive work, drop down a tier. GPT-5.4-nano costs roughly $0.20 / $1.25 per million tokens — ideal for classification, extraction, routing, and tasks that don't need flagship reasoning.

Rule of thumb: start every feature on the smallest model that passes your eval, then upgrade only where quality demands it.
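The rule of thumb can be sketched as a loop that tries candidates from cheapest to most expensive and stops at the first one that clears your eval. This is a minimal sketch under assumptions: `run_eval` is a hypothetical callable you supply (it returns a score between 0 and 1 for a model), and the 0.9 threshold is an arbitrary illustration, not a recommended value.

```python
from typing import Callable, Optional, Sequence


def pick_smallest_passing(
    candidates: Sequence[str],
    run_eval: Callable[[str], float],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the first model, in the order given, whose eval score meets the threshold.

    `candidates` should be ordered cheapest first. `run_eval` is a
    hypothetical hook: plug in whatever eval harness you already have.
    """
    for model in candidates:
        if run_eval(model) >= threshold:
            return model
    return None  # nothing passed: revisit the prompt or the eval
```

Ordering the candidate list by price is what encodes "smallest first"; the function itself only promises to stop at the first pass.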

GPT-5.3-Codex — agentic coding

For long-horizon coding agents, the dedicated Codex model (around $1.75 / $14 per million tokens) is tuned for multi-step software tasks.

Multimodal: vision and images

Image understanding

response = client.responses.create(
    model="gpt-5.5",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What's wrong with this UI screenshot?"},
            {"type": "input_image", "image_url": "https://example.com/screen.png"}
        ]
    }]
)
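The example above points at a public URL. For a local file, one option is to inline the image as a base64 data URL in the same `image_url` field. The helper below is a minimal sketch; the function name `image_part_from_file` is an assumption made for illustration.

```python
import base64
import mimetypes
from pathlib import Path


def image_part_from_file(path: str) -> dict:
    """Build an input_image content part from a local file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "input_image", "image_url": f"data:{mime};base64,{encoded}"}


# Usage sketch: drop the part into the content list in place of the URL entry.
# content = [
#     {"type": "input_text", "text": "What's wrong with this UI screenshot?"},
#     image_part_from_file("screen.png"),
# ]
```

Base64 inflates the payload by about a third, so a hosted URL is usually the better choice for large images.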

Image generation with gpt-image-1

DALL·E has been superseded by the gpt-image family. gpt-image-1 produces high-fidelity images with excellent prompt adherence and text rendering — around $0.04 per standard image (a mini variant runs ~$0.005).

img = client.images.generate(
    model="gpt-image-1",
    prompt="A clean isometric illustration of a REST API gateway, soft studio lighting",
    size="1024x1024"
)
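The call above returns the image but does not save it. Assuming the response carries the image as base64 in `img.data[0].b64_json` (which is how the gpt-image models return results at the time of writing), a small helper can write it to disk. The helper name and output filename are illustrative.

```python
import base64
from pathlib import Path


def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64 image payload, write it to `path`, and return the byte count."""
    raw = base64.b64decode(b64_data)
    Path(path).write_bytes(raw)
    return len(raw)


# Usage sketch with the `img` object from the call above:
# save_b64_image(img.data[0].b64_json, "gateway.png")
```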

Audio: transcription

whisper-1 has been replaced by the gpt-4o-transcribe family. gpt-4o-mini-transcribe (~$0.003/min) is the recommended default for accuracy and cost.

with open("call.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio
    )
print(transcript.text)

Advanced features you'll actually use

Function (tool) calling

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
    }
}]

response = client.responses.create(
    model="gpt-5.5",
    input="What's the weather in Lisbon?",
    tools=tools
)
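The request above only tells the model that a tool exists. When the model decides to call it, the response contains `function_call` items that your code has to execute and answer. The sketch below shows one way to do that, assuming each item exposes `name`, `arguments` (a JSON string), and `call_id`; the `get_weather` body is a hypothetical stub, not a real weather lookup.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def run_tool_calls(
    output_items: Iterable[Any],
    handlers: Dict[str, Callable[..., Any]],
) -> List[dict]:
    """Execute each function_call item and build the matching function_call_output items."""
    results = []
    for item in output_items:
        if getattr(item, "type", None) != "function_call":
            continue  # skip text, reasoning, and other item types
        args = json.loads(item.arguments)  # arguments arrive as a JSON string
        result = handlers[item.name](**args)
        results.append({
            "type": "function_call_output",
            "call_id": item.call_id,  # ties the result back to the model's call
            "output": json.dumps(result),
        })
    return results


def get_weather(city: str) -> dict:
    # Hypothetical stub; a real version would query a weather service.
    return {"city": city, "temp_c": 21}


# Usage sketch: send the tool results back so the model can write its answer.
# outputs = run_tool_calls(response.output, {"get_weather": get_weather})
# followup = client.responses.create(
#     model="gpt-5.5",
#     previous_response_id=response.id,
#     input=outputs,
#     tools=tools,
# )
# print(followup.output_text)
```

Keeping the dispatcher separate from the API call makes it easy to unit test the tool logic without network access.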

Streaming

stream = client.responses.create(
    model="gpt-5.5",
    input="Write a haiku about latency.",
    stream=True
)
for event in stream:
    # Text arrives as incremental delta events; print each chunk as it lands.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Structured output

Force valid JSON that matches your schema — no more brittle parsing of free text. Combine structured output with function calling to build reliable agents.
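As a minimal sketch of what that looks like with the Responses API, the helper below builds a request that attaches a JSON schema through the `text` format option, and a second helper parses the reply. The ticket schema, its field names, and both function names are assumptions made up for this example.

```python
import json

# Hypothetical schema for triaging a support message.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["bug", "billing", "other"]},
        "summary": {"type": "string"},
    },
    "required": ["category", "summary"],
    "additionalProperties": False,
}


def structured_request(model: str, user_text: str) -> dict:
    """Build keyword arguments for client.responses.create with a JSON schema format."""
    return {
        "model": model,
        "input": user_text,
        "text": {
            "format": {
                "type": "json_schema",
                "name": "ticket",
                "schema": TICKET_SCHEMA,
                "strict": True,
            }
        },
    }


def parse_ticket(output_text: str) -> dict:
    """Parse the model's JSON reply and confirm the required keys are present."""
    data = json.loads(output_text)
    missing = [key for key in TICKET_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Usage sketch:
# response = client.responses.create(**structured_request("gpt-5.5", "I was charged twice."))
# ticket = parse_ticket(response.output_text)
```

The extra key check is cheap insurance: it keeps a refusal or truncated reply from flowing into downstream code as if it were a valid ticket.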

Embeddings

emb = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox"
)
vector = emb.data[0].embedding
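An embedding vector becomes useful once you compare it with another one. Cosine similarity is a common choice for that comparison; the sketch below is a plain-Python version for illustration, and a production system would more likely use NumPy or a vector database.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Usage sketch: embed two texts in one call, then compare them.
# emb = client.embeddings.create(
#     model="text-embedding-3-small",
#     input=["The quick brown fox", "A fast auburn fox"],
# )
# score = cosine_similarity(emb.data[0].embedding, emb.data[1].embedding)
```

Scores closer to 1 mean the two texts point in more similar directions in embedding space.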

Pricing snapshot

| Model | Input / 1M tokens | Output / 1M tokens | Use it for |
| --- | --- | --- | --- |
| GPT-5.5 | ~$5.00 | ~$30.00 | Hard reasoning, agents |
| GPT-5.4-nano | ~$0.20 | ~$1.25 | High-volume, simple tasks |
| GPT-5.3-Codex | ~$1.75 | ~$14.00 | Coding agents |
| gpt-image-1 | ~$0.04/image | - | Image generation |
| gpt-4o-mini-transcribe | ~$0.003/min | - | Transcription |

Prices move — confirm current rates in OpenAI's pricing docs before you budget.
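To turn per-million rates into a per-request budget, multiply each token count by its rate and divide by one million. The sketch below hard-codes the approximate rates from the snapshot above, so treat its output as a rough estimate; the function and dictionary names are illustrative.

```python
# Approximate USD rates per 1M tokens (input, output), copied from the snapshot.
RATES_PER_MILLION = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4-nano": (0.20, 1.25),
    "GPT-5.3-Codex": (1.75, 14.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    in_rate, out_rate = RATES_PER_MILLION[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Example: 2,000 input tokens and 500 output tokens on GPT-5.5
# (2000 * 5.00 + 500 * 30.00) / 1_000_000 = 0.025 USD
print(f"{estimate_cost('GPT-5.5', 2_000, 500):.3f}")
```

Output tokens dominate the bill at these rates: in the example, 500 output tokens cost more than 2,000 input tokens.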

Best practices

  1. Default to the smallest model that passes your eval; upgrade selectively.
  2. Keep repeated system prompts at the start of the input so they hit the prompt cache; cached input tokens are billed at a much lower rate.
  3. Stream chat UIs for perceived speed.
  4. Retry with backoff on rate limits.
  5. Cap output length (max_output_tokens in the Responses API) to control cost and latency.
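Item 4 can be sketched as a small wrapper that retries a call with exponential backoff and jitter. This is a minimal illustration, not the SDK's own mechanism: the function name, the default of five attempts, and the one-second base delay are all assumptions. The exception types are passed in so the helper stays independent of any one library.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter spreads out concurrent clients.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")


# Usage sketch (openai.RateLimitError is the SDK's rate-limit exception):
# import openai
# response = with_backoff(
#     lambda: client.responses.create(
#         model="gpt-5.5", input="ping", max_output_tokens=200
#     ),
#     retry_on=(openai.RateLimitError,),
# )
```

The official Python SDK also retries some failures on its own, configurable through the client's `max_retries` option, so a wrapper like this matters most when you want control over the schedule or need to cover calls outside the SDK.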

Next steps

See how OpenAI stacks up against the field in our text generation API comparison, or explore alternatives like Anthropic Claude and Google Gemini in the directory.