LogCure.com
Python API June 3, 2026 12 min read

Fix ChatGPT 429 Too Many Requests API Error in Python

Stop your Python app from crashing. Implement exponential backoff, request pooling, and the OpenAI Batch API — with copy-paste code you can deploy in minutes.

TL;DR – Quick Fix

Wrap your OpenAI calls with @retry from the tenacity library using exponential backoff. Scroll to the code block to copy it now.

What is the ChatGPT 429 Too Many Requests Error?

When your Python application hits OpenAI's API too aggressively, the server responds with HTTP status code 429 — meaning you've exceeded an enforced quota. Understanding which limit you've hit is the first step to picking the right fix.

 

[Diagram: two panels. Concurrent Request Limit: too many active connections at the same instant (Req 1 to Req 4 reach the API server, the excess gets 429). Hourly / Minute Rate Limit: total tokens or requests exceed the quota over time (a spike past the limit gets 429). Example limit: 500 RPM on Tier 1. RPM = Requests Per Minute, TPM = Tokens Per Minute; resets automatically every 60 seconds.]
Figure 1 — Two distinct OpenAI rate limit mechanisms that both return HTTP 429

Concurrent vs. Hourly Rate Limits at a Glance

Property | Concurrent Request Limit | Hourly / Minute Rate Limit
Trigger | Too many simultaneous active connections | Cumulative RPM or TPM quota exceeded
Reset Time | Immediately when connections complete | Every 60 seconds (rolling window)
Detected by | High concurrency / async calls | High request throughput over time
Best Fix | Semaphores / connection pooling | Exponential backoff + retry
API Header Hint | x-ratelimit-remaining-requests | x-ratelimit-remaining-tokens
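
To see which budget is closest to running out, you can compare the remaining and limit headers on any response. The sketch below assumes the headers are already available as a plain dict (with the openai SDK, one way to get them is `client.chat.completions.with_raw_response.create(...)` and then `.headers`). The helper name `tightest_limit` and the sample values are hypothetical.

```python
def tightest_limit(headers: dict) -> tuple:
    """Return (kind, fraction_remaining) for the budget closest to exhaustion."""
    fractions = {}
    for kind in ("requests", "tokens"):
        limit = headers.get(f"x-ratelimit-limit-{kind}")
        remaining = headers.get(f"x-ratelimit-remaining-{kind}")
        if limit is None or remaining is None:
            continue  # header not present on this response
        if int(limit) > 0:
            fractions[kind] = int(remaining) / int(limit)
    if not fractions:
        raise ValueError("no rate limit headers found")
    kind = min(fractions, key=fractions.get)
    return kind, fractions[kind]


# Hypothetical header values for illustration only.
sample = {
    "x-ratelimit-limit-requests": "500",
    "x-ratelimit-remaining-requests": "420",
    "x-ratelimit-limit-tokens": "200000",
    "x-ratelimit-remaining-tokens": "9000",
}
kind, frac = tightest_limit(sample)
print(f"{kind} budget is tightest: {frac:.1%} remaining")
```

If the token budget is the tight one, shrinking prompts helps more than slowing the request rate; if the request budget is tight, the reverse applies.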

Why Your Python App Triggers 429 Errors

No Retry Logic

Most beginner scripts call the API in a tight loop with zero delay. A single rate-limited second causes a cascade of failures.

Unbounded Async Concurrency

asyncio.gather(*tasks) fires all requests simultaneously — a burst that instantly saturates concurrent limits.

Large Prompt Tokens

Sending huge system prompts burns your TPM quota quickly even with few requests per minute.

Low API Tier

Free and Tier 1 accounts have strict limits. Upgrading to Tier 3–5 increases both RPM and TPM by orders of magnitude.
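
A quick calculation shows why large prompts can exhaust TPM long before RPM. The sketch below uses the Tier 1 figures from the tier table in this article (500 RPM, 200,000 TPM). The 8,000 tokens-per-request figure and the function name `effective_rpm` are illustrative assumptions, and the token count is meant to cover the prompt plus the expected completion.

```python
def effective_rpm(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> int:
    """Requests per minute you can send before either quota runs out."""
    by_tokens = tpm_limit // tokens_per_request  # ceiling imposed by TPM
    return min(rpm_limit, by_tokens)


print(effective_rpm(500, 200_000, 8_000))  # 25: the token quota is the bottleneck
print(effective_rpm(500, 200_000, 200))    # 500: the request quota is the bottleneck
```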

The Gold-Standard Fix: Exponential Backoff

Exponential backoff retries a failed request after an ever-increasing pause — 1 s, 2 s, 4 s, 8 s … — giving the rate limit time to reset while not hammering the server. Both tenacity and backoff implement this as clean Python decorators.

Figure 2 — Exponential Backoff Timeline

[Timeline: attempt 1 fails, wait 1 s; attempt 2 fails, wait 2 s; attempt 3 fails, wait 4 s; attempt 4 fails, wait 8 s plus jitter; the next attempt succeeds at about 15 s.]
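
The schedule in the timeline can be reproduced in a few lines. This is a minimal sketch, not what tenacity or backoff do internally: the function name `backoff_delays` and its parameters are made up for illustration, and the jitter variant shown is "full jitter" (a uniform random wait between zero and the exponential ceiling).

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0,
                   jitter: bool = False,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Wait time before each retry: base * 2**n, capped at `cap`."""
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * 2 ** n)
        # With full jitter, wait a random amount up to the ceiling.
        delays.append(rng.uniform(0, ceiling) if jitter else ceiling)
    return delays


print(backoff_delays(4))             # [1.0, 2.0, 4.0, 8.0]
print(sum(backoff_delays(4)))        # 15.0
print(backoff_delays(8)[-1])         # 60.0 (2**7 = 128 is capped)
```

Without jitter the four waits add up to 15 seconds, which matches the timeline. With jitter each wait is shorter on average, and clients that failed together no longer retry together.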

Step 1 — Install the Library

# Install either (or both)
pip install tenacity      # recommended – most control
pip install backoff       # lightweight alternative
pip install --upgrade openai  # ensure latest SDK

Step 2a — Exponential Backoff with tenacity

openai_tenacity.py tenacity
import openai
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
    retry_if_exception_type,
)

client = openai.OpenAI()

# ── Retry decorator ──────────────────────────────────────────────────────
@retry(
    wait=wait_random_exponential(min=1, max=60),  # jitter prevents thundering herd
    stop=stop_after_attempt(6),               # give up after 6 attempts (the first call plus 5 retries)
    retry=retry_if_exception_type(openai.RateLimitError),
    reraise=True,
)
def chat_with_retry(messages: list, model: str = "gpt-4o") -> str:
    """Call OpenAI Chat Completion with automatic exponential backoff."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    return response.choices[0].message.content

# ── Usage ────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    reply = chat_with_retry([
        {"role": "user", "content": "Explain exponential backoff in one sentence."}
    ])
    print(reply)

Step 2b — Alternative: backoff Library

openai_backoff.py backoff
import backoff
import openai

client = openai.OpenAI()

@backoff.on_exception(
    backoff.expo,                   # exponential strategy
    openai.RateLimitError,           # only catch 429 errors
    max_tries=8,
    jitter=backoff.full_jitter,     # add randomness to prevent stampede
)
def completions_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)

# ── Usage ────────────────────────────────────────────────────────────────
response = completions_with_backoff(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

tenacity vs backoff — Side-by-Side

Feature | tenacity | backoff
API style | Decorator + context manager | Decorator only
Jitter support | ✓ Built-in | ✓ Built-in
Stop condition | attempts / time / custom | attempts / time
Async support | ✓ Native | ✓ Native
Logging hooks | before / after / retry hooks | on_backoff / on_giveup
Complexity | Medium | Simple
Best for | Production systems | Quick scripts
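
Both libraries wrap the same basic loop: call, catch the rate limit error, sleep, try again. The dependency-free sketch below shows that loop as a decorator. It is not how either library is implemented; `retry_on`, `RateLimited`, and `flaky` are hypothetical names, and `RateLimited` stands in for `openai.RateLimitError` so the example runs without network access.

```python
import functools
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for openai.RateLimitError."""


def retry_on(exc_type, max_attempts=6, base=1.0, cap=60.0, sleep=time.sleep):
    """Retry the wrapped function on exc_type with capped exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the original error
                    ceiling = min(cap, base * 2 ** (attempt - 1))
                    sleep(random.uniform(0, ceiling))  # full jitter
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on(RateLimited, max_attempts=4, sleep=lambda seconds: None)  # skip real sleeping in the demo
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("429")
    return "ok"


print(flaky(), "after", calls["n"], "attempts")  # ok after 3 attempts
```

Passing `sleep` as a parameter is a choice made here for testability; in real code the default `time.sleep` applies.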

Advanced: Async Request Pooling with Semaphores

For high-throughput pipelines using asyncio, launching thousands of concurrent requests will immediately saturate the concurrent limit. The solution: bound concurrency with an asyncio.Semaphore.

async_pool.py asyncio + semaphore
import asyncio
import openai
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_exception_type

aclient = openai.AsyncOpenAI()

# ── Cap concurrent connections to avoid hitting concurrent limit ──────────
MAX_CONCURRENT = 10
semaphore = asyncio.Semaphore(MAX_CONCURRENT)

@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    retry=retry_if_exception_type(openai.RateLimitError),
    reraise=True,                    # surface RateLimitError instead of tenacity.RetryError
)
async def bounded_chat(prompt: str) -> str:
    async with semaphore:            # blocks when 10 in-flight
        resp = await aclient.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

async def process_batch(prompts: list[str]) -> list[str]:
    tasks = [bounded_chat(p) for p in prompts]
    return await asyncio.gather(*tasks)

# ── Run ───────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    prompts = [f"Tell me fact number {i} about Python" for i in range(100)]
    results = asyncio.run(process_batch(prompts))
    print(len(results), "responses received")

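This approach schedules all 100 prompts at once but caps in-flight requests at 10 at any moment, which avoids concurrent-limit 429 errors as long as MAX_CONCURRENT stays below your account's limit, while still keeping throughput high. Because the retry decorator sits outside the semaphore block, a task that is backing off releases its slot while it sleeps.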
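
You can check the cap without calling the API. The sketch below replaces the network call with `asyncio.sleep` and records the highest number of tasks inside the semaphore at once; `run_bounded` and `fake_request` are hypothetical names used only for this demonstration.

```python
import asyncio


async def run_bounded(n_tasks: int, max_concurrent: int) -> int:
    """Run n_tasks fake requests; return the peak number in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:            # waits while max_concurrent are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)    # stand-in for network latency
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_bounded(100, 10))
print("peak in-flight requests:", peak)  # never exceeds 10
```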

For Heavy Workloads: OpenAI Batch API

For non-real-time tasks — bulk embedding generation, mass content tagging, dataset enrichment — the OpenAI Batch API is the definitive solution. Requests are processed asynchronously within 24 hours and cost 50% less than the synchronous API.

50% cheaper: vs synchronous API pricing

50,000 req/batch: per file upload limit

24-hour window: target completion time (a batch that does not finish within the window expires)

batch_api_example.py Batch API
import json, time, openai

client = openai.OpenAI()

# ── 1. Create the JSONL batch file ────────────────────────────────────────
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Summarise item {i}"}],
        },
    }
    for i in range(1000)   # 1,000 requests, queued outside the synchronous rate limits
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# ── 2. Upload the file ────────────────────────────────────────────────────
with open("batch_input.jsonl", "rb") as f:   # close the handle once the upload finishes
    batch_file = client.files.create(file=f, purpose="batch")

# ── 3. Create the batch job ───────────────────────────────────────────────
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(f"Batch submitted: {batch_job.id}")

# ── 4. Poll until the job reaches a terminal state ────────────────────────
while True:
    status = client.batches.retrieve(batch_job.id)
    print(f"Status: {status.status}")
    if status.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

# ── 5. Download results (only a completed job is read here) ───────────────
if status.status != "completed":
    raise RuntimeError(f"Batch ended with status {status.status!r}")
result_content = client.files.content(status.output_file_id).content
with open("batch_output.jsonl", "wb") as f:
    f.write(result_content)
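
Once the output file is on disk, each line needs to be matched back to its request. The sketch below assumes every output line is a JSON object with a `custom_id`, an `error` field, and a `response` object holding `status_code` and a chat completion `body`; verify these field names against a real output file before relying on them. `parse_batch_output` and the sample records are hypothetical.

```python
import json


def parse_batch_output(lines):
    """Map custom_id -> reply text, and collect the ids of failed requests."""
    replies, failed = {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            failed.append(record["custom_id"])
            continue
        body = response["body"]
        replies[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return replies, failed


# Hypothetical records shaped like the assumed output format.
sample = [
    json.dumps({"custom_id": "req-0", "error": None,
                "response": {"status_code": 200, "body": {
                    "choices": [{"message": {"content": "Summary 0"}}]}}}),
    json.dumps({"custom_id": "req-1", "response": None,
                "error": {"code": "example_error", "message": "hypothetical failure"}}),
]
replies, failed = parse_batch_output(sample)
print(replies)  # {'req-0': 'Summary 0'}
print(failed)   # ['req-1']
```

Keying on `custom_id` rather than line position is deliberate: output lines are not guaranteed to come back in the order they were submitted. To use it on the downloaded file, pass the open file object, for example `parse_batch_output(open("batch_output.jsonl"))`.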

OpenAI API Tier Rate Limits (2026)

If you consistently hit 429 errors despite backoff, your API usage tier may simply be too low. Here's how the tiers compare for gpt-4o:

Tier | RPM | TPM | RPD | Requirement
Free | 3 | 40,000 | 200 | New account, no billing
Tier 1 | 500 | 200,000 | 10,000 | $5 paid in
Tier 2 | 5,000 | 2,000,000 | - | $50 paid in, 7 days
Tier 3 | 5,000 | 4,000,000 | - | $100 paid in, 7 days
Tier 4 | 10,000 | 10,000,000 | - | $250 paid in, 14 days
Tier 5 | 10,000 | 30,000,000 | - | $1,000 paid in, 30 days

RPM = Requests Per Minute · TPM = Tokens Per Minute · RPD = Requests Per Day. Source: OpenAI rate limit documentation.

Complete Fix Checklist

1. Identify which limit you're hitting
Check the response headers x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens.

2. Add exponential backoff with tenacity or backoff
Use jitter to prevent thundering herd issues. Retry only on openai.RateLimitError, so that other errors still fail fast.

3. Bound async concurrency with asyncio.Semaphore
Set MAX_CONCURRENT = 10 as a starting point and tune upward.

4. Move bulk jobs to the Batch API
Any workload that doesn't need real-time responses is a Batch API candidate. It also costs 50% less.

5. Consider upgrading your API tier
If throughput requirements are structural, not occasional, Tier 3+ gives 10–100× higher limits.

Frequently Asked Questions

What causes the ChatGPT 429 Too Many Requests error in Python?
The 429 error fires when you exceed either OpenAI's concurrent request limit (too many simultaneous active connections at the same instant) or the rolling rate limit (total RPM or TPM quota assigned to your API tier). The former resets as soon as connections complete; the latter resets every 60 seconds.

What is exponential backoff and how does it fix the 429 error?
Exponential backoff automatically retries a failed request after an increasing wait (typically 1 s, 2 s, 4 s, 8 s) until it succeeds or reaches a maximum attempt count. Adding jitter (a small random offset) prevents multiple clients from retrying simultaneously, the so-called thundering herd problem. Libraries like tenacity and backoff implement this pattern as clean Python decorators.

Which Python library is best for handling OpenAI 429 errors?
For production systems, tenacity is recommended: it provides fine-grained control over stop conditions, wait strategies, retry predicates, and hooks for logging. For quick scripts, backoff is simpler to set up with a one-line decorator. Both libraries support async code and have built-in jitter.

When should I use the OpenAI Batch API instead?
Use the Batch API for any heavy workload that doesn't require a real-time response: dataset enrichment, embedding generation at scale, bulk content processing, or background classification jobs. Batched requests run within 24 hours, draw on a separate rate limit pool rather than your synchronous RPM and TPM quota, and are billed at 50% of the synchronous API price.

Does upgrading my OpenAI tier permanently fix the 429 error?
Upgrading raises your RPM and TPM ceilings significantly: Tier 5 offers 30M TPM versus the Free tier's 40K. However, exponential backoff should always remain in your code as a resilience layer, because traffic spikes, OpenAI service degradations, or future workload growth can still push you to any tier's ceiling.
