What is the ChatGPT 429 Too Many Requests Error?
When your Python application hits OpenAI's API too aggressively, the server responds with HTTP status code 429 — meaning you've exceeded an enforced quota. Understanding which limit you've hit is the first step to picking the right fix.
Rate Limit Architecture Diagram
Concurrent vs. Per-Minute Rate Limits at a Glance
| Property | Concurrent Request Limit | Per-Minute Rate Limit (RPM / TPM) |
|---|---|---|
| Trigger | Too many simultaneous active connections | Cumulative RPM or TPM quota exceeded |
| Reset Time | Immediately when connections complete | Every 60 seconds (rolling window) |
| Detected by | High concurrency / async calls | High request throughput over time |
| Best Fix | Semaphores / connection pooling | Exponential backoff + retry |
| API Header Hint | x-ratelimit-remaining-requests | x-ratelimit-remaining-tokens |
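
One way to act on the header hints in the last row is to read the remaining-quota headers from a response and see which one has reached zero. The sketch below assumes the headers are already available as a plain dict with lowercase keys (how you obtain them depends on your HTTP client or SDK); `which_limit_is_exhausted` is a hypothetical helper name, not part of any library.

```python
from typing import Mapping, Optional


def _remaining(headers: Mapping[str, str], name: str) -> Optional[int]:
    """Parse an integer header value; return None if missing or malformed."""
    value = headers.get(name)
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def which_limit_is_exhausted(headers: Mapping[str, str]) -> str:
    """Return 'requests', 'tokens', 'both', or 'none'.

    A missing header is treated as "not exhausted" because the
    response gives no evidence either way.
    """
    requests_left = _remaining(headers, "x-ratelimit-remaining-requests")
    tokens_left = _remaining(headers, "x-ratelimit-remaining-tokens")
    out_of_requests = requests_left is not None and requests_left <= 0
    out_of_tokens = tokens_left is not None and tokens_left <= 0
    if out_of_requests and out_of_tokens:
        return "both"
    if out_of_requests:
        return "requests"
    if out_of_tokens:
        return "tokens"
    return "none"


# Illustrative header values, not captured from a real response.
print(which_limit_is_exhausted({
    "x-ratelimit-remaining-requests": "0",
    "x-ratelimit-remaining-tokens": "1500",
}))  # requests
```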
Why Your Python App Triggers 429 Errors
No Retry Logic
Most beginner scripts call the API in a tight loop with zero delay. Once one call is rate-limited, every following call in that loop fails too, because nothing pauses long enough for the limit to reset.
Unbounded Async Concurrency
asyncio.gather(*tasks) fires all requests simultaneously — a burst that instantly saturates concurrent limits.
Large Prompt Tokens
Sending huge system prompts burns your TPM quota quickly even with few requests per minute.
Low API Tier
Free and Tier 1 accounts have strict limits. Upgrading to Tier 3–5 increases both RPM and TPM by orders of magnitude.
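The interaction between prompt size and the two quotas can be made concrete with a little arithmetic. This sketch assumes every request consumes roughly the same number of tokens; the limits passed in are illustrative inputs, and `effective_requests_per_minute` is a hypothetical helper name.

```python
def effective_requests_per_minute(
    rpm_limit: int, tpm_limit: int, tokens_per_request: int
) -> int:
    """Requests per minute you can actually sustain under both quotas."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # The token quota allows at most this many requests per minute.
    token_bound = tpm_limit // tokens_per_request
    # Whichever quota runs out first is the real ceiling.
    return min(rpm_limit, token_bound)


# Large prompts: the token quota is the bottleneck.
print(effective_requests_per_minute(500, 200_000, 4_000))  # 50
# Small prompts: the request quota is the bottleneck.
print(effective_requests_per_minute(500, 200_000, 200))    # 500
```

With 4,000-token requests, a nominal 500 RPM allowance shrinks to 50 usable requests per minute, which is why trimming the system prompt can matter as much as slowing the loop down.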
The Gold-Standard Fix: Exponential Backoff
Exponential backoff retries a failed request after an ever-increasing pause — 1 s, 2 s, 4 s, 8 s … — giving the rate limit time to reset while not hammering the server. Both tenacity and backoff implement this as clean Python decorators.
Figure 2 — Exponential Backoff Timeline
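The 1 s, 2 s, 4 s, 8 s schedule described above can be reproduced in a few lines. This is a minimal sketch of the delay calculation only, assuming a doubling factor and a 60-second cap; `backoff_delays` is a hypothetical helper, and the optional jitter draws each wait uniformly between zero and the un-jittered value.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: bool = False,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait (in seconds) before each retry."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # Double the wait each time, but never exceed the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # With jitter, pick a random point in [0, ceiling] so that many
        # clients retrying at once do not all wake at the same instant.
        delays.append(rng.uniform(0, ceiling) if jitter else ceiling)
    return delays


print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
print(backoff_delays(8))  # the last two waits are capped at 60.0
```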
Step 1 — Install the Library
```bash
# Install either (or both)
pip install tenacity           # recommended: most control
pip install backoff            # lightweight alternative
pip install --upgrade openai   # ensure latest SDK
```
Step 2a — Exponential Backoff with tenacity
```python
import openai
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
    retry_if_exception_type,
)

client = openai.OpenAI()


# ── Retry decorator ──────────────────────────────────────────────────────
@retry(
    wait=wait_random_exponential(min=1, max=60),  # jitter prevents thundering herd
    stop=stop_after_attempt(6),                   # 6 attempts in total (5 retries)
    retry=retry_if_exception_type(openai.RateLimitError),
    reraise=True,                                 # surface the original error, not RetryError
)
def chat_with_retry(messages: list, model: str = "gpt-4o") -> str:
    """Call OpenAI Chat Completion with automatic exponential backoff."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    return response.choices[0].message.content


# ── Usage ────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    reply = chat_with_retry([
        {"role": "user", "content": "Explain exponential backoff in one sentence."}
    ])
    print(reply)
```
Step 2b — Alternative: backoff Library
```python
import backoff
import openai

client = openai.OpenAI()


@backoff.on_exception(
    backoff.expo,                 # exponential strategy
    openai.RateLimitError,        # only catch 429 errors
    max_tries=8,
    jitter=backoff.full_jitter,   # add randomness to prevent stampede
)
def completions_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)


# ── Usage ────────────────────────────────────────────────────────────────
response = completions_with_backoff(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
tenacity vs backoff — Side-by-Side
| Feature | tenacity | backoff |
|---|---|---|
| API style | Decorator + context manager | Decorator only |
| Jitter support | ✓ Built-in | ✓ Built-in |
| Stop condition | attempts / time / custom | attempts / time |
| Async support | ✓ Native | ✓ Native |
| Logging hooks | before / after / retry hooks | on_backoff / on_giveup |
| Complexity | Medium | Simple |
| Best for | Production systems | Quick scripts |
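
To make clear what either decorator is doing on your behalf, here is a dependency-free sketch of the same retry loop. `RateLimited` is a hypothetical stand-in for `openai.RateLimitError`, `call_with_backoff` is an illustrative name, and the `sleep` parameter exists only so the example can run without real waiting. Neither library is implemented this way line for line.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for openai.RateLimitError."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            ceiling = min(cap, base * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))  # full jitter
    raise RuntimeError("unreachable")


# Simulated endpoint that is rate-limited twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("simulated 429")
    return "ok"


print(call_with_backoff(flaky, sleep=lambda seconds: None))  # ok
```

Any other exception type passes straight through, which mirrors the advice to retry only on rate-limit errors.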
Advanced: Async Request Pooling with Semaphores
For high-throughput pipelines using asyncio, launching thousands of concurrent requests will immediately saturate the concurrent limit. The solution: bound concurrency with an asyncio.Semaphore.
```python
import asyncio

import openai
from tenacity import (
    retry,
    wait_random_exponential,
    stop_after_attempt,
    retry_if_exception_type,
)

aclient = openai.AsyncOpenAI()

# ── Cap concurrent connections to avoid hitting concurrent limit ──────────
MAX_CONCURRENT = 10
# Module-level creation is fine on Python 3.10+; on older versions create
# the semaphore inside the running event loop instead.
semaphore = asyncio.Semaphore(MAX_CONCURRENT)


@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    retry=retry_if_exception_type(openai.RateLimitError),
)
async def bounded_chat(prompt: str) -> str:
    async with semaphore:  # waits while 10 requests are already in flight
        resp = await aclient.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
    return resp.choices[0].message.content


async def process_batch(prompts: list[str]) -> list[str]:
    tasks = [bounded_chat(p) for p in prompts]
    return await asyncio.gather(*tasks)


# ── Run ───────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    prompts = [f"Tell me fact number {i} about Python" for i in range(100)]
    results = asyncio.run(process_batch(prompts))
    print(len(results), "responses received")
```
This approach schedules all 100 prompts at once but keeps at most 10 requests in flight at any moment. That removes the burst that triggers concurrent-limit 429 errors while still keeping throughput high; the retry decorator covers any rate-limit errors that remain.
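The capping behaviour can be checked without touching the API. The sketch below is a simulation under stated assumptions: `asyncio.sleep` stands in for the network round trip, and `run_bounded` is a hypothetical helper that records the highest number of tasks that were ever inside the semaphore at the same time.

```python
import asyncio


async def run_bounded(n_tasks: int, limit: int) -> int:
    """Run n_tasks fake requests, at most `limit` at once; return the peak."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the API call
            in_flight -= 1

    # All tasks are created up front, as with asyncio.gather(*tasks).
    await asyncio.gather(*(fake_request() for _ in range(n_tasks)))
    return peak


if __name__ == "__main__":
    peak = asyncio.run(run_bounded(n_tasks=100, limit=10))
    print(f"peak in-flight requests: {peak}")
```

The recorded peak should never exceed `limit`, however many tasks are queued behind it.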
For Heavy Workloads: OpenAI Batch API
For non-real-time tasks — bulk embedding generation, mass content tagging, dataset enrichment — the OpenAI Batch API is the definitive solution. Requests are processed asynchronously within 24 hours and cost 50% less than the synchronous API.
50% cheaper: compared with synchronous API pricing
50,000 requests per batch: the per-file upload limit
24-hour window: the completion target; a batch that does not finish in time expires
```python
import json
import time

import openai

client = openai.OpenAI()

# ── 1. Create the JSONL batch file ────────────────────────────────────────
batch_requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Summarise item {i}"}],
        },
    }
    for i in range(1000)  # 1,000 requests; batch limits are separate from RPM/TPM
]

with open("batch_input.jsonl", "w") as f:
    for req in batch_requests:
        f.write(json.dumps(req) + "\n")

# ── 2. Upload the file ────────────────────────────────────────────────────
with open("batch_input.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

# ── 3. Create the batch job ───────────────────────────────────────────────
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(f"Batch submitted: {batch_job.id}")

# ── 4. Poll until the job reaches a terminal state ────────────────────────
while True:
    status = client.batches.retrieve(batch_job.id)
    print(f"Status: {status.status}")
    if status.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

# ── 5. Download results (only a completed job is downloaded here) ─────────
if status.status == "completed" and status.output_file_id:
    result_content = client.files.content(status.output_file_id).content
    with open("batch_output.jsonl", "wb") as f:
        f.write(result_content)
else:
    raise SystemExit(f"Batch ended with status {status.status!r}")
```
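Once `batch_output.jsonl` is downloaded, each line is a JSON object that can be matched back to its request through `custom_id`, which is safer than relying on line order. The sketch below assumes the output shape described in the Batch API documentation (a `response` object with `status_code` and a chat-completion `body`, plus an `error` field); `parse_batch_output` is a hypothetical helper, and the two sample lines are made up for illustration.

```python
import json
from typing import Dict, Iterable, Optional


def parse_batch_output(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map custom_id -> reply text, or None for requests that failed."""
    results: Dict[str, Optional[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            results[record["custom_id"]] = None
            continue
        body = response["body"]
        results[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return results


# Made-up sample lines in the assumed output shape.
sample = [
    json.dumps({
        "custom_id": "req-0",
        "response": {
            "status_code": 200,
            "body": {"choices": [{"message": {"role": "assistant",
                                              "content": "Summary of item 0"}}]},
        },
        "error": None,
    }),
    json.dumps({
        "custom_id": "req-1",
        "response": None,
        "error": {"code": "example_error", "message": "illustrative failure"},
    }),
]

print(parse_batch_output(sample))
# {'req-0': 'Summary of item 0', 'req-1': None}
```

In real use, pass the open file object for `batch_output.jsonl` instead of `sample`, since iterating a file yields one line at a time.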
OpenAI API Tier Rate Limits (2026)
If you consistently hit 429 errors despite backoff, your API usage tier may simply be too low. Here's how the tiers compare for gpt-4o:
| Tier | RPM | TPM | RPD | Requirement |
|---|---|---|---|---|
| Free | 3 | 40,000 | 200 | New account, no billing |
| Tier 1 | 500 | 200,000 | 10,000 | $5 paid in |
| Tier 2 | 5,000 | 2,000,000 | — | $50 paid in, 7 days |
| Tier 3 | 5,000 | 4,000,000 | — | $100 paid in, 7 days |
| Tier 4 | 10,000 | 10,000,000 | — | $250 paid in, 14 days |
| Tier 5 | 10,000 | 30,000,000 | — | $1,000 paid in, 30 days |
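RPM = Requests Per Minute · TPM = Tokens Per Minute · RPD = Requests Per Day. Source: OpenAI rate limit documentation. Limits vary by model and change over time, so confirm the current values on the limits page of your OpenAI account.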
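The table can be turned into a quick lookup for the lowest paid tier that covers a target throughput. The figures in this sketch are copied from the table above and should be re-checked against your account before you rely on them; `lowest_tier_for` is a hypothetical helper name.

```python
from typing import Optional

# (tier, RPM, TPM) copied from the table above; verify before use.
TIERS = [
    ("Tier 1", 500, 200_000),
    ("Tier 2", 5_000, 2_000_000),
    ("Tier 3", 5_000, 4_000_000),
    ("Tier 4", 10_000, 10_000_000),
    ("Tier 5", 10_000, 30_000_000),
]


def lowest_tier_for(rpm_needed: int, tpm_needed: int) -> Optional[str]:
    """Return the first tier whose RPM and TPM both cover the target."""
    for name, rpm, tpm in TIERS:
        if rpm >= rpm_needed and tpm >= tpm_needed:
            return name
    return None  # no listed tier is enough


print(lowest_tier_for(1_000, 1_000_000))   # Tier 2
print(lowest_tier_for(4_000, 3_000_000))   # Tier 3
print(lowest_tier_for(20_000, 1_000_000))  # None
```

A `None` result means the requirement exceeds every listed tier, at which point batching or reducing tokens per request is the remaining lever.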
Complete Fix Checklist
Identify which limit you're hitting
Check the response headers x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens to see which quota has run out.
Add exponential backoff with tenacity or backoff
Use jitter to prevent thundering-herd retries, and retry only on openai.RateLimitError.
Bound async concurrency with asyncio.Semaphore
Set MAX_CONCURRENT = 10 as a starting point and tune upward.
Move bulk jobs to the Batch API
Any workload that doesn't need real-time responses is a Batch API candidate, and it costs 50% less.
Consider upgrading your API tier
If throughput requirements are structural, not occasional, Tier 3+ gives 10–100× higher limits.