Retrying Failed Requests with the tenacity Library
Transient failures are a fact of life at scale, and this guide — part of Asynchronous Scraping with asyncio and httpx — shows how the tenacity library turns flaky requests into reliable ones with backoff, jitter, and sane caps.
tenacity wraps any function in a configurable retry policy so you do not hand-roll loops and sleep calls. For scraping, the durable recipe is exponential backoff with jitter, a cap on the maximum wait, a bounded number of attempts, and a predicate that retries on network exceptions and specific HTTP status codes such as 429, 500, 502, 503, and 504. That combination recovers from blips and rate limits without hammering a struggling server or looping forever.
Why Naive Retries Make Things Worse
The instinct to reach for a while True loop with a fixed time.sleep(1) fails in two directions. A fixed short delay retries too aggressively, adding load to a server that is already returning 503s, which is the opposite of what you want. A fixed long delay wastes time on failures that would clear in milliseconds. Neither adapts to the failure, and neither has a stopping condition, so a permanently broken endpoint becomes an infinite loop.
Exponential backoff fixes the timing: wait 1s, then 2s, then 4s, then 8s, growing the gap so a recovering server gets breathing room. But pure exponential backoff creates a second problem — the thundering herd. If a hundred workers all hit a 503 at the same instant, they all back off by the same amounts and retry in perfect synchronization, producing coordinated traffic spikes. Jitter — a random offset added to each wait — desynchronizes the workers so their retries spread out over time. Finally, a maximum wait cap stops the exponential curve from ballooning to minutes, and a stop condition (max attempts or max elapsed time) guarantees the operation eventually gives up. These same politeness concerns underpin How to Scrape a Static Website Without Getting Blocked.
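To make the timing concrete, here is a minimal standard-library sketch of a capped, jittered exponential schedule. The function name backoff_delays and its parameters are illustrative assumptions rather than part of tenacity; it only mirrors the general idea of doubling the wait, adding a random offset, and clamping to a cap.

```python
import random
from typing import List, Optional


def backoff_delays(
    attempts: int,
    initial: float = 1.0,
    cap: float = 30.0,
    jitter: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait before each retry: min(initial * 2**n + offset, cap)."""
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        exponential = initial * (2 ** n)   # 1, 2, 4, 8, ... times `initial`
        offset = rng.uniform(0, jitter)    # random offset desynchronizes workers
        delays.append(min(exponential + offset, cap))  # cap stops unbounded growth
    return delays


if __name__ == "__main__":
    # Three simulated workers: same growth curve, different exact waits.
    for worker in range(3):
        print(worker, [round(d, 2) for d in backoff_delays(6)])
```

Each simulated worker follows the same growth curve, but the random offset means no two of them are likely to wake at exactly the same moment until the cap is reached.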
Building a Retry Policy for HTTP Scraping
tenacity composes a policy from small, readable pieces: stop decides when to give up, wait decides how long to pause, and retry decides which outcomes are retryable. The example below retries on connection errors and on a set of retryable status codes, using exponential backoff bounded by a cap. It raises a custom exception on bad statuses so tenacity treats them as retryable failures rather than successful returns.
import logging

import httpx
from tenacity import (
    before_sleep_log,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scraper")

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0 Safari/537.36"}
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class RetryableStatusError(Exception):
    """Raised for HTTP statuses that are worth retrying."""


@retry(
    stop=stop_after_attempt(5),  # give up after five tries
    wait=wait_exponential_jitter(initial=1, max=30),  # backoff plus jitter, capped at 30s
    retry=retry_if_exception_type((httpx.TransportError, RetryableStatusError)),
    before_sleep=before_sleep_log(logger, logging.WARNING),  # log each retry before waiting
    reraise=True,  # surface the last real exception instead of RetryError
)
def fetch(client: httpx.Client, url: str) -> httpx.Response:
    resp = client.get(url, headers=HEADERS, timeout=15.0)
    if resp.status_code in RETRYABLE_STATUS:
        raise RetryableStatusError(f"{resp.status_code} for {url}")
    # Any other 4xx/5xx raises httpx.HTTPStatusError, which is not retried.
    resp.raise_for_status()
    return resp


if __name__ == "__main__":
    with httpx.Client() as client:
        try:
            # This endpoint always returns 503, so every attempt fails.
            page = fetch(client, "https://httpbin.org/status/503")
        except RetryableStatusError as exc:
            print("gave up:", exc)
        else:
            print(page.status_code, len(page.text))
wait_exponential_jitter bakes in both the exponential growth and the random jitter, with max acting as the cap. stop_after_attempt(5) guarantees the call gives up after five tries, and reraise=True re-raises the last underlying exception (an httpx.TransportError or the custom RetryableStatusError) instead of tenacity's RetryError wrapper, so your calling code can catch the real failure.
Retrying Async Requests and Honoring Retry-After
tenacity supports coroutines transparently: decorate an async def and it awaits the retries for you, which fits the async patterns from the complete guide to Python web scraping. A polite scraper should also respect a server's Retry-After header on 429 responses rather than guessing. You can read that header and feed it into a wait strategy, or handle it explicitly before deferring to backoff.
import asyncio

import httpx
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

HEADERS = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15"}


class RateLimited(Exception):
    def __init__(self, retry_after: float) -> None:
        self.retry_after = retry_after
        super().__init__(f"rate limited, retry after {retry_after}s")


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=2, max=60),
    retry=retry_if_exception_type((httpx.TransportError, RateLimited)),
    reraise=True,
)
async def fetch(client: httpx.AsyncClient, url: str) -> httpx.Response:
    resp = await client.get(url, headers=HEADERS, timeout=15.0)
    if resp.status_code == 429:
        try:
            wait_for = float(resp.headers.get("Retry-After", "5"))
        except ValueError:
            # Retry-After may also be an HTTP-date; fall back to a default here.
            wait_for = 5.0
        # Honor the server's hint first; tenacity then adds its own backoff on top.
        await asyncio.sleep(wait_for)
        raise RateLimited(wait_for)
    resp.raise_for_status()
    return resp


async def main() -> None:
    # http2=True needs the optional h2 package (pip install "httpx[http2]").
    async with httpx.AsyncClient(http2=True) as client:
        resp = await fetch(client, "https://httpbin.org/get")
        print("ok", resp.status_code)


if __name__ == "__main__":
    asyncio.run(main())
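Retry-After can carry either a number of seconds or an HTTP-date, so a bare numeric conversion does not cover every server. The helper below is a hedged, standard-library sketch of one way to normalize both forms into seconds. The name parse_retry_after, the 5-second default, and the 120-second ceiling are all assumptions chosen for illustration, not values from httpx or tenacity.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str],
    default: float = 5.0,
    ceiling: float = 120.0,
    now: Optional[datetime] = None,
) -> float:
    """Convert a Retry-After header value into a bounded number of seconds."""
    if not value:
        return default                            # header missing or empty
    value = value.strip()
    try:
        seconds = float(value)                    # delay-seconds form, e.g. "120"
    except ValueError:
        try:
            when = parsedate_to_datetime(value)   # HTTP-date form
        except (TypeError, ValueError):
            return default                        # unparseable: fall back
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    # Never wait a negative time, and never trust an absurdly large hint.
    return max(0.0, min(seconds, ceiling))
```

The ceiling is a deliberate design choice: a misconfigured server that asks for an hour-long pause should not be able to stall a worker indefinitely.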
Pairing tenacity with a rotating pool of exit IPs makes the whole system far more resilient; see Rotating Proxies and Managing IP Blocks for that layer, and lean on tenacity to smooth over the inevitable transient errors a distributed crawl produces.
Edge Cases and Caveats
- Do not retry non-idempotent writes blindly. Retrying a POST that already succeeded but timed out on the response can create duplicates. Guard writes with idempotency keys before wrapping them in retries.
- Never retry a 4xx you caused. A 400, 401, 403, or 404 will not fix itself. Only retry transient statuses (429 and 500, 502, 503, 504) and network errors; retrying a hard client error just wastes attempts.
- Cap total time, not just attempts. Combine stop_after_attempt with stop_after_delay so a slow endpoint cannot make one request block for minutes even within the attempt budget.
- Jitter is not optional at scale. Without jitter, many workers synchronize their retries and recreate the spike that caused the failure. Always use a jittered wait strategy in a concurrent crawl.
- Log before sleeping. Use before_sleep_log so you can see how often retries fire; a sudden rise usually signals an IP block or rate limit, not random noise.
- reraise=True matters for debugging. Without it, failures surface as RetryError, hiding the underlying httpx exception and making root-cause analysis harder.
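The first two caveats can be folded into a single decision function. What follows is a deliberately conservative sketch, not a tenacity feature: the name should_retry, the has_idempotency_key flag, and the set of methods treated as idempotent are assumptions for illustration.

```python
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})


def should_retry(method: str, status: int, has_idempotency_key: bool = False) -> bool:
    """Decide whether a response is worth another attempt."""
    if status not in RETRYABLE_STATUS:
        return False               # 400, 401, 403, 404 and similar will not fix themselves
    if method.upper() in IDEMPOTENT_METHODS:
        return True                # repeating the request cannot create duplicates
    return has_idempotency_key     # POST/PATCH only when the server can deduplicate


if __name__ == "__main__":
    for method, status in [("GET", 503), ("GET", 404), ("POST", 503)]:
        print(method, status, should_retry(method, status))
```

A predicate like this could sit in front of whatever raises the retryable exception, so that only requests passing the check ever enter the retry loop.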
Frequently Asked Questions
What is the difference between exponential backoff and jitter?
Exponential backoff grows the wait between attempts (1s, 2s, 4s, 8s) so a struggling server gets more room each time. Jitter adds a random offset to each of those waits so many concurrent workers do not retry in lockstep. You want both: backoff for adaptivity, jitter to avoid synchronized traffic spikes.
Which HTTP status codes should I retry?
Retry transient ones, namely 429 (rate limited) and 500, 502, 503, 504 (server-side failures), plus network-level exceptions like timeouts and connection resets. Do not retry 400, 401, 403, or 404, because those reflect a problem with the request itself that a retry cannot fix.
Does tenacity work with async functions?
Yes. Decorate an async def with @retry and tenacity awaits the retries for you, sleeping asynchronously between attempts so the event loop stays free. The policy syntax is identical to the synchronous case.
How do I stop retries from looping forever?
Always supply a stop condition. Use stop_after_attempt(n) to cap the number of tries, or combine it with stop_after_delay(seconds) to also cap total elapsed time. Without a stop condition a permanently broken endpoint will retry indefinitely.