
Retrying Failed Requests with the tenacity Library

Transient failures are a fact of life at scale, and this guide — part of Asynchronous Scraping with asyncio and httpx — shows how the tenacity library turns flaky requests into reliable ones with backoff, jitter, and sane caps.

[Figure: Exponential backoff with jitter versus fixed retries. Fixed-interval retries fire at equal short gaps; exponential backoff with jitter waits 1s, ~2s, ~4s, then ~8s at the max wait cap.]
Exponential backoff with jitter widens the gap between retries and caps it; fixed-interval retries stay flat and hammer a struggling server.

tenacity wraps any function in a configurable retry policy so you do not hand-roll loops and sleep calls. For scraping, the durable recipe is exponential backoff with jitter, a cap on the maximum wait, a bounded number of attempts, and a predicate that retries on network exceptions and specific HTTP status codes such as 429, 500, 502, 503, and 504. That combination recovers from blips and rate limits without hammering a struggling server or looping forever.

Why Naive Retries Make Things Worse

The instinct to write a while True loop around a fixed time.sleep(1) fails in two directions. A short fixed delay retries too aggressively, adding load to a server that is already returning 503s, which is the opposite of what you want. A long fixed delay wastes time on failures that would clear in milliseconds. Neither adapts to the failure, and neither has a stopping condition, so a permanently broken endpoint becomes an infinite loop.

Exponential backoff fixes the timing: wait 1s, then 2s, then 4s, then 8s, growing the gap so a recovering server gets breathing room. But pure exponential backoff creates a second problem — the thundering herd. If a hundred workers all hit a 503 at the same instant, they all back off by the same amounts and retry in perfect synchronization, producing coordinated traffic spikes. Jitter — a random offset added to each wait — desynchronizes the workers so their retries spread out over time. Finally, a maximum wait cap stops the exponential curve from ballooning to minutes, and a stop condition (max attempts or max elapsed time) guarantees the operation eventually gives up. These same politeness concerns underpin How to Scrape a Static Website Without Getting Blocked.
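To make the arithmetic concrete, here is a small pure-Python sketch (no tenacity involved) that computes capped exponential waits with optional jitter, following the formula tenacity documents for wait_exponential_jitter: min(initial * 2**n + uniform(0, jitter), cap). The helpers backoff_schedule and distinct_first_retries are hypothetical, and the second is only a toy model of the thundering herd: it counts how many distinct instants 100 workers that all failed at t=0 would pick for their first retry.

```python
import random
from typing import List, Optional


def backoff_schedule(
    retries: int,
    initial: float = 1.0,
    cap: float = 30.0,
    jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Wait before retry n: min(initial * 2**n + uniform(0, jitter), cap)."""
    rng = rng if rng is not None else random.Random()
    return [min(initial * 2**n + rng.uniform(0, jitter), cap) for n in range(retries)]


def distinct_first_retries(workers: int, jitter: float, seed: int = 0) -> int:
    """Toy herd model: every worker fails at t=0; count distinct first-retry times (ms)."""
    rng = random.Random(seed)
    times = {
        round(backoff_schedule(1, jitter=jitter, rng=rng)[0], 3) for _ in range(workers)
    }
    return len(times)


if __name__ == "__main__":
    print(backoff_schedule(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]: 32s is capped to 30s
    print(distinct_first_retries(100, jitter=0.0))  # 1: all workers retry in lockstep
    print(distinct_first_retries(100, jitter=1.0))  # many distinct times across one second
```

Without jitter every worker lands on the same retry instant. With up to one second of jitter, the first retries spread across the one-second window instead of arriving as a single spike.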

Building a Retry Policy for HTTP Scraping

tenacity composes a policy from small, readable pieces: stop decides when to give up, wait decides how long to pause, and retry decides which outcomes are retryable. The example below retries on network (transport) errors and on a set of retryable status codes, using exponential backoff with jitter bounded by a cap. It raises a custom exception for those retryable statuses so tenacity treats them as failures to retry rather than successful returns. Any other error status goes through raise_for_status() and fails immediately.

```python
import logging

import httpx
from tenacity import (
    before_sleep_log,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scraper")

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0 Safari/537.36"}
# Transient statuses worth another attempt; other 4xx/5xx statuses fail immediately.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class RetryableStatusError(Exception):
    """Raised for HTTP statuses that are worth retrying."""


@retry(
    stop=stop_after_attempt(5),  # give up after five tries in total
    # Waits of 1s, 2s, 4s, ... plus random jitter, never longer than 30s.
    wait=wait_exponential_jitter(initial=1, max=30),
    retry=retry_if_exception_type((httpx.TransportError, RetryableStatusError)),
    before_sleep=before_sleep_log(logger, logging.WARNING),  # log every retry
    reraise=True,  # re-raise the last real exception instead of RetryError
)
def fetch(client: httpx.Client, url: str) -> httpx.Response:
    resp = client.get(url, headers=HEADERS, timeout=15.0)
    if resp.status_code in RETRYABLE_STATUS:
        # Turn a transient status into an exception so tenacity retries it.
        raise RetryableStatusError(f"{resp.status_code} for {url}")
    resp.raise_for_status()  # 400, 401, 403, 404, ... raise httpx.HTTPStatusError, no retry
    return resp


if __name__ == "__main__":
    with httpx.Client() as client:
        try:
            # httpbin's /status/503 always fails, so this exhausts all five attempts.
            page = fetch(client, "https://httpbin.org/status/503")
            print(page.status_code, len(page.text))
        except RetryableStatusError as exc:
            logger.error("giving up: %s", exc)
```

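wait_exponential_jitter bakes in both the exponential growth and the random jitter: successive waits are initial, 2 × initial, 4 × initial, and so on, each plus a random offset of up to one second (the default jitter), with max capping the result. stop_after_attempt(5) guarantees the call gives up after five tries, and reraise=True re-raises the last underlying exception (an httpx.TransportError or a RetryableStatusError) instead of wrapping it in tenacity's RetryError, so your calling code handles the real error.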

Retrying Async Requests and Honoring Retry-After

tenacity supports coroutines transparently: decorate an async def and it awaits the retries for you, which fits the async patterns from the complete guide to Python web scraping. A polite scraper should also respect a server's Retry-After header on 429 responses rather than guessing. The header holds either a number of seconds or an HTTP date; you can parse it and feed it into a wait strategy, or handle it explicitly before deferring to backoff.

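```python
import asyncio
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

import httpx
from tenacity import (
    RetryCallState,
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

HEADERS = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15"}
MAX_WAIT = 60.0


class RateLimited(Exception):
    def __init__(self, retry_after: float) -> None:
        self.retry_after = retry_after
        super().__init__(f"rate limited, retry after {retry_after}s")


def parse_retry_after(value: Optional[str], default: float = 5.0) -> float:
    """Retry-After is either delay-seconds ("120") or an HTTP date."""
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:  # treat a date without a zone as UTC
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


_backoff = wait_exponential_jitter(initial=2, max=MAX_WAIT)


def wait_retry_after_or_backoff(retry_state: RetryCallState) -> float:
    # On 429, wait as long as the server asked (capped); otherwise use jittered backoff.
    exc = retry_state.outcome.exception()
    if isinstance(exc, RateLimited):
        return min(exc.retry_after, MAX_WAIT)
    return _backoff(retry_state)


@retry(
    stop=stop_after_attempt(4),
    wait=wait_retry_after_or_backoff,  # tenacity awaits an async sleep between attempts
    retry=retry_if_exception_type((httpx.TransportError, RateLimited)),
    reraise=True,
)
async def fetch(client: httpx.AsyncClient, url: str) -> httpx.Response:
    resp = await client.get(url, headers=HEADERS, timeout=15.0)
    if resp.status_code == 429:
        # No sleep here: the wait strategy handles it, so the delay is not doubled.
        raise RateLimited(parse_retry_after(resp.headers.get("Retry-After")))
    resp.raise_for_status()
    return resp


async def main() -> None:
    # http2=True needs the optional h2 package: pip install "httpx[http2]"
    async with httpx.AsyncClient(http2=True) as client:
        resp = await fetch(client, "https://httpbin.org/get")
        print("ok", resp.status_code)


if __name__ == "__main__":
    asyncio.run(main())
```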

Pairing tenacity with a rotating pool of exit IPs makes the whole system far more resilient; see Rotating Proxies and Managing IP Blocks for that layer, and lean on tenacity to smooth over the inevitable transient errors a distributed crawl produces.

Edge Cases and Caveats

  • Do not retry non-idempotent writes blindly. Retrying a POST that already succeeded but timed out on the response can create duplicates. Guard writes with idempotency keys before wrapping them in retries.
  • Never retry a 4xx you caused. A 400, 401, 403, or 404 will not fix itself. Only retry transient statuses (429, 5xx) and network errors; retrying a hard client error just wastes attempts.
  • Cap total time, not just attempts. Combine stop_after_attempt with stop_after_delay so a slow endpoint cannot make one call spend minutes inside its attempt budget. tenacity checks stop conditions only between attempts, so keep a per-request timeout as well.
  • Jitter is not optional at scale. Without jitter, many workers synchronize their retries and recreate the spike that caused the failure. Always use a jittered wait strategy in a concurrent crawl.
  • Log before sleeping. Use before_sleep_log so you can see how often retries fire; a sudden rise usually signals an IP block or rate limit, not random noise.
  • reraise=True matters for debugging. Without it, failures surface as RetryError, hiding the underlying httpx exception and making root-cause analysis harder.

Frequently Asked Questions

What is the difference between exponential backoff and jitter? Exponential backoff grows the wait between attempts (1s, 2s, 4s, 8s) so a struggling server gets more room each time. Jitter adds a random offset to each of those waits so many concurrent workers do not retry in lockstep. You want both: backoff for adaptivity, jitter to avoid synchronized traffic spikes.

Which HTTP status codes should I retry? Retry transient ones — 429 (rate limited) and 500, 502, 503, 504 (server-side failures) — plus network-level exceptions like timeouts and connection resets. Do not retry 400, 401, 403, or 404, because those reflect a problem with the request itself that a retry cannot fix.

Does tenacity work with async functions? Yes. Decorate an async def with @retry and tenacity awaits the retries for you, sleeping asynchronously between attempts so the event loop stays free. The policy syntax is identical to the synchronous case.

How do I stop retries from looping forever? Always supply a stop condition. Use stop_after_attempt(n) to cap the number of tries, or combine it with stop_after_delay(seconds) to also cap total elapsed time. Without a stop condition a permanently broken endpoint will retry indefinitely.