httpx vs aiohttp: Async Performance for Python Scrapers
Picking the right async HTTP client shapes the ceiling of your crawler's throughput, and this guide — part of Asynchronous Scraping with asyncio and httpx — pits httpx against aiohttp on the metrics that actually move the needle for scraping.
For pure requests-per-second throughput on HTTP/1.1, aiohttp is usually a little faster because it is an asyncio-only library with a leaner client code path. httpx trades some raw speed for a friendlier API, a shared sync/async surface, and native HTTP/2 support. For most scrapers the difference is dwarfed by network latency and politeness delays, so ergonomics and connection-pool tuning matter more than the microbenchmark winner.
Where the Performance Gap Actually Comes From
aiohttp was built from the start as an asyncio-only library, so every layer — from its ClientSession down to its connector — assumes a running event loop. That focus keeps per-request overhead low: fewer abstraction layers sit between your coroutine and the socket. In tight loops firing thousands of small GET requests against a fast local server, aiohttp typically edges out httpx by 10–25% on requests per second.
httpx is designed as a modern, batteries-included client that exposes the same API in both synchronous and asynchronous form. That symmetry is a real productivity win — you can prototype with httpx.Client, then switch to httpx.AsyncClient without relearning the interface — but the shared transport layer (built on httpcore) adds a thin overhead. httpx also parses and validates more of the response eagerly, which costs a little time but catches malformed responses earlier.
The practical takeaway: the gap is real but small, and it shrinks to noise the moment you add the per-domain delays that responsible scraping requires. If you are crawling remote sites over the public internet, round-trip latency of 50–300 ms per request swamps any client-side microsecond savings. Reserve the raw-throughput argument for cases where you hammer a single fast endpoint, such as a private API you have permission to query hard.
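To make the latency argument concrete, here is a back-of-the-envelope sketch. The two per-request overhead figures are hypothetical values picked only to model a client that is 25% faster on a zero-latency target; they are not measurements of either library.

```python
def throughput_ceiling(concurrency: int, latency_s: float, overhead_s: float) -> float:
    """Upper bound on requests/second when each slot handles one request at a time."""
    return concurrency / (latency_s + overhead_s)


# Hypothetical client-side cost per request (assumed, not measured).
FAST_CLIENT_OVERHEAD_S = 0.000200
SLOW_CLIENT_OVERHEAD_S = 0.000250


def relative_gap(latency_s: float) -> float:
    """Fraction of throughput the slower client gives up at a given network latency."""
    fast = throughput_ceiling(1, latency_s, FAST_CLIENT_OVERHEAD_S)
    slow = throughput_ceiling(1, latency_s, SLOW_CLIENT_OVERHEAD_S)
    return (fast - slow) / fast


# Compare a near-local target with two public-internet round-trip times.
for latency in (0.0005, 0.05, 0.3):
    print(f"latency {latency * 1000:6.1f} ms -> fast client ahead by {relative_gap(latency):.2%}")
```

The gap depends only on the ratio of client overhead to total time per request, so it collapses as soon as network latency dominates, whatever concurrency you run at.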
Connection Pooling and Concurrency Limits
Both clients reuse TCP connections through a keep-alive pool, and both let you cap concurrency, the single most important knob for not overwhelming a target or your own machine. In aiohttp you configure the pool through a TCPConnector; in httpx you pass httpx.Limits. Getting these limits right matters far more than which library you choose. An uncapped async scraper will happily open thousands of sockets and get you blocked, which is exactly the failure mode covered in Rotating Proxies and Managing IP Blocks.
The idiomatic pattern in both libraries is to combine a bounded connection pool with an asyncio.Semaphore that limits how many coroutines are in flight at once. The pool prevents socket exhaustion; the semaphore gives you a clean, per-run concurrency ceiling that you can tune to a polite value.
import asyncio

import aiohttp

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0 Safari/537.36"}


async def fetch(session: aiohttp.ClientSession, sem: asyncio.Semaphore, url: str) -> tuple[str, int]:
    # The semaphore caps how many requests are in flight at once.
    async with sem:
        async with session.get(url, headers=HEADERS) as resp:
            # Read the full body so the connection can return to the pool.
            await resp.text()
            return url, resp.status


async def crawl(urls: list[str], concurrency: int = 10) -> list[tuple[str, int]]:
    sem = asyncio.Semaphore(concurrency)
    # Pool size matches the semaphore; DNS answers are cached for 300 seconds.
    connector = aiohttp.TCPConnector(limit=concurrency, ttl_dns_cache=300)
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
        tasks = [fetch(session, sem, u) for u in urls]
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    sample = ["https://httpbin.org/get"] * 20
    results = asyncio.run(crawl(sample))
    print(f"fetched {len(results)} pages; first status {results[0][1]}")
The httpx equivalent reads almost identically, which is the point — the mental model transfers cleanly:
import asyncio

import httpx

HEADERS = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15"}


async def fetch(client: httpx.AsyncClient, sem: asyncio.Semaphore, url: str) -> tuple[str, int]:
    # The semaphore caps how many requests are in flight at once.
    async with sem:
        resp = await client.get(url, headers=HEADERS)
        return url, resp.status_code


async def crawl(urls: list[str], concurrency: int = 10) -> list[tuple[str, int]]:
    sem = asyncio.Semaphore(concurrency)
    # Pool size matches the semaphore so neither limit starves the other.
    limits = httpx.Limits(max_connections=concurrency, max_keepalive_connections=concurrency)
    timeout = httpx.Timeout(30.0)
    # http2=True needs the h2 package: pip install "httpx[http2]"
    async with httpx.AsyncClient(limits=limits, timeout=timeout, http2=True) as client:
        tasks = [fetch(client, sem, u) for u in urls]
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    sample = ["https://httpbin.org/get"] * 20
    results = asyncio.run(crawl(sample))
    print(f"fetched {len(results)} pages; first status {results[0][1]}")
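Neither example spaces out requests to the same host, which is the per-domain delay that responsible scraping calls for. A minimal sketch of one way to add it with only the standard library follows; the class name PerHostDelay and its min_delay parameter are hypothetical, introduced here for illustration.

```python
import asyncio
import time
from urllib.parse import urlsplit


class PerHostDelay:
    """Enforce a minimum gap between request starts to the same host (illustrative helper)."""

    def __init__(self, min_delay: float = 1.0) -> None:
        self.min_delay = min_delay
        self._locks: dict[str, asyncio.Lock] = {}
        self._next_allowed: dict[str, float] = {}

    async def wait(self, url: str) -> None:
        host = urlsplit(url).netloc
        # One lock per host so waiting on one site never delays another.
        lock = self._locks.setdefault(host, asyncio.Lock())
        async with lock:
            now = time.monotonic()
            ready_at = self._next_allowed.get(host, now)
            if ready_at > now:
                await asyncio.sleep(ready_at - now)
            # Reserve the next slot relative to when this request was allowed to start.
            self._next_allowed[host] = max(now, ready_at) + self.min_delay
```

In either fetch function you would call await delay.wait(url) just before issuing the request. Waiting before acquiring the semaphore keeps sleeping coroutines from occupying concurrency slots, at the cost of the semaphore no longer bounding how many coroutines are parked.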
HTTP/2, Ergonomics, and the Deciding Factors
The clearest functional difference is HTTP/2. httpx supports it natively: install the http2 extra with pip install "httpx[http2]" (it pulls in the h2 package) and pass http2=True, as shown above. Over HTTP/2, a single connection multiplexes many concurrent requests, which reduces connection churn and can improve throughput against modern servers and CDNs. aiohttp remains HTTP/1.1 only in its stable line, so if HTTP/2 multiplexing matters for your targets, httpx wins by default. HTTP/2 also changes your TLS and header fingerprint, which interacts with the techniques in TLS and JA3 Fingerprint Evasion.
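Because the protocol is negotiated per connection, it is worth checking what your targets actually serve before counting on multiplexing. The sketch below tallies the http_version attribute that httpx exposes on each response; the function names are hypothetical, and httpx is imported lazily only so the counting helper can be used on its own.

```python
import asyncio
from collections import Counter


def tally_http_versions(responses) -> Counter:
    """Count responses by negotiated protocol string, e.g. 'HTTP/2' or 'HTTP/1.1'."""
    return Counter(r.http_version for r in responses)


async def probe_http_versions(urls: list[str]) -> Counter:
    # Lazy import: requires pip install "httpx[http2]" when this coroutine runs.
    import httpx

    async with httpx.AsyncClient(http2=True, timeout=10.0) as client:
        responses = await asyncio.gather(*(client.get(u) for u in urls))
    return tally_http_versions(responses)
```

Running asyncio.run(probe_http_versions(urls)) against a sample of your own target URLs gives a quick picture of how many of them negotiate HTTP/2 at all.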
On ergonomics, httpx offers the unified sync/async API, first-class Response.json(), straightforward timeout objects, and a requests-compatible feel that lowers the learning curve for anyone coming from the complete guide to Python web scraping. aiohttp bundles extras a scraper often wants anyway — a capable client, a full server framework, and WebSocket support — which is handy if you also need to build a control-plane service around your crawler. Whichever client you choose, pair it with a robust retry policy; the mechanics are covered in Retrying Failed Requests with Tenacity.
Edge Cases and Caveats
- Benchmarks lie without your workload. Microbenchmarks against localhost exaggerate the client gap. Measure against representative targets with realistic latency before deciding.
- HTTP/2 needs the extra. With http2=True, httpx raises an ImportError when the client is created if the h2 package is missing, so install it with pip install "httpx[http2]". Even with it installed, a connection silently falls back to HTTP/1.1 when the server does not negotiate HTTP/2, so verify with response.http_version.
- aiohttp decodes on your terms. Calling resp.text() with no arguments lets aiohttp choose the encoding, starting from the Content-Type header and using a fallback that varies by version. On mislabeled pages you may need to pass an explicit encoding, for example resp.text(encoding="utf-8"), to avoid the pitfalls described in Fixing Common Unicode Errors in Python Scraping.
- Reuse one client per run. Creating a fresh AsyncClient or ClientSession per request destroys the connection pool and tanks throughput. Instantiate once and share it.
- Semaphore and pool are not the same limit. A pool caps open sockets; a semaphore caps in-flight coroutines. Set both, and keep them polite rather than maximal.
- DNS caching differs. aiohttp caches DNS via ttl_dns_cache; httpx relies on the OS resolver. Under heavy fan-out, DNS can become the bottleneck before the client does.
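For the decoding caveat above, one option is to read the raw bytes and decode them yourself. This is a minimal sketch using only the standard library; decode_body is a hypothetical helper, and the UTF-8 fallback is an assumption that suits most modern sites but not all.

```python
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str], fallback: str = "utf-8") -> str:
    """Decode with the declared charset, falling back when it is missing or wrong."""
    for encoding in (declared, fallback):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes that do not fit it: try the next candidate.
            continue
    # Last resort: keep the page and mark undecodable bytes with U+FFFD.
    return raw.decode(fallback, errors="replace")
```

With aiohttp this would be fed from raw = await resp.read() and the declared charset from resp.charset. Note the limit of the approach: a wrong label that still decodes without error, such as UTF-8 bytes declared as latin-1, produces mojibake rather than an exception, so it will not be caught here.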
Frequently Asked Questions
Is aiohttp always faster than httpx?
Not in a way that matters for most scraping. aiohttp tends to win raw requests-per-second benchmarks against fast endpoints by roughly 10–25%, but once real network latency and polite per-domain delays enter the picture, the difference usually disappears into noise.
Should I pick httpx just for HTTP/2?
If your targets serve HTTP/2 and you want connection multiplexing or a more browser-like network fingerprint, yes. httpx is the straightforward choice because aiohttp's stable releases are HTTP/1.1 only. Remember to install the http2 extra (pip install "httpx[http2]") and confirm the negotiated protocol with response.http_version.
Can I share one client across many coroutines?
Yes, and you should. Both httpx.AsyncClient and aiohttp.ClientSession are designed to be created once and reused across all requests in a run so the connection pool and keep-alive can do their job. Creating a new client per request is the most common performance mistake.
Which is easier to migrate to from requests?
httpx is the gentler migration because its API mirrors requests closely and offers both sync and async clients with the same surface. You can port a synchronous scraper first, then swap Client for AsyncClient once the logic is proven.
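As a rough illustration of how small that swap is, the two functions below differ only in async and await. Both function names are hypothetical, and the client is passed in rather than constructed, so the same logic works with any object that behaves like httpx.Client or httpx.AsyncClient.

```python
def fetch_status_sync(client, url: str) -> int:
    """Synchronous version: client is expected to behave like httpx.Client."""
    resp = client.get(url)
    resp.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return resp.status_code


async def fetch_status_async(client, url: str) -> int:
    """Async port: client is expected to behave like httpx.AsyncClient."""
    resp = await client.get(url)
    resp.raise_for_status()
    return resp.status_code
```

The calling code changes in the same mechanical way: with httpx.Client() as client becomes async with httpx.AsyncClient() as client, and the call gains an await.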