Flattening Nested JSON with pandas json_normalize
This walkthrough tackles one recurring problem from Parsing JSON and XML API Responses: collapsing a nested API payload into flat rows you can write to a CSV or database.
pandas.json_normalize is the fastest way to turn nested JSON into a table. Point it at a list of records and it walks each object, dotting nested keys into column names like user.address.city. Use record_path to explode an inner list into one row per element, meta to carry parent fields down onto those rows, and sep to choose the delimiter. Get those three arguments right and almost any API response becomes a clean DataFrame in one call.
The Simple Case: Flat Dotted Columns
For a list of dicts with nested objects, json_normalize flattens every level by joining keys with a dot.
import pandas as pd
records = [
{"id": 7, "user": {"name": "Ada", "address": {"city": "Bath", "zip": "BA1"}}},
{"id": 8, "user": {"name": "Bo", "address": {"city": "York", "zip": "YO1"}}},
]
df = pd.json_normalize(records)
print(df.columns.tolist())
# ['id', 'user.name', 'user.address.city', 'user.address.zip']
Each nested object becomes dotted columns; there is no manual key-chasing. If you prefer a different delimiter — say, to keep dotted keys from clashing with pandas accessor syntax — pass sep="_".
Exploding Nested Lists with record_path
The harder case is a record that contains a list of sub-items — orders with line items, a product with variants. You usually want one row per sub-item, with the parent fields repeated. That is what record_path (the list to explode) and meta (parent fields to carry down) are for.
import pandas as pd
payload = {
"order_id": "A-100",
"customer": "Ada",
"items": [
{"sku": "AER-001", "qty": 1, "price": 1395.0},
{"sku": "MAT-009", "qty": 2, "price": 49.5},
],
}
df = pd.json_normalize(
payload,
record_path="items", # explode this list -> one row each
meta=["order_id", "customer"], # repeat these parent fields per row
)
print(df)
# sku qty price order_id customer
# 0 AER-001 1 1395.0 A-100 Ada
# 1 MAT-009 2 49.5 A-100 Ada
For deeply nested parent fields, meta accepts lists of keys, e.g. meta=[["shipping", "country"]] reaches shipping.country.
End-to-End: API Response to CSV
Putting it together with a real request — fetch JSON with a proper User-Agent, normalize the nested results, and write a flat CSV ready for a spreadsheet or database load.
import pandas as pd
import requests
HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
"Accept": "application/json",
}
def orders_to_csv(url: str, path: str) -> pd.DataFrame:
resp = requests.get(url, headers=HEADERS, timeout=15)
resp.raise_for_status()
payload = resp.json()
df = pd.json_normalize(
payload["orders"],
record_path="items",
meta=["order_id", "customer", ["shipping", "country"]],
sep=".",
errors="ignore", # tolerate records missing a meta key
)
df.to_csv(path, index=False, encoding="utf-8")
return df
if __name__ == "__main__":
frame = orders_to_csv("https://api.example.com/v1/orders", "orders.csv")
print(frame.head())
From here the DataFrame drops straight into any sink covered in Storing and Exporting Scraped Data — to_parquet, to_sql, or a deduplicated CSV.
Edge Cases and Caveats
- Missing meta keys raise
KeyError. If some records lack a field named inmeta, passerrors="ignore"so pandas fillsNaNinstead of crashing the whole batch. - Ragged records. When objects have different key sets,
json_normalizeunions all columns and fills gaps withNaN— convenient, but verify no column is silently all-null from a typo in a key name. - Lists of scalars don't explode cleanly.
record_pathexpects a list of dicts. A list of plain strings or numbers lands as a single object column; explode it afterwards withdf.explode("column"). - Very deep nesting is slow and wide. Normalizing everything can produce hundreds of columns. Select the
record_pathandmetayou actually need rather than flattening the entire tree. - Dotted column names need bracket access. After flattening, use
df["user.address.city"], notdf.user.address.city; the dotted string is a single column label, not an attribute chain. max_levelcaps depth. Passmax_level=1to stop flattening below the first nested level and keep deeper objects intact as dict-valued cells.
Frequently Asked Questions
What does json_normalize do that json.loads does not?json.loads parses text into nested Python dicts and lists; json_normalize takes those already-parsed objects and flattens them into a tabular pandas DataFrame with dotted column names. They are sequential steps: parse first, then normalize.
How do I get one row per item in a nested list?
Set record_path to the name of the list to explode and list the parent fields you want repeated in meta. Each element of the list becomes its own row, with the meta values copied down onto every row.
How do I avoid a KeyError when some records are missing a field?
Pass errors="ignore" to json_normalize. Missing meta keys are then filled with NaN instead of raising, so one incomplete record does not abort the entire batch.
Can I change the dot separator in the flattened column names?
Yes — pass sep with your preferred delimiter, for example sep="_" to produce user_address_city instead of user.address.city, which avoids clashes with pandas attribute-style access.