Flattening Nested JSON with pandas json_normalize

This walkthrough tackles one recurring problem from Parsing JSON and XML API Responses: collapsing a nested API payload into flat rows you can write to a CSV or database.

json_normalize walks the nested tree and dots the path into flat column names.

pandas.json_normalize is the fastest way to turn nested JSON into a table. Point it at a list of records and it walks each object, dotting nested keys into column names like user.address.city. Use record_path to explode an inner list into one row per element, meta to carry parent fields down onto those rows, and sep to choose the delimiter. Get those three arguments right and almost any API response becomes a clean DataFrame in one call.

The Simple Case: Flat Dotted Columns

For a list of dicts with nested objects, json_normalize flattens every level by joining keys with a dot.

import pandas as pd

records = [
    {"id": 7, "user": {"name": "Ada", "address": {"city": "Bath", "zip": "BA1"}}},
    {"id": 8, "user": {"name": "Bo", "address": {"city": "York", "zip": "YO1"}}},
]

df = pd.json_normalize(records)
print(df.columns.tolist())
# ['id', 'user.name', 'user.address.city', 'user.address.zip']

Each nested object becomes dotted columns; there is no manual key-chasing. If you prefer a different delimiter — say, to keep dotted keys from clashing with pandas accessor syntax — pass sep="_".

Exploding Nested Lists with record_path

The harder case is a record that contains a list of sub-items — orders with line items, a product with variants. You usually want one row per sub-item, with the parent fields repeated. That is what record_path (the list to explode) and meta (parent fields to carry down) are for.

import pandas as pd

payload = {
    "order_id": "A-100",
    "customer": "Ada",
    "items": [
        {"sku": "AER-001", "qty": 1, "price": 1395.0},
        {"sku": "MAT-009", "qty": 2, "price": 49.5},
    ],
}

df = pd.json_normalize(
    payload,
    record_path="items",              # explode this list -> one row each
    meta=["order_id", "customer"],    # repeat these parent fields per row
)
print(df)
#       sku  qty   price order_id customer
# 0  AER-001    1  1395.0    A-100      Ada
# 1  MAT-009    2    49.5    A-100      Ada

For deeply nested parent fields, meta accepts lists of keys, e.g. meta=[["shipping", "country"]] reaches shipping.country.

End-to-End: API Response to CSV

Putting it together with a real request — fetch JSON with a proper User-Agent, normalize the nested results, and write a flat CSV ready for a spreadsheet or database load.

import pandas as pd
import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
}

def orders_to_csv(url: str, path: str) -> pd.DataFrame:
    resp = requests.get(url, headers=HEADERS, timeout=15)
    resp.raise_for_status()
    payload = resp.json()

    df = pd.json_normalize(
        payload["orders"],
        record_path="items",
        meta=["order_id", "customer", ["shipping", "country"]],
        sep=".",
        errors="ignore",            # tolerate records missing a meta key
    )
    df.to_csv(path, index=False, encoding="utf-8")
    return df

if __name__ == "__main__":
    frame = orders_to_csv("https://api.example.com/v1/orders", "orders.csv")
    print(frame.head())

From here the DataFrame drops straight into any sink covered in Storing and Exporting Scraped Data — to_parquet, to_sql, or a deduplicated CSV.

Edge Cases and Caveats

Missing meta keys raise KeyError. If some records lack a field named in meta, pass errors="ignore" so pandas fills NaN instead of crashing the whole batch.
Ragged records. When objects have different key sets, json_normalize unions all columns and fills gaps with NaN — convenient, but verify no column is silently all-null from a typo in a key name.
Lists of scalars don't explode cleanly. record_path expects a list of dicts. A list of plain strings or numbers lands as a single object column; explode it afterwards with df.explode("column").
Very deep nesting is slow and wide. Normalizing everything can produce hundreds of columns. Select the record_path and meta you actually need rather than flattening the entire tree.
Dotted column names need bracket access. After flattening, use df["user.address.city"], not df.user.address.city; the dotted string is a single column label, not an attribute chain.
max_level caps depth. Pass max_level=1 to stop flattening below the first nested level and keep deeper objects intact as dict-valued cells.

Frequently Asked Questions

What does json_normalize do that json.loads does not?json.loads parses text into nested Python dicts and lists; json_normalize takes those already-parsed objects and flattens them into a tabular pandas DataFrame with dotted column names. They are sequential steps: parse first, then normalize.

How do I get one row per item in a nested list? Set record_path to the name of the list to explode and list the parent fields you want repeated in meta. Each element of the list becomes its own row, with the meta values copied down onto every row.

How do I avoid a KeyError when some records are missing a field? Pass errors="ignore" to json_normalize. Missing meta keys are then filled with NaN instead of raising, so one incomplete record does not abort the entire batch.

Can I change the dot separator in the flattened column names? Yes — pass sep with your preferred delimiter, for example sep="_" to produce user_address_city instead of user.address.city, which avoids clashes with pandas attribute-style access.