LlamaExtract — Truthy Object with `data=None`

The Gotcha

`LlamaExtract.aextract()` always returns an object, even when the PDF cannot be parsed (a scanned image, a password-protected file, or a layout the model cannot interpret). That object is truthy in Python, but its `.data` attribute is `None`.

```python
result = await extractor.aextract(file_path)

# WRONG: passes the truthiness check even when result.data is None
if result:
    process(result.data)  # process(None) fails downstream (TypeError or AttributeError)

# CORRECT: guard both object existence and data presence
if result and result.data:
    process(result.data)
```

Why It Happens

LlamaExtract returns a response envelope object whether or not extraction succeeds. The envelope carries metadata (request ID, status, errors) even when no structured data was produced. In Python, an instance of a class that defines neither `__bool__` nor `__len__` is always truthy. LlamaExtract's envelope does not tie its truthiness to whether `data` is populated.
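
The rule is easiest to see in isolation with a stand-in envelope. `ExtractEnvelope` and `GuardedEnvelope` are hypothetical classes for illustration, not LlamaExtract types. The second class shows what truthiness would look like if a class did tie `__bool__` to its payload.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ExtractEnvelope:
    """Hypothetical stand-in for a response envelope; not a LlamaExtract class."""
    request_id: str
    status: str
    data: Optional[Any] = None
    errors: list[str] = field(default_factory=list)


@dataclass
class GuardedEnvelope(ExtractEnvelope):
    """Same fields, but truthiness reflects whether data is present."""

    def __bool__(self) -> bool:
        return self.data is not None


failed = ExtractEnvelope(request_id="req-1", status="ERROR", data=None)
print(bool(failed))   # True: no __bool__ or __len__, so the instance is truthy

guarded = GuardedEnvelope(request_id="req-2", status="ERROR", data=None)
print(bool(guarded))  # False: __bool__ now inspects data
```

Since you cannot change the library's class, the practical fix is the explicit `result.data` check, not an override.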

Failure Modes for PDFs

LlamaExtract silently returns `data=None` (rather than raising an exception) for:

| PDF type | Reason |
| --- | --- |
| Scanned image PDF | No text layer for the LLM to read |
| Password-protected | Cannot be decrypted |
| Heavily structured tables | LLM fails to map the content to the schema |
| Corrupt / truncated | Parse error caught internally |
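
You can spot some of these cases before calling the API at all. The sketch below is a rough, stdlib-only heuristic built around a hypothetical helper, `triage_pdf`, which is not part of LlamaExtract. It checks for the `%PDF-` header and a trailing `%%EOF` marker, then looks for an `/Encrypt` entry. It cannot detect scanned or table-heavy PDFs, because that requires a real PDF parser.

```python
from pathlib import Path
from typing import Optional


def triage_pdf(path: str) -> Optional[str]:
    """Hypothetical pre-flight check: return a likely failure reason, or None.

    Rough heuristic on raw bytes; it does not parse the PDF structure and
    cannot detect scanned (image-only) or table-heavy documents.
    """
    raw = Path(path).read_bytes()
    # PDF files begin with a "%PDF-x.y" header; tolerate a little leading junk.
    if b"%PDF-" not in raw[:1024]:
        return "not a PDF (missing %PDF- header)"
    # A complete file ends with a %%EOF marker; a missing one suggests truncation.
    if b"%%EOF" not in raw[-1024:]:
        return "corrupt or truncated (no %%EOF near end of file)"
    # Encrypted files reference an /Encrypt dictionary from the trailer.
    if b"/Encrypt" in raw:
        return "encrypted (possibly password-protected)"
    return None
```

Files this check flags can be logged or routed separately before you spend an extraction call. Files that look clean still need the `result.data` guard.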

Resilient Pipeline Pattern

For document processing pipelines, prefer an LLM fallback over hard failure:

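```python
import logging

logger = logging.getLogger(__name__)


async def extract_with_fallback(file_path: str) -> dict:
    # `extractor` and the two helpers below are defined elsewhere in the pipeline
    result = await extractor.aextract(file_path)

    if result and result.data:
        data = result.data
        # data may be a single record or a list of records, depending on config
        return data[0] if isinstance(data, list) else data

    # Fallback: log the miss, then send raw text to a general LLM
    logger.warning("No structured data for %s; using LLM fallback", file_path)
    raw_text = extract_text_with_pdfplumber(file_path)
    return await llm_parse_freeform(raw_text)
```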

This keeps the pipeline running for all document types — structured extraction for machine-readable PDFs, LLM freeform parsing for the rest.
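
The sketch below exercises both branches without network access by swapping in stubs. `StubExtractor`, `StubResult`, and `parse_freeform` are hypothetical placeholders for the LlamaExtract agent, its result envelope, and the raw-text-plus-LLM fallback. The extractor is also passed in explicitly rather than read from a global.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Any, Optional

logger = logging.getLogger("extract_pipeline")


@dataclass
class StubResult:
    """Hypothetical envelope: always truthy, data may be None."""
    data: Optional[Any]


class StubExtractor:
    """Hypothetical extractor: returns data=None for files in `unreadable`."""

    def __init__(self, unreadable: set[str]) -> None:
        self.unreadable = unreadable

    async def aextract(self, file_path: str) -> StubResult:
        if file_path in self.unreadable:
            return StubResult(data=None)
        return StubResult(data=[{"source": file_path, "via": "structured"}])


async def parse_freeform(file_path: str) -> dict:
    """Hypothetical stand-in for raw-text extraction plus a freeform LLM call."""
    return {"source": file_path, "via": "fallback"}


async def extract_with_fallback(extractor: StubExtractor, file_path: str) -> dict:
    result = await extractor.aextract(file_path)
    if result and result.data:
        data = result.data
        return data[0] if isinstance(data, list) else data
    logger.warning("No structured data for %s; falling back", file_path)
    return await parse_freeform(file_path)


async def main() -> list[dict]:
    extractor = StubExtractor(unreadable={"scan.pdf"})
    files = ["invoice.pdf", "scan.pdf"]
    return await asyncio.gather(*(extract_with_fallback(extractor, f) for f in files))


if __name__ == "__main__":
    for record in asyncio.run(main()):
        print(record)
```

Running it prints one record per file. `invoice.pdf` arrives via `"structured"`, while `scan.pdf` logs a warning and arrives via `"fallback"`.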

> [!tip] Always log extraction failures
> Log the file name whenever `result.data` is `None`, so you can audit which PDFs are falling back. Silent fallbacks without logging make later debugging very difficult.
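
One way to make fallbacks auditable is to log a line per failure that includes the file name, and to keep a running tally. The sketch uses only the standard `logging` and `collections` modules. `FallbackAudit` and the `reason` strings are hypothetical names chosen for illustration.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s %(message)s")
logger = logging.getLogger("extract_audit")


class FallbackAudit:
    """Hypothetical helper: logs each fallback and counts them by reason."""

    def __init__(self) -> None:
        self.files: list[tuple[str, str]] = []
        self.by_reason: Counter = Counter()

    def record(self, file_name: str, reason: str) -> None:
        self.files.append((file_name, reason))
        self.by_reason[reason] += 1
        # Include the file name so the log line alone identifies the PDF.
        logger.warning("fallback file=%s reason=%s", file_name, reason)

    def summary(self) -> str:
        parts = ", ".join(f"{r}: {n}" for r, n in self.by_reason.most_common())
        return f"{len(self.files)} fallback(s) ({parts})"


audit = FallbackAudit()
audit.record("scan_0041.pdf", "data=None")
audit.record("locked.pdf", "data=None")
print(audit.summary())  # 2 fallback(s) (data=None: 2)
```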

Related

Python truthiness: `if obj:` tests `bool(obj)`, not `obj is not None`. Use `if obj is not None`, or check the specific attributes you need.
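
The guards `result and result.data` and `result.data is not None` disagree when extraction returns an empty payload. The comparison below uses `types.SimpleNamespace` as a stand-in envelope. Like the real envelope, it defines no `__bool__` and is therefore always truthy.

```python
from types import SimpleNamespace


def truthy_guard(result) -> bool:
    # Rejects None and also empty payloads such as {} or []
    return bool(result and result.data)


def none_guard(result) -> bool:
    # Rejects only a missing envelope or data=None
    return result is not None and result.data is not None


for payload in (None, {}, [], {"total": 42}):
    result = SimpleNamespace(data=payload)  # stand-in envelope; always truthy
    print(repr(payload), truthy_guard(result), none_guard(result))
# None False False
# {} False True
# [] False True
# {'total': 42} True True
```

Use the truthy guard when an empty result should also trigger the fallback. Use `is not None` when an empty record is a valid answer.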