Based on PPTAgent (EMNLP 2025) and DocPres research findings:
1. Brief summarization (summarize_brief.py)
- For content >800 chars, a single LLM call extracts
{overview, sections[{title, key_points, data_points}]} before outline
generation
- Mitigates the "lost in the middle" effect in long documents
- BriefStructure.to_outline_context() formats sections for outline prompt
- BriefStructure.get_section_text(idx) returns targeted excerpt per slide
2. Section attribution in SlideOutlineModel
- Added source_section_idx: Optional[int] field
- LLM sets this during outline generation to map each slide → brief section
- Used to pass only the targeted section text, instead of the full brief,
to per-slide content generation (aims to reduce hallucination and improve
accuracy)
3. Narrative continuity in slide content generation
- prev_slide_title passed to each content generation call
- Injected in user prompt: "ensure this slide continues naturally from..."
- Batch-safe: titles are collected from each completed batch before the
next batch starts
4. Source section text in content generation
- source_section_text parameter added to get_slide_content_from_type_and_outline
- Injected as "Source Material for This Slide" in user prompt
- Prompt instructs the LLM to use only data points present in the excerpt
5. Richer layout catalog
- PresentationLayoutModel.to_catalog_string() added
- Includes field names + maxLength constraints alongside layout descriptions
- Helps LLM make informed layout choices based on content type
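A minimal sketch of the brief structure from item 1, assuming plain dataclasses and that the single LLM call has already returned the {overview, sections[...]} JSON as a dict. The LLM call itself is omitted. Only the class name, field names, and the two method names come from this commit; `from_dict`, `needs_brief`, and the exact text layout produced by `to_outline_context()` and `get_section_text()` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BRIEF_THRESHOLD_CHARS = 800  # threshold stated in the commit message


@dataclass
class BriefSection:
    title: str
    key_points: List[str] = field(default_factory=list)
    data_points: List[str] = field(default_factory=list)


@dataclass
class BriefStructure:
    overview: str
    sections: List[BriefSection] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> "BriefStructure":
        # Hypothetical parser for the JSON dict returned by the LLM call.
        return cls(
            overview=data.get("overview", ""),
            sections=[
                BriefSection(
                    title=s.get("title", ""),
                    key_points=list(s.get("key_points", [])),
                    data_points=list(s.get("data_points", [])),
                )
                for s in data.get("sections", [])
            ],
        )

    def to_outline_context(self) -> str:
        # Number each section so the outline LLM can cite it by index.
        lines = [f"Overview: {self.overview}", ""]
        for idx, sec in enumerate(self.sections):
            lines.append(f"[{idx}] {sec.title}")
            lines.extend(f"  - {p}" for p in sec.key_points)
        return "\n".join(lines)

    def get_section_text(self, idx: Optional[int]) -> Optional[str]:
        # Return one section's excerpt, or None for a missing/out-of-range index.
        if idx is None or not 0 <= idx < len(self.sections):
            return None
        sec = self.sections[idx]
        parts = [sec.title]
        parts += [f"- {p}" for p in sec.key_points]
        parts += [f"- Data: {d}" for d in sec.data_points]
        return "\n".join(parts)


def needs_brief(content: str) -> bool:
    # Hypothetical helper: only long inputs get the extra summarization call.
    return len(content) > BRIEF_THRESHOLD_CHARS
```

Returning `None` rather than raising keeps a bad index from the LLM recoverable by the caller.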
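The section attribution in item 2 can be sketched as an index lookup with a defensive fallback. This assumes `source_section_idx` indexes a list of per-section texts and uses a dataclass stand-in for `SlideOutlineModel`. The helper `source_text_for_slide` is hypothetical, and falling back to the full brief when the index is missing or out of range is an assumption, not necessarily what the committed code does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlideOutlineModel:
    # Stand-in with only the fields used here; the real model has more.
    title: str
    source_section_idx: Optional[int] = None  # set by the LLM during outlining


def source_text_for_slide(
    slide: SlideOutlineModel,
    section_texts: List[str],
    full_brief: str,
) -> str:
    """Return the slide's targeted section text, else the full brief (assumed fallback)."""
    idx = slide.source_section_idx
    # LLM output is untrusted: reject missing, negative, or too-large indices.
    if idx is not None and 0 <= idx < len(section_texts):
        return section_texts[idx]
    return full_brief
```

The explicit `0 <=` check matters because a negative index would otherwise silently select a section from the end of the list.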
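One possible reading of the "batch-safe" note in item 3, sketched with `asyncio`: slides in a batch run concurrently, each slide's `prev_slide_title` is read from a snapshot taken before the batch starts, and generated titles are written back only after the whole batch completes. The function `generate_in_batches`, the `generate` callback signature, and the use of outline titles for a neighbour in the same batch are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical callback: (slide_index, prev_slide_title) -> generated title
GenerateFn = Callable[[int, Optional[str]], Awaitable[str]]


async def generate_in_batches(
    outline_titles: List[str],
    generate: GenerateFn,
    batch_size: int = 3,
) -> List[str]:
    # Start from outline titles; replace them as batches complete.
    titles: List[str] = list(outline_titles)
    for start in range(0, len(outline_titles), batch_size):
        batch = range(start, min(start + batch_size, len(outline_titles)))
        # Snapshot before launching, so concurrent tasks never read a title
        # that a sibling task in the same batch is still producing.
        snapshot = list(titles)
        results = await asyncio.gather(
            *(generate(i, snapshot[i - 1] if i > 0 else None) for i in batch)
        )
        # Collect titles from the completed batch before the next one starts.
        for i, title in zip(batch, results):
            titles[i] = title
    return titles
```

Snapshotting keeps the result deterministic: no task depends on when a sibling task in the same batch happens to finish.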
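A hypothetical prompt builder combining the injections from items 3 and 4. The function name and all prompt wording are placeholders: the commit elides the exact continuity sentence and does not show how `get_slide_content_from_type_and_outline` assembles its prompt.

```python
from typing import Optional


def build_user_prompt(
    outline_content: str,
    prev_slide_title: Optional[str] = None,
    source_section_text: Optional[str] = None,
) -> str:
    # Hypothetical layout; the real prompt wording and order may differ.
    parts = [f"Slide outline:\n{outline_content}"]
    if source_section_text:
        parts.append(
            "Source Material for This Slide:\n"
            f"{source_section_text}\n"
            "Use only data points that appear in this source material."
        )
    if prev_slide_title:
        # Placeholder phrasing for the (elided) continuity instruction.
        parts.append(
            "Ensure this slide continues naturally from the previous slide, "
            f'"{prev_slide_title}".'
        )
    return "\n\n".join(parts)
```

Both injections are optional, so the first slide (no predecessor) and slides without attribution still get a valid prompt.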
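A sketch of what `to_catalog_string()` from item 5 might emit, assuming each layout carries a JSON Schema whose string fields declare `maxLength`. The `SlideLayout` class, the `json_schema` attribute, and the output format are assumptions; only `PresentationLayoutModel.to_catalog_string()` is named in this commit.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SlideLayout:
    # Hypothetical layout record; the schema representation is assumed.
    name: str
    description: str
    json_schema: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PresentationLayoutModel:
    layouts: List[SlideLayout] = field(default_factory=list)

    def to_catalog_string(self) -> str:
        # One line per layout, then one line per top-level content field.
        lines = []
        for idx, layout in enumerate(self.layouts):
            lines.append(f"{idx}. {layout.name}: {layout.description}")
            props = layout.json_schema.get("properties", {})
            for fname, spec in props.items():
                max_len = spec.get("maxLength")
                suffix = f" (maxLength {max_len})" if max_len is not None else ""
                lines.append(f"   - {fname}{suffix}")
        return "\n".join(lines)
```

Only top-level properties are listed; constraints on nested array items would need a recursive walk of the schema.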
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>