ppt-tool/backend
Vadym Samoilenko f73291285d Improve presentation pipeline: brief summarization + section attribution + narrative continuity
Based on PPTAgent (EMNLP 2025) and DocPres research findings:

1. Brief summarization (summarize_brief.py)
   - For content >800 chars: single LLM call extracts {overview, sections[{title,
     key_points, data_points}]} before outline generation
   - Mitigates "lost in the middle" context loss in long documents
   - BriefStructure.to_outline_context() formats sections for outline prompt
   - BriefStructure.get_section_text(idx) returns targeted excerpt per slide

2. Section attribution in SlideOutlineModel
   - Added source_section_idx: Optional[int] field
   - LLM sets this during outline generation to map each slide → brief section
   - Used to pass targeted section text to per-slide content generation
     instead of full brief (reduces hallucination, improves accuracy)

3. Narrative continuity in slide content generation
   - prev_slide_title passed to each content generation call
   - Injected in user prompt: "ensure this slide continues naturally from..."
   - Batch-safe: titles collected from completed batch before next starts

4. Source section text in content generation
   - source_section_text parameter added to get_slide_content_from_type_and_outline
   - Injected as "Source Material for This Slide" in user prompt
   - Prompt instructs the LLM to use only data points present in the excerpt

5. Richer layout catalog
   - PresentationLayoutModel.to_catalog_string() added
   - Includes field names + maxLength constraints alongside layout descriptions
   - Helps LLM make informed layout choices based on content type
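
A minimal sketch of how items 1, 2 and 4 could fit together. The names BriefStructure, to_outline_context, get_section_text, overview, sections, title, key_points, data_points and the 800-character threshold come from the notes above. Everything else is an assumption made for illustration: the BriefSection helper type, the BRIEF_SUMMARY_THRESHOLD constant, the needs_summarization helper, the use of dataclasses, the output formatting, and returning None for an unusable index. It is not the code in summarize_brief.py.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Threshold value is from the commit notes; the constant name is hypothetical.
BRIEF_SUMMARY_THRESHOLD = 800


@dataclass
class BriefSection:  # hypothetical helper type, not named in the commit
    title: str
    key_points: List[str] = field(default_factory=list)
    data_points: List[str] = field(default_factory=list)


@dataclass
class BriefStructure:
    overview: str
    sections: List[BriefSection] = field(default_factory=list)

    def to_outline_context(self) -> str:
        """Format the overview and index-tagged sections for the outline prompt."""
        lines = [f"Overview: {self.overview}"]
        for idx, section in enumerate(self.sections):
            # The index is shown so the LLM can echo it back as source_section_idx.
            lines.append(f"[{idx}] {section.title}")
            lines.extend(f"  - {point}" for point in section.key_points)
        return "\n".join(lines)

    def get_section_text(self, idx: Optional[int]) -> Optional[str]:
        """Return a targeted excerpt for one section, or None if idx is unusable."""
        if idx is None or not 0 <= idx < len(self.sections):
            return None  # assumed fallback: caller decides what to pass instead
        section = self.sections[idx]
        return "\n".join([section.title, *section.key_points, *section.data_points])


def needs_summarization(content: str) -> bool:
    """Only content longer than the threshold gets the extra LLM call."""
    return len(content) > BRIEF_SUMMARY_THRESHOLD
```

With a structure like this, a slide whose outline carries source_section_idx=0 would receive only the text of section 0 as its source_section_text, while a slide with no attribution gets None and the caller chooses the fallback.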

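The batch-safe behaviour in item 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's code: generate_in_batches and the GenerateFn signature are hypothetical, outlines are reduced to plain strings, and every slide in a batch is assumed to receive the last title from the previously completed batch, since titles from the batch in flight do not exist yet.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical signature: (outline, prev_slide_title) -> generated slide title.
GenerateFn = Callable[[str, Optional[str]], Awaitable[str]]


async def generate_in_batches(
    outlines: List[str], generate: GenerateFn, batch_size: int = 2
) -> List[str]:
    completed_titles: List[str] = []
    for start in range(0, len(outlines), batch_size):
        batch = outlines[start:start + batch_size]
        # Snapshot taken before the batch starts, so concurrent calls never
        # depend on a title that is still being generated.
        prev_title = completed_titles[-1] if completed_titles else None
        results = await asyncio.gather(
            *(generate(outline, prev_title) for outline in batch)
        )
        # Titles are collected only after the whole batch has finished.
        completed_titles.extend(results)
    return completed_titles
```

The trade-off in this sketch is that slides inside one batch share the same prev_slide_title. Continuity between adjacent slides in a batch would need either a batch size of 1 or titles taken from the outline rather than from generated content.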
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 20:22:22 +00:00
alembic/versions Fix migration: move to correct path, update down_revision to c7a3f8e21d4b 2026-03-01 20:10:36 +00:00
api Improve presentation pipeline: brief summarization + section attribution + narrative continuity 2026-03-19 20:22:22 +00:00
assets Phase 1-2: Foundation + Admin Panel & Client Management 2026-02-26 15:37:17 +00:00
constants Phase 2: Admin panel, analytics, storage, template pipeline, multi-provider LLM 2026-02-26 23:39:34 +00:00
enums Phase 1-2: Foundation + Admin Panel & Client Management 2026-02-26 15:37:17 +00:00
migrations Fix migration: move to correct path, update down_revision to c7a3f8e21d4b 2026-03-01 20:10:36 +00:00
models Improve presentation pipeline: brief summarization + section attribution + narrative continuity 2026-03-19 20:22:22 +00:00
scripts Phase 2: Admin panel, analytics, storage, template pipeline, multi-provider LLM 2026-02-26 23:39:34 +00:00
services Replace docling+layoutparser+torch with PyMuPDF (~3.5GB → ~80MB) 2026-03-19 20:06:46 +00:00
static Phase 1-2: Foundation + Admin Panel & Client Management 2026-02-26 15:37:17 +00:00
tests Phase 7: Apply design system to all admin pages + fix test stubs 2026-03-01 19:01:52 +00:00
utils Improve presentation pipeline: brief summarization + section attribution + narrative continuity 2026-03-19 20:22:22 +00:00
workers Increase ARQ job timeout to 90 minutes 2026-02-27 21:48:51 +00:00
.python-version Phase 1-2: Foundation + Admin Panel & Client Management 2026-02-26 15:37:17 +00:00
alembic.ini Phase 1-2: Foundation + Admin Panel & Client Management 2026-02-26 15:37:17 +00:00
Dockerfile Phase 4: Fix critical bugs, improve document parsing, add vision OCR 2026-02-27 14:07:00 +00:00
mcp_server.py Rebrand Presenton to Oliver DeckForge, pre-configure models, use NanoBanana Pro 2026-02-26 18:17:11 +00:00
openai_spec.json Phase 1-2: Foundation + Admin Panel & Client Management 2026-02-26 15:37:17 +00:00
pyproject.toml Replace docling+layoutparser+torch with PyMuPDF (~3.5GB → ~80MB) 2026-03-19 20:06:46 +00:00
server.py Phase 1-2: Foundation + Admin Panel & Client Management 2026-02-26 15:37:17 +00:00
uv.lock Phase 1-2: Foundation + Admin Panel & Client Management 2026-02-26 15:37:17 +00:00