michael 29134011be Add docker-compose for Neo4j with APOC plugin
Pinned to neo4j:5 to match the Python driver. Uses named volumes
and restart policy for production use.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:22:13 -06:00
chat-interface Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
db Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
documentation Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
neo4j Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
supporting_files Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
.gitignore Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
ai_core.py Update OpenAI models to gpt-4.1 and gpt-4.1-mini 2026-02-23 15:13:41 -06:00
CLAUDE.md Update OpenAI models to gpt-4.1 and gpt-4.1-mini 2026-02-23 15:13:41 -06:00
config.py Update OpenAI models to gpt-4.1 and gpt-4.1-mini 2026-02-23 15:13:41 -06:00
DEVELOPMENT.md Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
docker-compose.yml Add docker-compose for Neo4j with APOC plugin 2026-02-23 16:22:13 -06:00
document_generator.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
flowchart.md Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
graph_rag_integration.py Update OpenAI models to gpt-4.1 and gpt-4.1-mini 2026-02-23 15:13:41 -06:00
graphRAG.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
init_mongodb.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
json_utils.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
LOCAL_DEV.md Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
main.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
mongodb_utils.py Update OpenAI models to gpt-4.1 and gpt-4.1-mini 2026-02-23 15:13:41 -06:00
NETFLIX_GRAPHRAG_DESCRIPTION.md Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
netflix_technical_diagram.md Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
README.md Update OpenAI models to gpt-4.1 and gpt-4.1-mini 2026-02-23 15:13:41 -06:00
requirements.txt Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
routes.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
session_manager.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
shared_state.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00
utils.py Initial commit: Netflix GraphRAG marketing chatbot 2026-02-23 10:28:33 -06:00

Netflix GraphRAG Marketing Chatbot

An AI-powered knowledge assistant that answers questions about Netflix marketing materials — specifically the GPD Key Art Playbook and related design guidelines. The system combines traditional vector search (RAG) with a Neo4j knowledge graph (GraphRAG) to deliver contextual, cross-document answers with source citations and relevant document images.

How It Works

The chatbot uses a dual retrieval approach:

  1. Vector Search — Documents are parsed, semantically chunked, and embedded with OpenAI embeddings. User queries are matched against these chunks via similarity search.
  2. GraphRAG — Entities and relationships are extracted from document chunks into a Neo4j knowledge graph. Community detection (Louvain clustering) groups related entities, and each community receives an AI-generated summary. At query time, relevant communities are retrieved alongside vector results.
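
The similarity search in step 1 can be sketched in plain Python. This is a minimal illustration, not the project's code: the real system delegates this to LlamaIndex and OpenAI embeddings, while the three-dimensional vectors, chunk texts, and function names below are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" with invented chunk texts (illustration only).
CHUNKS = [
    ("Title treatment placement rules", [0.9, 0.1, 0.0]),
    ("Color palette guidance", [0.1, 0.9, 0.0]),
    ("Localization checklist", [0.0, 0.2, 0.9]),
]

if __name__ == "__main__":
    for text, score in top_k_chunks([1.0, 0.2, 0.0], CHUNKS):
        print(f"{score:.3f}  {text}")
```

Cosine similarity is a common choice for embedding search; this README does not say which metric the vector index actually uses.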

A custom ReAct agent (built on LlamaIndex Workflows) orchestrates both retrieval tools, deciding which to call based on the query, then synthesizes a unified response.

User Query → ReAct Agent → [Vector Tool | GraphRAG Tool] → Response Synthesis → Answer + Sources + Images
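
The flow above can be approximated with a small dispatch sketch. In the real system the agent's decision comes from an LLM inside a LlamaIndex Workflow; here that decision is replaced by a keyword rule purely for illustration, and every name (Tool, choose_tools, answer) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]


def vector_tool(query: str) -> str:
    # Placeholder: the real tool would return matching document chunks.
    return f"[vector chunks for: {query}]"


def graphrag_tool(query: str) -> str:
    # Placeholder: the real tool would return community summaries.
    return f"[community summaries for: {query}]"


TOOLS = {
    "vector": Tool("vector", vector_tool),
    "graphrag": Tool("graphrag", graphrag_tool),
}

BROAD_HINTS = ("overall", "across", "relationship", "compare")


def choose_tools(query: str) -> list[str]:
    """Hypothetical stand-in for the agent's LLM-driven tool selection."""
    is_broad = any(hint in query.lower() for hint in BROAD_HINTS)
    return ["vector", "graphrag"] if is_broad else ["vector"]


def answer(query: str) -> dict:
    """Run the selected tools and join their observations into one response."""
    observations = {name: TOOLS[name].run(query) for name in choose_tools(query)}
    return {
        "answer": " ".join(observations.values()),
        "tools_used": list(observations),
    }


if __name__ == "__main__":
    print(answer("How do themes compare across documents?"))
```

The point of the sketch is the shape of the pipeline: one decision step, one or more tool calls, then a synthesis step that merges the observations.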

Why GraphRAG?

Standard RAG retrieves isolated text chunks. GraphRAG adds:

  • Cross-document connections — Links entities that appear across different documents
  • Community context — Provides broader topical summaries, not just individual chunks
  • Semantic relationships — Understands how concepts relate beyond keyword overlap
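
To make the first two points concrete, the sketch below builds a tiny entity graph from triples and reports groups of entities that span more than one document. It is only an illustration under stated assumptions: the triples and file names are invented, and connected components are used as a simple stand-in for the Louvain clustering the project runs.

```python
from collections import defaultdict

# Hypothetical (entity, relation, entity, source document) triples.
TRIPLES = [
    ("Key Art", "USES", "Title Treatment", "playbook.pdf"),
    ("Title Treatment", "FOLLOWS", "Safe Zone", "guidelines.pdf"),
    ("Color Palette", "APPLIES_TO", "Thumbnail", "guidelines.pdf"),
]


def build_graph(triples):
    """Return an undirected adjacency map and an entity -> documents map."""
    neighbors = defaultdict(set)
    documents = defaultdict(set)
    for head, _relation, tail, doc in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
        documents[head].add(doc)
        documents[tail].add(doc)
    return neighbors, documents


def connected_components(neighbors):
    """Group entities reachable from one another (not Louvain, just a stand-in)."""
    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


def cross_document_communities(triples):
    """Return (entities, documents) for every group that spans several documents."""
    neighbors, documents = build_graph(triples)
    result = []
    for component in connected_components(neighbors):
        docs = set().union(*(documents[entity] for entity in component))
        if len(docs) > 1:
            result.append((sorted(component), sorted(docs)))
    return result


if __name__ == "__main__":
    for entities, docs in cross_document_communities(TRIPLES):
        print(entities, "span", docs)
```

Chunk-level retrieval would return the two source passages separately; the graph view is what reveals that "Key Art" and "Safe Zone" are linked through "Title Treatment" even though they come from different files.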

Architecture

┌─────────────────┐    HTTP/JSON     ┌──────────────────────────────────┐
│  React Frontend │ ◄──────────────► │  Flask + Hypercorn (async ASGI)  │
│ (Vite, Tailwind)│                  │                                  │
└─────────────────┘                  │  ┌────────────────────────────┐  │
                                     │  │  ReAct Agent (Workflow)    │  │
                                     │  │  ├─ Vector Query Tool      │  │
                                     │  │  └─ GraphRAG Query Tool    │  │
                                     │  └────────────────────────────┘  │
                                     └──────┬──────────┬────────────────┘
                                            │          │
                              ┌─────────────┤          ├─────────────┐
                              ▼             ▼          ▼             ▼
                        ┌──────────┐  ┌──────────┐  ┌──────┐  ┌───────────┐
                        │ MongoDB  │  │  Neo4j   │  │OpenAI│  │LlamaCloud │
                        │(sessions,│  │(knowledge│  │(LLM, │  │(LlamaParse│
                        │  convos) │  │  graph)  │  │embed)│  │ doc parse)│
                        └──────────┘  └──────────┘  └──────┘  └───────────┘

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • Docker (for MongoDB and Neo4j)
  • API Keys: OpenAI, Anthropic (optional), LlamaCloud (for document parsing)

Quick Start

1. Environment Configuration

Create a .env file in the project root:

PRODUCTION=false

OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
LLAMA_CLOUD_API_KEY=your-llamacloud-key
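
For orientation, here is one way the backend could read these variables at startup. It is a sketch, not the contents of config.py: the function names are hypothetical, treating the OpenAI and LlamaCloud keys as required is an assumption based on the prerequisites list, and so is defaulting PRODUCTION to false when it is unset.

```python
import os

# Assumption: these two keys are mandatory, the Anthropic key is optional.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY")
OPTIONAL_KEYS = ("ANTHROPIC_API_KEY",)


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in ("1", "true", "yes", "on")


def load_settings(env=None) -> dict:
    """Validate and collect settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    settings = {key: env[key] for key in REQUIRED_KEYS}
    for key in OPTIONAL_KEYS:
        settings[key] = env.get(key)  # None when the optional key is absent
    # Assumption: an unset PRODUCTION means development mode.
    settings["PRODUCTION"] = parse_bool(env.get("PRODUCTION", "false"))
    return settings


if __name__ == "__main__":
    print(load_settings({
        "PRODUCTION": "false",
        "OPENAI_API_KEY": "your-openai-key",
        "LLAMA_CLOUD_API_KEY": "your-llamacloud-key",
    }))
```

Failing fast on a missing key surfaces a configuration mistake before the long first-run indexing begins.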

2. Start Databases

# MongoDB (conversation storage)
cd db && docker-compose up -d && cd ..

# Neo4j (knowledge graph)
cd neo4j && docker-compose -f docker-compose-neo4j.yml up -d && cd ..

Neo4j Browser is available at http://localhost:7474.

3. Backend

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

python main.py

The backend starts on http://localhost:6175. On first run, it will:

  1. Connect to MongoDB and initialize the schema
  2. Parse documents from supporting_files/files_for_rag_store/ via LlamaParse
  3. Build the vector index and persist it to index_storage/
  4. Extract entities/relationships and populate the Neo4j knowledge graph
  5. Run community detection and generate community summaries

Subsequent starts load the persisted index from index_storage/ instead of rebuilding it, which is much faster.
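
The first-run versus later-run behavior amounts to a "load if persisted, otherwise build" check. The sketch below shows that decision only; the step names are labels rather than real functions in main.py, and it assumes the graph steps are also skipped on a warm start, which this README does not state explicitly.

```python
import tempfile
from pathlib import Path

# Step labels mirror the numbered list above; they are not real function names.
COLD_START = [
    "connect_mongodb",
    "parse_documents",
    "build_vector_index",
    "populate_knowledge_graph",
    "detect_communities_and_summarize",
]
# Assumption: a warm start skips parsing, indexing, and graph construction.
WARM_START = ["connect_mongodb", "load_persisted_index"]


def index_is_persisted(storage_dir: Path) -> bool:
    """True when the storage directory exists and holds at least one entry."""
    return storage_dir.is_dir() and any(storage_dir.iterdir())


def startup_plan(storage_dir: Path) -> list[str]:
    """Pick the warm or cold startup sequence based on the persisted index."""
    return list(WARM_START if index_is_persisted(storage_dir) else COLD_START)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        storage = Path(tmp) / "index_storage"
        print("first run: ", startup_plan(storage))
        storage.mkdir()
        (storage / "placeholder.json").write_text("{}")
        print("later runs:", startup_plan(storage))
```

An empty directory counts as "not persisted" here, so an interrupted first run that created the folder but wrote nothing would still trigger a full build.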

4. Frontend

cd chat-interface
npm install
npm run dev

The frontend starts on http://localhost:5173 and connects to the backend at the URL specified in chat-interface/.env.development.

Development Mode

With PRODUCTION=false in .env:

  • Authentication is bypassed — no Microsoft login required; the system uses dev_user@local automatically
  • Hot reload is enabled on the backend (Hypercorn reloader)
  • The frontend works even without a running backend (mock responses for UI development)
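
The authentication bypass can be pictured as a single branch on the PRODUCTION flag. The function below is a hypothetical sketch, not the code in routes.py; in particular, reading the production identity from the X-MS-USERNAME header and rejecting requests without it are assumptions.

```python
DEV_USER = "dev_user@local"


def resolve_username(headers: dict, production: bool) -> str:
    """Return the user identity for a request.

    In development the fixed dev user is returned regardless of headers.
    In production (assumed behavior) the X-MS-USERNAME header must be set.
    """
    if not production:
        return DEV_USER
    username = headers.get("X-MS-USERNAME", "").strip()
    if not username:
        raise PermissionError("X-MS-USERNAME header is required in production")
    return username


if __name__ == "__main__":
    print(resolve_username({}, production=False))
    print(resolve_username({"X-MS-USERNAME": "someone@example.com"}, production=True))
```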

The frontend has its own env files:

  • chat-interface/.env.development — sets VITE_BACKEND_URL for local dev (default: http://localhost:6175)
  • chat-interface/.env.production — points to the production backend

API Endpoints

Method  Endpoint                      Description
POST    /chat                         Send a chat message. Body: { message, sessionId }. Header: X-MS-USERNAME
GET     /conversations                List conversations for the authenticated user
GET     /conversations/<id>/messages  Retrieve messages for a conversation
DELETE  /conversations/<id>           Delete a conversation
GET     /images/<filename>            Serve an extracted document image
GET     /status                       System initialization status

Example Request

curl -X POST http://localhost:6175/chat \
  -H "Content-Type: application/json" \
  -H "X-MS-USERNAME: dev_user@local" \
  -d '{"message": "What are the key art guidelines?", "sessionId": "test-session-1"}'
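
The same request can be issued from Python with only the standard library. This is a sketch: the helper names are hypothetical, and the response is assumed to be a JSON body, since its exact shape is not documented here.

```python
import json
import urllib.request


def build_chat_request(
    message: str,
    session_id: str,
    username: str = "dev_user@local",
    base_url: str = "http://localhost:6175",
) -> urllib.request.Request:
    """Build the POST /chat request with the same body and headers as the curl call."""
    body = json.dumps({"message": message, "sessionId": session_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json", "X-MS-USERNAME": username},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_chat_request("What are the key art guidelines?", "test-session-1")
    print(send(req))  # requires the backend to be running on port 6175
```

Separating request construction from sending makes the payload easy to inspect without a running backend.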

Project Structure

├── main.py                    # Flask app init, Hypercorn config, startup sequence
├── config.py                  # All configuration (API keys, models, paths, timeouts)
├── routes.py                  # Flask route handlers (chat, conversations, images)
├── ai_core.py                 # ReAct agent workflow, vector index, document processing
├── graph_rag_integration.py   # GraphRAG classes (extractor, store, query engine)
├── graphRAG.py                # Standalone GraphRAG (for testing)
├── shared_state.py            # Global state management for agent/index/graph components
├── session_manager.py         # Session → user/conversation mapping
├── mongodb_utils.py           # MongoDB CRUD operations
├── document_generator.py      # DOCX brief export from conversations
├── json_utils.py              # Custom JSON serializer for LlamaIndex objects
├── utils.py                   # Logging utilities
├── .env                       # Environment variables (API keys, mode)
├── requirements.txt           # Python dependencies
├── index_storage/             # Persisted vector index (auto-generated)
├── supporting_files/          # Source documents for the knowledge base
│   └── files_for_rag_store/   # Documents ingested by the RAG pipeline
├── uploads/images/            # Extracted document page images
├── db/                        # MongoDB docker-compose
├── neo4j/                     # Neo4j docker-compose and data volumes
└── chat-interface/            # React frontend
    ├── src/App.jsx            # Main chat UI component
    ├── src/auth.js            # MSAL authentication with dev bypass
    └── src/components/        # UI components (ConversationManager, ThemeToggle)

Reindexing Documents

To rebuild the index after changing source documents:

rm -rf index_storage/
python main.py

This re-parses all documents, rebuilds the vector index, and regenerates the knowledge graph.

Tech Stack

Layer             Technology
LLM               OpenAI GPT-4.1 (gpt-4.1)
Embeddings        OpenAI text-embedding-3-small
RAG Framework     LlamaIndex
Document Parsing  LlamaParse (LlamaCloud)
Knowledge Graph   Neo4j + NetworkX (community detection)
Backend           Python, Flask, Hypercorn
Conversation DB   MongoDB (via PyMongo)
Frontend          React 18, Vite, Tailwind CSS
Auth              Microsoft MSAL (Azure AD)

Troubleshooting

Issue                             Fix
Login screen appears in dev mode  Verify PRODUCTION=false in .env
MongoDB connection error          Ensure the MongoDB container is running: docker ps
Neo4j connection error            Check the Neo4j container and verify that the credentials in config.py match docker-compose-neo4j.yml
Frontend can't reach backend      Check that VITE_BACKEND_URL in chat-interface/.env.development matches the backend port
CORS errors                       Verify the frontend origin is listed in CORS_ALLOWED_ORIGINS in config.py
Slow first startup                Expected: LlamaParse document processing and graph construction take time. Subsequent starts use the cached index
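
Several of the issues above come down to a service not listening where it is expected. A quick port probe can narrow that down; the script below is a sketch with hypothetical names. Ports 6175, 5173, and 7474 come from this README, while 27017 (MongoDB) and 7687 (Neo4j Bolt) are the upstream defaults and are assumed here, so adjust them if the compose files map different ports.

```python
import socket

# 6175, 5173 and 7474 are stated in this README.
# 7687 and 27017 are upstream defaults and only assumed for this project.
SERVICES = {
    "backend": 6175,
    "frontend": 5173,
    "neo4j-browser": 7474,
    "neo4j-bolt": 7687,
    "mongodb": 27017,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost", services: dict = SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, is_open in check_services().items():
        print(f"{name:14} {'open' if is_open else 'CLOSED'}")
```

An open port only shows that something is listening; credential and CORS problems still need the checks listed in the table.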