michael 29134011be Add docker-compose for Neo4j with APOC plugin Pinned to neo4j:5 to match the Python driver. Uses named volumes and restart policy for production use. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>		2026-02-23 16:22:13 -06:00
chat-interface	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
db	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
documentation	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
neo4j	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
supporting_files	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
.gitignore	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
ai_core.py	Update OpenAI models to gpt-4.1 and gpt-4.1-mini	2026-02-23 15:13:41 -06:00
CLAUDE.md	Update OpenAI models to gpt-4.1 and gpt-4.1-mini	2026-02-23 15:13:41 -06:00
config.py	Update OpenAI models to gpt-4.1 and gpt-4.1-mini	2026-02-23 15:13:41 -06:00
DEVELOPMENT.md	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
docker-compose.yml	Add docker-compose for Neo4j with APOC plugin	2026-02-23 16:22:13 -06:00
document_generator.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
flowchart.md	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
graph_rag_integration.py	Update OpenAI models to gpt-4.1 and gpt-4.1-mini	2026-02-23 15:13:41 -06:00
graphRAG.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
init_mongodb.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
json_utils.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
LOCAL_DEV.md	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
main.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
mongodb_utils.py	Update OpenAI models to gpt-4.1 and gpt-4.1-mini	2026-02-23 15:13:41 -06:00
NETFLIX_GRAPHRAG_DESCRIPTION.md	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
netflix_technical_diagram.md	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
README.md	Update OpenAI models to gpt-4.1 and gpt-4.1-mini	2026-02-23 15:13:41 -06:00
requirements.txt	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
routes.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
session_manager.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
shared_state.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00
utils.py	Initial commit: Netflix GraphRAG marketing chatbot	2026-02-23 10:28:33 -06:00

README.md

Netflix GraphRAG Marketing Chatbot

An AI-powered knowledge assistant that answers questions about Netflix marketing materials — specifically the GPD Key Art Playbook and related design guidelines. The system combines traditional vector search (RAG) with a Neo4j knowledge graph (GraphRAG) to deliver contextual, cross-document answers with source citations and relevant document images.

How It Works

The chatbot uses a dual retrieval approach:

Vector Search — Documents are parsed, semantically chunked, and embedded with OpenAI embeddings. User queries are matched against these chunks via similarity search.
GraphRAG — Entities and relationships are extracted from document chunks into a Neo4j knowledge graph. Community detection (Louvain clustering) groups related entities, and each community receives an AI-generated summary. At query time, relevant communities are retrieved alongside vector results.
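The vector-search half of this can be pictured with a small, dependency-free sketch. It is not the project's code: the three-dimensional vectors are made-up stand-ins for real OpenAI embeddings, the chunk texts are toy data, and `top_k_chunks` is a hypothetical helper name. It only illustrates ranking chunks by cosine similarity to a query vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: (chunk text, made-up embedding). Real embeddings have far more dimensions.
CHUNKS = [
    ("chunk about title legibility", [0.9, 0.1, 0.0]),
    ("chunk about logo placement", [0.7, 0.3, 0.1]),
    ("chunk about print export", [0.0, 0.2, 0.9]),
]

results = top_k_chunks([1.0, 0.0, 0.0], CHUNKS, k=2)
```

In the real pipeline the query vector would come from the same embedding model as the chunks, and the index would handle the ranking.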

A custom ReAct agent (built on LlamaIndex Workflows) orchestrates both retrieval tools, deciding which to call based on the query, then synthesizes a unified response.

User Query → ReAct Agent → [Vector Tool | GraphRAG Tool] → Response Synthesis → Answer + Sources + Images
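As a rough illustration of this flow, the sketch below routes a query to one or both tools and merges what they return. It rests on a loud assumption: in the real system an LLM makes the tool decision inside a ReAct loop, whereas here a keyword rule (`choose_tools`) stands in for that reasoning step. All function names and the marker words are hypothetical.

```python
def vector_tool(query):
    """Stand-in for the vector query tool."""
    return f"[vector] chunks matching: {query}"


def graphrag_tool(query):
    """Stand-in for the GraphRAG query tool."""
    return f"[graph] community summaries for: {query}"


TOOLS = {"vector": vector_tool, "graphrag": graphrag_tool}


def choose_tools(query):
    """Keyword heuristic replacing the LLM's decision (assumption, not the real logic)."""
    broad_markers = ("across", "relate", "overall", "compare")
    if any(marker in query.lower() for marker in broad_markers):
        return ["vector", "graphrag"]
    return ["vector"]


def answer(query):
    """Call the chosen tools and combine their observations into one response."""
    tools_used = choose_tools(query)
    observations = [TOOLS[name](query) for name in tools_used]
    return {"answer": " | ".join(observations), "tools_used": tools_used}


narrow = answer("What is key art?")
broad = answer("How do the guidelines relate across documents?")
```

The point of the sketch is the shape of the loop (decide, call tools, synthesize), not the decision rule itself.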

Why GraphRAG?

Standard RAG retrieves isolated text chunks. GraphRAG adds:

Cross-document connections — Links entities that appear across different documents
Community context — Provides broader topical summaries, not just individual chunks
Semantic relationships — Understands how concepts relate beyond keyword overlap
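To make the first point concrete, here is a minimal sketch of how extracted triples could reveal entities shared between documents. The triples, entity names, and document IDs are invented for illustration, and `cross_document_entities` is a hypothetical helper; the project stores this information in Neo4j rather than in Python lists.

```python
from collections import defaultdict

# (head entity, relation, tail entity, source document): made-up example triples
TRIPLES = [
    ("Key Art", "USES", "Title Treatment", "doc_a"),
    ("Title Treatment", "FOLLOWS", "Typography Rules", "doc_b"),
    ("Key Art", "APPEARS_IN", "Thumbnail", "doc_b"),
]


def entities_by_document(triples):
    """Map each entity to the set of documents it was extracted from."""
    seen = defaultdict(set)
    for head, _relation, tail, doc in triples:
        seen[head].add(doc)
        seen[tail].add(doc)
    return seen


def cross_document_entities(triples):
    """Entities mentioned in more than one document, sorted by name."""
    return sorted(
        entity
        for entity, docs in entities_by_document(triples).items()
        if len(docs) > 1
    )


shared = cross_document_entities(TRIPLES)
```

Entities like these are the bridges that chunk-level retrieval alone would miss.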

Architecture

┌──────────────────────┐  HTTP/JSON  ┌──────────────────────────────────┐
│    React Frontend    │ ◄─────────► │  Flask + Hypercorn (async ASGI)  │
│ (Vite, Tailwind CSS) │             │                                  │
└──────────────────────┘             │  ┌────────────────────────────┐  │
                                     │  │  ReAct Agent (Workflow)    │  │
                                     │  │  ├─ Vector Query Tool      │  │
                                     │  │  └─ GraphRAG Query Tool    │  │
                                     │  └────────────────────────────┘  │
                                     └──────┬──────────┬────────────────┘
                                            │          │
                              ┌─────────────┤          ├─────────────┐
                              ▼             ▼          ▼             ▼
                        ┌──────────┐  ┌──────────┐  ┌──────┐  ┌───────────┐
                        │ MongoDB  │  │  Neo4j   │  │OpenAI│  │LlamaCloud │
                        │(sessions,│  │(knowledge│  │(LLM, │  │(LlamaParse│
                        │  convos) │  │  graph)  │  │embed)│  │ doc parse)│
                        └──────────┘  └──────────┘  └──────┘  └───────────┘

Prerequisites

Python 3.10+
Node.js 18+
Docker (for MongoDB and Neo4j)
API Keys: OpenAI, Anthropic (optional), LlamaCloud (for document parsing)

Quick Start

1. Environment Configuration

Create a .env file in the project root:

PRODUCTION=false

OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
LLAMA_CLOUD_API_KEY=your-llamacloud-key
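How `config.py` actually reads these values is not shown here. As a sketch under that caveat, a flag such as `PRODUCTION` could be interpreted like this once the `.env` contents are in the process environment (loading the file itself is assumed to happen elsewhere). `env_flag` is a hypothetical helper, not a function from this repository.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean.

    Accepts common spellings such as "true", "1", "yes", "on" (case-insensitive);
    any other value counts as False. `environ` can be passed for testing.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Example with an explicit mapping instead of the real environment
example_env = {"PRODUCTION": "false", "OPENAI_API_KEY": "your-openai-key"}
production = env_flag("PRODUCTION", environ=example_env)
```

Parsing the value explicitly avoids the common pitfall where the non-empty string "false" is treated as truthy.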

2. Start Databases

# MongoDB (conversation storage)
cd db && docker-compose up -d && cd ..

# Neo4j (knowledge graph)
cd neo4j && docker-compose -f docker-compose-neo4j.yml up -d && cd ..

Neo4j Browser is available at http://localhost:7474.

3. Backend

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

python main.py

The backend starts on http://localhost:6175. On first run, it will:

Connect to MongoDB and initialize the schema
Parse documents from supporting_files/files_for_rag_store/ via LlamaParse
Build the vector index and persist it to index_storage/
Extract entities/relationships and populate the Neo4j knowledge graph
Run community detection and generate community summaries

Subsequent starts load the persisted index from index_storage/ (much faster).
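The build-once, load-afterwards behaviour described above follows a common caching pattern. The sketch below shows that pattern in isolation; `load_or_build`, `build_demo`, and `load_demo` are hypothetical names, and a small JSON file stands in for the real persisted vector index.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(storage_dir, build, load):
    """Load a persisted artifact if the directory has content, else build it."""
    storage = Path(storage_dir)
    if storage.is_dir() and any(storage.iterdir()):
        return load(storage), "loaded"
    storage.mkdir(parents=True, exist_ok=True)
    return build(storage), "built"


def build_demo(storage):
    """Pretend to do the expensive work, then persist the result."""
    data = {"chunks": 3}
    (storage / "index.json").write_text(json.dumps(data))
    return data


def load_demo(storage):
    """Read the persisted result back from disk."""
    return json.loads((storage / "index.json").read_text())


# Demo in a temporary directory so nothing touches a real index_storage/
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "index_storage"
    first = load_or_build(target, build_demo, load_demo)
    second = load_or_build(target, build_demo, load_demo)
```

The first call builds and persists; the second finds the directory populated and only loads.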

4. Frontend

cd chat-interface
npm install
npm run dev

The frontend starts on http://localhost:5173 and connects to the backend at the URL specified in chat-interface/.env.development.

Development Mode

With PRODUCTION=false in .env:

Authentication is bypassed — no Microsoft login required; the system uses dev_user@local automatically
Hot reload is enabled on the backend (Hypercorn reloader)
The frontend works even without a running backend (mock responses for UI development)
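The authentication bypass can be sketched as a single decision about which username to trust. This is an illustration only: `resolve_username` is a hypothetical function, and whether the real backend ignores the `X-MS-USERNAME` header in development mode or raises on a missing header in production is an assumption.

```python
DEV_USER = "dev_user@local"


def resolve_username(headers, production):
    """Pick the username for a request.

    In development the fixed dev user is used; in production the
    X-MS-USERNAME header is required (assumed behaviour).
    """
    if not production:
        return DEV_USER
    username = headers.get("X-MS-USERNAME", "").strip()
    if not username:
        raise PermissionError("missing X-MS-USERNAME header")
    return username


dev_name = resolve_username({}, production=False)
prod_name = resolve_username({"X-MS-USERNAME": "someone@example.com"}, production=True)
```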

The frontend has its own env files:

chat-interface/.env.development — sets VITE_BACKEND_URL for local dev (default: http://localhost:6175)
chat-interface/.env.production — points to the production backend

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| `POST` | `/chat` | Send a chat message. Body: `{ message, sessionId }`. Header: `X-MS-USERNAME` |
| `GET` | `/conversations` | List conversations for the authenticated user |
| `GET` | `/conversations/<id>/messages` | Retrieve messages for a conversation |
| `DELETE` | `/conversations/<id>` | Delete a conversation |
| `GET` | `/images/<filename>` | Serve an extracted document image |
| `GET` | `/status` | System initialization status |

Example Request

curl -X POST http://localhost:6175/chat \
  -H "Content-Type: application/json" \
  -H "X-MS-USERNAME: dev_user@local" \
  -d '{"message": "What are the key art guidelines?", "sessionId": "test-session-1"}'
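The same request can be built from Python with only the standard library. The sketch below constructs the request without sending it, so it runs even when the backend is down; `build_chat_request` is a hypothetical helper, and the response format is not documented here, so the commented-out send step just prints the raw body.

```python
import json
import urllib.request

BACKEND_URL = "http://localhost:6175"  # backend address from the Quick Start section


def build_chat_request(message, session_id, username="dev_user@local"):
    """Build a POST /chat request mirroring the curl example."""
    body = json.dumps({"message": message, "sessionId": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BACKEND_URL}/chat",
        data=body,
        headers={"Content-Type": "application/json", "X-MS-USERNAME": username},
        method="POST",
    )


request = build_chat_request("What are the key art guidelines?", "test-session-1")

# To actually send it (requires the backend to be running):
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(response.read().decode("utf-8"))
```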

Project Structure

├── main.py                    # Flask app init, Hypercorn config, startup sequence
├── config.py                  # All configuration (API keys, models, paths, timeouts)
├── routes.py                  # Flask route handlers (chat, conversations, images)
├── ai_core.py                 # ReAct agent workflow, vector index, document processing
├── graph_rag_integration.py   # GraphRAG classes (extractor, store, query engine)
├── graphRAG.py                # Standalone GraphRAG (for testing)
├── shared_state.py            # Global state management for agent/index/graph components
├── session_manager.py         # Session → user/conversation mapping
├── mongodb_utils.py           # MongoDB CRUD operations
├── document_generator.py      # DOCX brief export from conversations
├── json_utils.py              # Custom JSON serializer for LlamaIndex objects
├── utils.py                   # Logging utilities
├── .env                       # Environment variables (API keys, mode)
├── requirements.txt           # Python dependencies
├── index_storage/             # Persisted vector index (auto-generated)
├── supporting_files/          # Source documents for the knowledge base
│   └── files_for_rag_store/   # Documents ingested by the RAG pipeline
├── uploads/images/            # Extracted document page images
├── db/                        # MongoDB docker-compose
├── neo4j/                     # Neo4j docker-compose and data volumes
└── chat-interface/            # React frontend
    ├── src/App.jsx            # Main chat UI component
    ├── src/auth.js            # MSAL authentication with dev bypass
    └── src/components/        # UI components (ConversationManager, ThemeToggle)

Reindexing Documents

To rebuild the index after changing source documents:

rm -rf index_storage/
python main.py

This re-parses all documents, rebuilds the vector index, and regenerates the knowledge graph.
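If you prefer to script the cleanup step, for example on a platform without `rm`, a minimal Python equivalent might look like the following. `clear_index` is a hypothetical helper; it only removes the directory and does not restart the backend or touch Neo4j.

```python
import shutil
import tempfile
from pathlib import Path


def clear_index(storage_dir="index_storage"):
    """Delete the persisted index directory. Returns True if something was removed."""
    path = Path(storage_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


# Demo against a temporary directory rather than a real index
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "index_storage"
    demo_dir.mkdir()
    (demo_dir / "docstore.json").write_text("{}")
    removed_first = clear_index(demo_dir)
    removed_again = clear_index(demo_dir)
    still_exists = demo_dir.exists()
```

After clearing, run `python main.py` as usual to trigger the rebuild.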

Tech Stack

| Layer | Technology |
| --- | --- |
| LLM | OpenAI GPT-4.1 (`gpt-4.1`) |
| Embeddings | OpenAI `text-embedding-3-small` |
| RAG Framework | LlamaIndex |
| Document Parsing | LlamaParse (LlamaCloud) |
| Knowledge Graph | Neo4j + NetworkX (community detection) |
| Backend | Python, Flask, Hypercorn |
| Conversation DB | MongoDB (via PyMongo) |
| Frontend | React 18, Vite, Tailwind CSS |
| Auth | Microsoft MSAL (Azure AD) |

Troubleshooting

| Issue | Fix |
| --- | --- |
| Login screen appears in dev mode | Verify `PRODUCTION=false` in `.env` |
| MongoDB connection error | Ensure the MongoDB container is running: `docker ps` |
| Neo4j connection error | Check the Neo4j container and verify that the credentials in `config.py` match `docker-compose-neo4j.yml` |
| Frontend can't reach backend | Check that `VITE_BACKEND_URL` in `chat-interface/.env.development` matches the backend port |
| CORS errors | Verify the frontend origin is listed in `CORS_ALLOWED_ORIGINS` in `config.py` |
| Slow first startup | Expected: LlamaParse document processing and graph construction take time. Subsequent starts use the cached index |
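Several of the connection issues above come down to a service not listening where expected. A quick way to narrow that down is a plain TCP check, sketched below with the standard library. The backend, frontend, and Neo4j Browser ports are the ones given in Quick Start; the MongoDB port is an assumption (MongoDB's default, 27017), so adjust it to match `db/docker-compose.yml`. `is_port_open` and `check_services` are hypothetical helpers.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


SERVICES = {
    "backend": 6175,          # from Quick Start
    "frontend (Vite)": 5173,  # from Quick Start
    "Neo4j Browser": 7474,    # from Quick Start
    "MongoDB": 27017,         # assumption: MongoDB default port
}


def check_services(host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```

An open port only shows that something is listening; credential or CORS problems still need the fixes listed in the table.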