feat(pinecone): add research document assessing relevance of Pinecone for HP Prod Tracker

Leivur Djurhuus 2026-03-06 15:25:52 -06:00
parent 1c268e725a
commit ed079ffbe1
2 changed files with 457 additions and 14 deletions


@@ -15,10 +15,11 @@
5. [Phase 9: Advanced Reporting & Analytics](#phase-9-advanced-reporting--analytics)
6. [Phase 10: Collaboration Enhancements](#phase-10-collaboration-enhancements)
7. [Phase 11: Quality of Life & Polish](#phase-11-quality-of-life--polish)
8. [Phase 12: Docker Deployment](#phase-12-docker-deployment)
9. [Data Model Changes Summary](#data-model-changes-summary)
10. [New API Routes Summary](#new-api-routes-summary)
11. [New Pages Summary](#new-pages-summary)
12. [Third-Party Libraries](#third-party-libraries)
---
@@ -903,6 +904,123 @@ the integration layer — the engine itself is developed independently.
---
### 8.4 — AI-Powered Natural Language Search (pgvector)
**What:** A chat-style search panel where producers can ask questions in plain English
and get back relevant projects, deliverables, and pipeline stages with direct links.
For example: *"Which Envy projects are running behind?"* or *"Show me deliverables
similar to the Q3 packaging work."*
**Why:** As the tracker grows to hundreds of projects and thousands of deliverables,
finding the right information becomes harder. Traditional filters work for structured
queries (status = overdue), but producers often think in terms of meaning and context.
Natural language search bridges that gap without requiring producers to learn complex
filter combinations.
**Approach:** Use PostgreSQL's `pgvector` extension to add vector search directly to
our existing database — no external vector database service needed. Use **Ollama** to
run embedding and summarization models locally — zero API costs, no data leaves the
network, and no dependency on external AI services. This keeps the architecture simple,
self-contained, and free to operate.
**Implementation:**
1. **Database setup**
- Enable `pgvector` extension on PostgreSQL (`CREATE EXTENSION vector`)
- Add raw SQL migration for embedding columns (Prisma doesn't natively support
vector types — use `Unsupported("vector(768)")` in schema, raw SQL for queries)
- Add a nullable `embedding vector(768)` column to the `projects`, `deliverables`, and
`deliverable_stages` tables (768 dimensions to match the `nomic-embed-text` model)
2. **Embedding generation service**
- On create/update of a project or deliverable, generate a text representation by
concatenating key fields: name, description, status, priority, assignees,
deliverable names, notes, business unit, code name, etc.
- Call the local Ollama API (`POST http://ollama:11434/api/embeddings`, or the newer `/api/embed`
endpoint) with the `nomic-embed-text` model to convert that text into a 768-dimensional vector
- Store the vector in the embedding column
- One-time backfill script to generate embeddings for all existing records
- Service layer hook to regenerate embeddings when records change
3. **Search API**
- New endpoint: `/api/search/semantic/`
- Accepts a natural language query string
- Converts the query to an embedding using the same model
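- Runs a cosine-distance search with pgvector's `<=>` operator (smaller distance means more
similar), selecting explicit columns rather than the raw vector:
`SELECT id, name, status, embedding <=> $1::vector AS distance FROM projects WHERE embedding IS NOT NULL ORDER BY distance LIMIT 10`
- Hybrid routing: detect structural queries (dates, statuses, priorities) and route
to standard Prisma filters; route meaning-based queries to vector search
- Results include entity type, ID, name, status, and a relevance score (e.g., `1 - distance`)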
4. **LLM summarization layer (optional enhancement)**
- Pass the top search results + the user's original question to a local Ollama LLM
(`POST http://ollama:11434/api/generate`) using `llama3.1:8b` or `mistral`
- Generate a natural language summary: *"There are 3 Envy projects currently behind
schedule. The most critical is Envy 16 Refresh with 4 overdue deliverables..."*
- Return both the AI summary and the structured result list
- Runs entirely on-premises — no project data ever leaves the network
5. **Frontend: Producer Search Chat**
- Extend the existing `cmdk` command palette with a "smart search" mode, or add a
dedicated slide-out chat panel accessible from the top nav
- Input: free-text query field
- Output: AI summary (if enabled) at the top, followed by clickable result cards
for matching projects/deliverables that link directly into the tracker
- Show relevance scores and highlight why each result matched
- Conversation history within the session for follow-up questions
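Here is a minimal sketch of the embedding step in `embedding-service.ts`. It assumes Node 18+ (global `fetch`) and Ollama's `/api/embeddings` request shape, where `{ model, prompt }` is answered with `{ embedding: number[] }`. The `EmbeddableProject` shape and all function names are hypothetical; the real field list would come from the Prisma models.

```typescript
// Hypothetical record shape: the fields concatenated into the embedding text.
export interface EmbeddableProject {
  name: string;
  description?: string | null;
  status: string;
  priority?: string | null;
  businessUnit?: string | null;
  codeName?: string | null;
  assignees?: string[];
  deliverableNames?: string[];
  notes?: string | null;
}

const EXPECTED_DIMENSIONS = 768; // nomic-embed-text output size

// Build a labeled, newline-separated text; empty fields are skipped so they add no noise.
export function buildEmbeddingText(p: EmbeddableProject): string {
  const parts: Array<[string, string | null | undefined]> = [
    ["Project", p.name],
    ["Code name", p.codeName],
    ["Business unit", p.businessUnit],
    ["Status", p.status],
    ["Priority", p.priority],
    ["Assignees", p.assignees?.join(", ")],
    ["Deliverables", p.deliverableNames?.join(", ")],
    ["Description", p.description],
    ["Notes", p.notes],
  ];
  return parts
    .filter(([, value]) => value != null && value.trim() !== "")
    .map(([label, value]) => `${label}: ${value!.trim()}`)
    .join("\n");
}

// pgvector accepts a text literal such as "[0.1,0.2,0.3]" that can be cast with ::vector.
export function toPgVectorLiteral(values: number[]): string {
  if (values.length !== EXPECTED_DIMENSIONS) {
    throw new Error(`Expected ${EXPECTED_DIMENSIONS} dimensions, got ${values.length}`);
  }
  if (!values.every(Number.isFinite)) {
    throw new Error("Embedding contains non-finite values");
  }
  return `[${values.join(",")}]`;
}

// Ask the local Ollama server for an embedding of the given text.
export async function embedText(
  text: string,
  host = "http://ollama:11434",
  model = "nomic-embed-text",
): Promise<number[]> {
  const res = await fetch(`${host}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: text }),
  });
  if (!res.ok) {
    throw new Error(`Ollama embeddings request failed: ${res.status}`);
  }
  const data = (await res.json()) as { embedding: number[] };
  return data.embedding;
}
```

The resulting literal can then be written with raw SQL, for example `UPDATE projects SET embedding = $1::vector WHERE id = $2`. Reusing the same text builder in the backfill script keeps backfilled and live embeddings comparable.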
**Data model additions:**
```prisma
// Added to existing tables via raw SQL migration (Prisma `Unsupported` type):
//   projects table:     embedding Unsupported("vector(768)")?
//   deliverables table: embedding Unsupported("vector(768)")?
model SearchLog {
  id          String   @id @default(cuid())
  userId      String
  user        User     @relation(fields: [userId], references: [id])
  query       String
  resultCount Int
  clickedId   String? // which result the user opened (for relevance tuning)
  createdAt   DateTime @default(now())

  @@index([userId])
  @@map("search_logs")
}
// The User model needs the matching back-relation field: searchLogs SearchLog[]
```
**Key files:**
- `src/lib/services/embedding-service.ts` — Generate and store embeddings
- `src/lib/services/semantic-search-service.ts` — Vector search + hybrid routing
- `src/app/api/search/semantic/route.ts` — Search API endpoint
- `src/components/search/smart-search-panel.tsx` — Chat-style search UI
- `src/hooks/use-semantic-search.ts` — React Query hook for search
- `prisma/migrations/xxx_add_pgvector.sql` — Raw SQL migration for pgvector setup
- `scripts/backfill-embeddings.ts` — One-time backfill script
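The hybrid routing step could start as a plain keyword heuristic in `semantic-search-service.ts`. The sketch below works on that assumption. The phrase tables, stopword list, and enum values (`OVERDUE`, `URGENT`, and so on) are hypothetical placeholders for the tracker's real Prisma enums.

```typescript
export interface StructuredFilters {
  status?: string;
  priority?: string;
  dueWithinDays?: number;
}

export interface SearchPlan {
  filters: StructuredFilters; // applied as ordinary Prisma `where` conditions
  useVectorSearch: boolean; // true when meaningful words remain beyond the structural terms
  query: string; // original text, embedded when useVectorSearch is true
}

// Hypothetical vocabularies mapping producer wording to enum values.
const STATUS_TERMS: Record<string, string> = {
  "running behind": "OVERDUE",
  "behind schedule": "OVERDUE",
  overdue: "OVERDUE",
  "in progress": "IN_PROGRESS",
  completed: "COMPLETED",
};
const PRIORITY_TERMS: Record<string, string> = { urgent: "URGENT", "high priority": "HIGH" };
const DUE_TERMS: Record<string, number> = { "due today": 0, "due this week": 7 };

// Filler words that carry no meaning for vector search.
const STOPWORDS = new Set([
  "show", "me", "all", "which", "what", "are", "is", "the", "a", "of",
  "list", "find", "projects", "project", "deliverables", "deliverable", "items",
]);

// Returns the value of the first matching phrase and the text with that phrase removed.
function extract<T>(text: string, terms: Record<string, T>): [T | undefined, string] {
  for (const [phrase, value] of Object.entries(terms)) {
    if (text.includes(phrase)) return [value, text.replace(phrase, " ")];
  }
  return [undefined, text];
}

export function planSearch(raw: string): SearchPlan {
  let text = raw.toLowerCase();
  const filters: StructuredFilters = {};
  [filters.status, text] = extract(text, STATUS_TERMS);
  [filters.priority, text] = extract(text, PRIORITY_TERMS);
  [filters.dueWithinDays, text] = extract(text, DUE_TERMS);

  // Whatever survives phrase removal and stopword filtering is "meaning" for vector search.
  const leftover = text
    .split(/[^a-z0-9]+/)
    .filter((word) => word !== "" && !STOPWORDS.has(word));
  return { filters, useVectorSearch: leftover.length > 0, query: raw.trim() };
}
```

With these tables, *"Which Envy projects are running behind?"* produces an `OVERDUE` status filter plus vector search on the leftover word "envy". By contrast, *"Show me overdue deliverables"* needs no vector search at all. A keyword table is easy to audit, and the `SearchLog` data could later reveal which phrasings it misses.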
**New dependencies:**
- `pgvector` PostgreSQL extension (installed on the database, not an npm package)
- Ollama service (Docker container — `ollama/ollama` image)
- Ollama models: `nomic-embed-text` (embeddings, ~274MB), `llama3.1:8b` (summarization,
~4.7GB), pulled on first container start by the model bootstrap script described in Phase 12
- No paid API services — everything runs locally
**Practical notes:**
- Zero ongoing AI costs — all models run on-premises via Ollama
- No project data ever leaves the network — important for HP production data
- Ollama exposes a simple REST API (`http://ollama:11434`) — the embedding service
just makes HTTP calls, no SDK needed
- Embeddings are fast even on CPU (~10-50ms per record); summarization benefits from
GPU but works on CPU with a few extra seconds per query
- If vector search ever outgrows pgvector's performance at scale, migrating to a dedicated
vector database such as Pinecone should be straightforward: the embedding generation and
search API layers stay the same, and only the storage backend changes
- Search logs enable future relevance tuning and usage analytics
- See [Phase 12: Docker Deployment](#phase-12-docker-deployment) for the full
containerized deployment strategy including Ollama
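Swapping the backend stays cheap only if the search service talks to vectors through a narrow interface. The sketch below shows that seam. It assumes a hypothetical `VectorStore` interface, `name`/`status` columns on both tables, and an injected `runQuery` function (for instance, a thin wrapper around Prisma's `$queryRawUnsafe`). A Pinecone-backed class would implement the same `search` method.

```typescript
export interface VectorMatch {
  id: string;
  name: string;
  status: string;
  score: number; // cosine similarity, i.e. 1 - cosine distance
}

// The seam: search code depends only on this interface, not on SQL or a vendor SDK.
export interface VectorStore {
  search(table: "projects" | "deliverables", embedding: number[], limit: number): Promise<VectorMatch[]>;
}

type RunQuery = (sql: string, ...params: unknown[]) => Promise<Array<Record<string, unknown>>>;

const ALLOWED_TABLES = new Set(["projects", "deliverables"]);

export class PgVectorStore implements VectorStore {
  private readonly runQuery: RunQuery;

  constructor(runQuery: RunQuery) {
    this.runQuery = runQuery;
  }

  async search(table: "projects" | "deliverables", embedding: number[], limit: number): Promise<VectorMatch[]> {
    // Table names cannot be bound as parameters, so they are checked against a whitelist.
    if (!ALLOWED_TABLES.has(table)) throw new Error(`Unknown table: ${table}`);
    // <=> is pgvector's cosine distance. Ordering by that expression (ascending) is the
    // form pgvector indexes can use. The vector column itself is not selected, because
    // Prisma raw queries may fail to deserialize the vector type.
    const sql =
      `SELECT id, name, status, 1 - (embedding <=> $1::vector) AS score ` +
      `FROM ${table} WHERE embedding IS NOT NULL ` +
      `ORDER BY embedding <=> $1::vector LIMIT $2`;
    const rows = await this.runQuery(sql, `[${embedding.join(",")}]`, limit);
    return rows.map((row) => ({
      id: String(row.id),
      name: String(row.name),
      status: String(row.status),
      score: Number(row.score),
    }));
  }
}
```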
---
## Phase 9: Advanced Reporting & Analytics
Builds on the existing dashboard with deeper insights for project management and
@@ -1236,6 +1354,193 @@ file naming convention (e.g., `SKU-12345_catalog_v2.png` matches Catalog Images
---
## Phase 12: Docker Deployment
Containerize the entire application stack for consistent, one-command deployment to any
server. Eliminates "works on my machine" issues, simplifies onboarding, and makes the
Ollama AI layer a natural part of the infrastructure rather than a separate install.
### 12.1 — Docker Compose Stack
**What:** A `docker-compose.yml` that defines the complete application as three services:
the Next.js app, PostgreSQL with pgvector, and Ollama with pre-configured models. One
`docker compose up` starts everything.
**Architecture:**
```
┌─────────────────────────────────────────────────────┐
│                 docker-compose.yml                  │
│                                                     │
│ ┌────────────┐  ┌────────────┐  ┌────────────────┐  │
│ │    app     │  │     db     │  │     ollama     │  │
│ │  Next.js   │  │ PostgreSQL │  │nomic-embed-text│  │
│ │ Port 3000  │  │ + pgvector │  │  llama3.1:8b   │  │
│ │            │──│ Port 5432  │  │   Port 11434   │  │
│ │            │  │            │  │                │  │
│ └────────────┘  └────────────┘  └────────────────┘  │
│       │               │                 │           │
│ [app-network]     [pgdata]       [ollama-models]    │
└─────────────────────────────────────────────────────┘
```
**Services:**
| Service | Image | Purpose |
|---------|-------|---------|
| `app` | Custom (Dockerfile) | Next.js production build, serves the tracker |
| `db` | `pgvector/pgvector:pg17` | PostgreSQL 17 with pgvector extension pre-installed |
| `ollama` | `ollama/ollama:latest` | Local AI model server for embeddings and summarization |
**Implementation:**
1. **`Dockerfile`** (Next.js app)
- Multi-stage build: `node:20-alpine` for deps + build, minimal final image
- Stage 1: Install dependencies (`npm ci`)
- Stage 2: Build the Next.js app (`npm run build`)
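- Stage 3: Production image containing only the built output and production dependencies,
started with `next start`
- Runs `prisma generate` during the build and `prisma migrate deploy` on startup, so the Prisma
CLI and the `prisma/` directory (schema and migrations) must also be copied into the final image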
- Final image size target: ~200-300MB
2. **`docker-compose.yml`**
- Three services (`app`, `db`, `ollama`) on a shared internal network
- `app` depends on `db` and `ollama` with health checks
- `db` uses `pgvector/pgvector:pg17` image with pgvector ready out of the box
- `ollama` uses official image with a startup script to pull models on first run
- Named volumes for database data (`pgdata`) and Ollama models (`ollama-models`)
- Environment variables sourced from `.env` file
- Only `app` exposes a port to the host (3000); `db` and `ollama` are internal only
3. **`docker/ollama-entrypoint.sh`** (model bootstrap script)
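- Starts the Ollama server in the background (`ollama serve`) and waits until it responds,
since `ollama pull` needs a running server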
- Checks if required models are already pulled (cached in volume)
- If not, pulls `nomic-embed-text` and `llama3.1:8b` automatically
- Subsequent starts skip the pull because models persist in the Docker volume
- Finally waits on the server process so the container keeps running
4. **`docker/db-init.sql`** (database initialization)
- `CREATE EXTENSION IF NOT EXISTS vector;` — ensures pgvector is enabled
- Runs automatically on first database creation via PostgreSQL's init script mechanism
5. **`.env.example`** (deployment template)
```env
# Database (the password in DATABASE_URL must match POSTGRES_PASSWORD)
DATABASE_URL=postgresql://postgres:your_password@db:5432/hp_prod_tracker
POSTGRES_PASSWORD=your_password
POSTGRES_DB=hp_prod_tracker
# NextAuth
NEXTAUTH_URL=http://your-server:3000
# Generate with: openssl rand -base64 32
NEXTAUTH_SECRET=generate-a-random-secret
# Ollama (internal — no need to change)
OLLAMA_HOST=http://ollama:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_LLM_MODEL=llama3.1:8b
```
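On the app side, the `OLLAMA_*` variables above could be read through a single helper so the embedding and search services share the same defaults. Below is a sketch with a hypothetical `loadOllamaConfig` function whose fallbacks mirror `.env.example`:

```typescript
export interface OllamaConfig {
  host: string;
  embedModel: string;
  llmModel: string;
}

type Env = Record<string, string | undefined>;

// Reads OLLAMA_* variables, falling back to the values documented in .env.example.
export function loadOllamaConfig(env: Env): OllamaConfig {
  const host = (env.OLLAMA_HOST || "http://ollama:11434").replace(/\/+$/, "");
  // new URL throws on malformed input; the protocol check rejects e.g. "ollama:11434".
  const url = new URL(host);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`OLLAMA_HOST must be an http(s) URL, got ${host}`);
  }
  return {
    host,
    embedModel: env.OLLAMA_EMBED_MODEL || "nomic-embed-text",
    llmModel: env.OLLAMA_LLM_MODEL || "llama3.1:8b",
  };
}
```

Calling `loadOllamaConfig(process.env)` at startup surfaces a malformed `OLLAMA_HOST` immediately, rather than on the first search request.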
**Key files:**
- `Dockerfile` — Multi-stage Next.js production build
- `docker-compose.yml` — Full stack orchestration
- `docker/ollama-entrypoint.sh` — Model bootstrap script
- `docker/db-init.sql` — pgvector extension initialization
- `.env.example` — Environment variable template with documentation
- `.dockerignore` — Exclude node_modules, .next, .git, etc.
---
### 12.2 — Health Checks & Startup Orchestration
**What:** Ensure services start in the correct order and the app only accepts traffic
once all dependencies are healthy.
**Implementation:**
- `db` health check: `pg_isready` command — app waits until database accepts connections
- `ollama` health check: `curl http://localhost:11434/api/tags` confirms Ollama is running and
responsive (if the image does not include `curl`, `ollama list` queries the same server API)
- `app` startup script: runs `prisma migrate deploy` first (applies any pending
migrations), then starts Next.js
- Docker Compose `depends_on` with `condition: service_healthy` controls the order: `db` and
`ollama` start in parallel, and `app` starts only after both report healthy
- Restart policy: `restart: unless-stopped` on all services for automatic recovery
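A passing `/api/tags` check only proves the server is up; on first start, the models may still be downloading. Below is a sketch of a stricter readiness check. It assumes the `/api/tags` response lists pulled models as `{ models: [{ name: "nomic-embed-text:latest", ... }] }`. `missingModels` and `ollamaReady` are hypothetical helpers the app could call before enabling smart search.

```typescript
interface TagsResponse {
  models: Array<{ name: string }>;
}

// A model pulled as "nomic-embed-text" is listed as "nomic-embed-text:latest".
function normalize(name: string): string {
  return name.includes(":") ? name : `${name}:latest`;
}

export function missingModels(tags: TagsResponse, required: string[]): string[] {
  const available = new Set(tags.models.map((m) => normalize(m.name)));
  return required.filter((name) => !available.has(normalize(name)));
}

export async function ollamaReady(host: string, required: string[]): Promise<boolean> {
  try {
    const res = await fetch(`${host}/api/tags`);
    if (!res.ok) return false;
    return missingModels((await res.json()) as TagsResponse, required).length === 0;
  } catch {
    return false; // server not reachable yet
  }
}
```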
---
### 12.3 — Production Deployment Workflow
**What:** Documented step-by-step process for deploying to a server.
**Deployment steps:**
```bash
# 1. Clone the repository
git clone <repo-url> hp-prod-tracker
cd hp-prod-tracker
# 2. Configure environment
cp .env.example .env
# Edit .env with production values (database password, NextAuth secret, server URL)
# 3. Start everything
docker compose up -d
# 4. First run: wait for Ollama to download models (~5GB, one-time)
docker compose logs -f ollama # Watch progress, Ctrl+C when done
# 5. Seed the database (fresh install only; the app image must include the seed script and its runtime)
docker compose exec app npx prisma db seed
# 6. Verify
curl http://localhost:3000 # Should return the app
```
**Updating the application:**
```bash
git pull
docker compose up -d --build # Rebuilds only the app container
# Prisma migrations run automatically on startup
```
**GPU support for Ollama (optional):**
- Install `nvidia-container-toolkit` on the host
- Add `deploy.resources.reservations.devices` to the ollama service in compose
- Significantly speeds up LLM summarization; embeddings are fast regardless
- CPU-only is fully functional — GPU is a performance optimization, not a requirement
**Backup strategy:**
- Database: `docker compose exec -T db pg_dump -U postgres hp_prod_tracker > backup.sql`
(`-T` disables pseudo-TTY allocation, which is safer when redirecting output to a file)
- Ollama models: cached in volume, re-pulled automatically if lost — no backup needed
- Application: stateless — the Docker image is rebuilt from source on each deploy
---
### 12.4 — Development Environment with Docker
**What:** A `docker-compose.dev.yml` override for local development that mounts source
code and enables hot reloading while keeping the database and Ollama in containers.
**Implementation:**
- Override file extends the production compose with dev-specific settings
- `app` service: mounts `./src` as a volume, runs `next dev` instead of `next start`
- `db` service: exposes port 5432 to host for Prisma Studio / direct access
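- `ollama` service: same as production (models don't need hot reload), except that port 11434
is published to the host so a natively running app can reach it
- Developers can choose: run everything in Docker, or run only `db` + `ollama` in
Docker and run the Next.js app natively with `npm run dev` (with `DATABASE_URL` pointing at
`localhost:5432` and `OLLAMA_HOST` at `http://localhost:11434`)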
**Usage:**
```bash
# Full Docker development
docker compose -f docker-compose.yml -f docker-compose.dev.yml up
# Or: only infrastructure in Docker, app runs natively
docker compose -f docker-compose.yml -f docker-compose.dev.yml up db ollama
npm run dev
```
**Key files:**
- `docker-compose.dev.yml` — Development overrides
- `docker/dev-entrypoint.sh` — Dev startup script (skip build, run dev server)
---
## Data Model Changes Summary
| Phase | New Models | Modified Models |
@@ -1243,12 +1548,12 @@ file naming convention (e.g., `SKU-12345_catalog_v2.png` matches Catalog Images
| 5 | Annotation, ReviewSession, ReviewSessionItem, FeedbackItem | Comment (add annotations + feedback relations) |
| 6 | Skill, UserSkill, StageSkillRequirement | User (add maxCapacity, skills) |
| 7 | AutomationRule, AutomationExecution, ApprovalChain, ApprovalStep, ApprovalRecord, ProjectTemplate, ProjectTemplateDeliverable | — |
| 8 | AssetSpec, AssetValidationResult, AIReviewResult, SearchLog | Revision (add validation/AI relations), Project + Deliverable (add embedding columns) |
| 9 | PortalLink, SLATarget | — |
| 10 | ActivityEntry | — |
| 11 | SavedView | — |
**Total new models: 22**
---
@@ -1259,12 +1564,12 @@ file naming convention (e.g., `SKU-12345_catalog_v2.png` matches Catalog Images
| 5 | `/api/annotations/`, `/api/reviews/`, `/api/reviews/[id]/items/`, `/api/feedback/`, `/api/feedback/[id]/`, `/api/stages/[id]/feedback/` |
| 6 | `/api/workload/`, `/api/skills/`, `/api/users/[id]/skills/` |
| 7 | `/api/automations/`, `/api/automations/[id]/executions/`, `/api/approval-chains/`, `/api/stages/[id]/approve/`, `/api/templates/`, `/api/templates/[id]/instantiate/` |
| 8 | `/api/asset-specs/`, `/api/revisions/[id]/validate/`, `/api/webhooks/ai-review/`, `/api/revisions/[id]/ai-review/`, `/api/search/semantic/` |
| 9 | `/api/portal/`, `/api/portal/[token]/`, `/api/analytics/velocity/`, `/api/analytics/sla/` |
| 10 | `/api/projects/[id]/activity/`, `/api/external-links/` |
| 11 | `/api/views/` |
**Total new API routes: ~26**
---
@@ -1275,12 +1580,12 @@ file naming convention (e.g., `SKU-12345_catalog_v2.png` matches Catalog Images
| 5 | Review page (per deliverable), Review sessions list, Session presenter |
| 6 | Workload/capacity page, Skills management (settings) |
| 7 | Automations management (settings), Approval chains (settings), Template library |
| 8 | Asset specs (settings), Smart search panel (chat UI) |
| 9 | Client portal (external), SLA configuration (settings) |
| 10 | Activity feed (per project), External review page |
| 11 | — (enhancements to existing pages) |
**Total new pages: ~13**
---
@@ -1326,11 +1631,19 @@ Phase 8 (Asset Intelligence) ─── requires Phase 5 for full value
|
|-- 8.1 File Validation <-- standalone
|-- 8.2 Preview Generation <-- standalone
|-- 8.3 AI Integration <-- requires external engine + 8.1 + 8.2
+-- 8.4 Semantic Search <-- standalone, requires pgvector extension
Phase 9 (Reporting) ─── benefits from Phase 6 + 7 data
Phase 10 (Collaboration) ─── benefits from Phase 5
Phase 11 (QoL) ─── standalone incremental improvements, can be interleaved
Phase 12 (Docker) ─── can be done at any time, benefits from 8.4 for Ollama
|
|-- 12.1 Docker Compose Stack <-- foundation
|-- 12.2 Health Checks <-- requires 12.1
|-- 12.3 Production Workflow <-- requires 12.1 + 12.2
+-- 12.4 Dev Environment <-- requires 12.1
```
---
@@ -1342,13 +1655,15 @@ Phase 11 (QoL) ─── standalone incremental improvements, can be interleaved
| 5 | 5 | 6 | 3 | ~32 |
| 6 | 3 | 3 | 2 | ~8 |
| 7 | 6 | 6 | 3 | ~10 |
| 8 | 4 | 5 | 2 | ~13 |
| 9 | 2 | 4 | 2 | ~8 |
| 10 | 1 | 2 | 2 | ~6 |
| 11 | 1 | 1 | 0 | ~8 |
| 12 | 0 | 0 | 0 | 0 (infra only) |
| **Total** | **22** | **~27** | **~14** | **~85** |
---
*Document version: 1.1 — Created 2026-03-01, updated 2026-03-06*
*Updates: Added 8.4 (AI semantic search with Ollama + pgvector), Phase 12 (Docker deployment)*
*To be updated as features are refined and priorities shift.*

pinecone-research.md (new file, 128 additions)

@@ -0,0 +1,128 @@
# Pinecone Research — Is It Relevant for HP Prod Tracker?
**Date:** March 2026
**Prepared for:** Internal review
---
## What Is Pinecone?
Pinecone is a fully managed **vector database** designed for AI-powered applications. Instead of storing and querying data using traditional rows, columns, and SQL filters, Pinecone stores **vectors** — numerical representations of text, images, or other data — and lets you search by **meaning** rather than exact keywords.
For example, a search for "running shoes" in a traditional database only returns results that literally contain "running shoes." In Pinecone, a search for "running shoes" could also surface "jogging sneakers" or "athletic footwear" because the system understands they mean similar things.
Pinecone is primarily used to power:
- **Semantic search** — find things by meaning, not just keywords
- **Retrieval-Augmented Generation (RAG)** — feed relevant company data into AI chatbots (like ChatGPT) so they give accurate, context-aware answers
- **Recommendation engines** — "items similar to this one"
- **AI assistants and knowledge bases** — let employees ask questions in natural language and get answers from internal documents
---
## How It Works (In Simple Terms)
1. You take your data (documents, product descriptions, notes, etc.)
2. An AI model converts each piece of data into a vector (a list of numbers that captures its meaning)
3. Those vectors are stored in Pinecone
4. When someone searches, their query is also converted into a vector
5. Pinecone finds the stored vectors that are closest in meaning and returns them
Pinecone handles steps 3 to 5. It can also handle step 2 with its built-in hosted embedding models (such as `llama-text-embed-v2`), so you don't always need a separate AI service to generate vectors.
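Steps 4 and 5 amount to a nearest-neighbor search. The brute-force TypeScript sketch below scores every stored vector by cosine similarity. The 3-dimensional vectors are made up for illustration. Real embedding models produce hundreds of dimensions, and Pinecone (or pgvector with an HNSW or IVFFlat index) uses approximate indexes instead of scanning everything.

```typescript
interface StoredVector {
  id: string;
  values: number[];
}

// Cosine similarity: 1 means same direction (same meaning), 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query and keep the k best.
function topK(query: number[], stored: StoredVector[], k: number) {
  return stored
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Made-up vectors: the two footwear phrases point in a similar direction.
const stored: StoredVector[] = [
  { id: "jogging sneakers", values: [0.9, 0.1, 0.0] },
  { id: "athletic footwear", values: [0.8, 0.3, 0.1] },
  { id: "quarterly budget", values: [0.0, 0.2, 0.9] },
];

// Query vector standing in for "running shoes".
console.log(topK([1.0, 0.2, 0.0], stored, 2).map((r) => r.id));
// → [ 'jogging sneakers', 'athletic footwear' ]
```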
---
## Key Features
| Feature | Details |
|---|---|
| **Serverless architecture** | No servers to manage. Scales up and down automatically based on usage. |
| **Cloud support** | Available on AWS, GCP, and Azure |
| **Built-in embeddings** | Can automatically convert text to vectors without a separate embedding service |
| **Hybrid search** | Combines semantic (meaning-based) and keyword search for better results |
| **Metadata filtering** | Filter results by category, date, status, etc. alongside semantic search |
| **Multi-tenancy** | Namespaces let you isolate data per team, customer, or project |
| **Integrated with major AI tools** | Works with OpenAI, Cohere, LangChain, Amazon Bedrock, and many others |
| **SDKs** | Official clients for Python, JavaScript/TypeScript, Java, Go, and C# |
| **Canopy (RAG framework)** | Open-source RAG framework built on Pinecone for quick chatbot prototyping |
---
## Pricing Overview
Pinecone operates on a **pay-as-you-go** model for its serverless tier:
| Tier | What You Get |
|---|---|
| **Free (Starter)** | One serverless index, enough for prototyping and small projects. No credit card required. |
| **Standard** | Production-ready with higher limits, usage-based billing. Suitable for most teams. |
| **Enterprise** | Custom pricing, dedicated support, SSO, advanced security, SLAs. |
Costs are based on the amount of data stored, the number of queries, and the compute used. For small-to-medium workloads, costs are generally low. The free tier is sufficient to evaluate whether Pinecone fits a use case.
---
## Our Project: HP Prod Tracker
Our application is a **production pipeline tracker** built with:
- **Next.js** (React) frontend
- **PostgreSQL** database via **Prisma ORM**
- Features: project management, deliverable tracking, multi-stage production pipelines, revision workflows, assignments, notifications, workload/capacity management
The core data model is **structured and relational**: projects have deliverables, deliverables have pipeline stages, stages have assignments and revisions. Users filter by status, priority, dates, and assignees. This is classic relational database territory — and PostgreSQL handles it very well.
---
## Relevance Assessment: Does Pinecone Make Sense for Us?
### Where Pinecone Would NOT Help (Our Current Needs)
Most of what our tracker does today is **structured data management**:
- Filtering projects by status, priority, date, assignee
- Tracking pipeline stages and their statuses
- Managing assignments and revisions
- Gantt charts and timeline views
- Workload and capacity tracking
These are all **exact-match, filter, and sort operations** — exactly what PostgreSQL is built for. Pinecone would not replace or improve any of this.
### Where Pinecone COULD Help (Future Features)
Pinecone becomes relevant if we ever want to add **AI-powered features** such as:
| Potential Feature | How Pinecone Would Help |
|---|---|
| **Smart search across projects** | "Find deliverables similar to the packaging we did for the Envy line last year" — semantic search across project names, descriptions, and notes |
| **AI assistant / chatbot** | Let producers ask questions like "What's the status of all urgent items due this week?" in natural language, using RAG to pull answers from our data |
| **Similar project recommendations** | When creating a new project, suggest similar past projects as templates or references |
| **Knowledge base search** | If we store process documents, guidelines, or brand standards, Pinecone could power a "search the wiki" feature |
| **Intelligent auto-assignment** | Match deliverable requirements to team member skills and past work using vector similarity |
### Alternatives to Consider
Before committing to Pinecone, it's worth noting:
- **PostgreSQL pgvector extension** — adds vector search directly to our existing database. Simpler to set up, no extra service, good enough for moderate-scale vector search. This would be the lowest-friction option if we want to experiment.
- **Supabase Vector** — if we ever move to Supabase, it includes pgvector built-in.
- **Elasticsearch / OpenSearch** — better for full-text search; can be extended with vector capabilities.
---
## Bottom Line
**Pinecone is not relevant to our current needs.** Our production tracker is a structured data application, and PostgreSQL handles everything we need today.
**However**, if we plan to add AI-powered features in the future (smart search, chatbot, recommendations), Pinecone is one of the top choices for that. For a first step, **pgvector** (a PostgreSQL extension) would let us experiment with vector search without adding a new service to our stack.
**Recommendation:** No action needed now. Revisit if AI-powered search or a chatbot feature enters the roadmap. Start with pgvector for prototyping; consider Pinecone if we outgrow it or need production-grade vector search at scale.
---
## Useful Links
- Pinecone website: pinecone.io
- Pinecone documentation: docs.pinecone.io
- pgvector (PostgreSQL extension): github.com/pgvector/pgvector
- Pinecone JavaScript SDK: npmjs.com/package/@pinecone-database/pinecone