diff --git a/CLI_ANYTHING_IMPLEMENTATION_PLAN.md b/CLI_ANYTHING_IMPLEMENTATION_PLAN.md
new file mode 100644
index 0000000..b47a2a7
--- /dev/null
+++ b/CLI_ANYTHING_IMPLEMENTATION_PLAN.md
@@ -0,0 +1,207 @@
+# CLI-Anything Integration: HP CG Production Tracker
+
+## Executive Summary
+
+CLI-Anything is an open-source framework (MIT licensed) that automatically transforms our web application into a command-line interface. This CLI layer enables AI agents to operate the tracker through structured commands, unlocking full automation of the production pipeline — from job intake to delivery.
+
+The goal: shift producers from **operators** (doing everything manually) to **supervisors** (approving, overriding, handling exceptions).
+
+---
+
+## Key Findings
+
+### What CLI-Anything Does
+
+- Auto-generates a production-ready CLI from an existing codebase
+- 7-phase pipeline: analyze source → design commands → implement → plan tests → write tests → document → publish
+- Outputs a pip-installable package with JSON output mode for AI consumption
+- Includes auto-generated tests and documentation
+- Validated across 11 major applications (1,508 passing tests)
+- Repository: https://github.com/HKUDS/CLI-Anything
+
+### Why It Fits Our Project
+
+- Our tracker already has a clean service layer with 26 REST API endpoints
+- Zod validators translate directly to CLI command schemas
+- Dependency engine and business rules are already codified
+- Skills, capacity, and workload data already exist in the system
+- Ollama is already running locally for embeddings (pgvector + nomic-embed-text)
+
+### Cost
+
+- CLI-Anything: Free (MIT license)
+- Claude API (default): ~$0.01–0.05 per interaction, highly reliable tool calling
+- Local LLM (Ollama): Free, already in our Docker Compose stack — serves as offline fallback
+- Primary cost is implementation time
+
+---
+
+## What Can Be Automated
+
+| Producer Task | Automation Level | How |
+|---|---|---|
+| Job intake / project creation | Fully automatable | AI parses incoming requests (email, brief docs) and creates projects, deliverables, and pipeline stages |
+| Artist assignment | Fully automatable | AI matches skills, capacity, and department data to assign the best available artist |
+| Stage progression | Mostly automatable | Dependency engine auto-advances stages as prerequisites are approved; AI triggers downstream work |
+| Deadline monitoring & escalation | Fully automatable | Scheduled agent flags overdue items, nudges artists, escalates only true blockers to producers |
+| Status reporting | Fully automatable | Agent queries tracker and generates summaries on demand or on schedule |
+| Revision cycles | Partially automatable | Agent logs revisions and reassigns artists; creative review still requires human eyes |
+
+### Producer Role: Before vs After
+
+| Today | With Automation |
+|---|---|
+| Create projects manually | Review auto-created projects, approve |
+| Assign artists by memory | Review AI-suggested assignments, override if needed |
+| Monitor every stage daily | Get alerted only on exceptions and blockers |
+| Chase artists for updates | Agent handles nudges and follow-ups |
+| Compile status reports | Reports generated automatically |
+| Advance stages manually | Pipeline advances itself |
+
+---
+
+## Architecture Decision: Chat Interface
+
+### Default: Claude API + Tool Use
+
+- Claude interprets natural language and calls tools mapped to our services
+- Best-in-class accuracy for tool calling, multi-step operations, and ambiguous requests
+- ~$0.01–0.05 per interaction — negligible cost at 10–30 producers
+- Estimated monthly cost: ~$20–100 depending on usage volume
+
+### Fallback: Local LLM via Ollama
+
+- Activates automatically if the Claude API is unreachable (outage, network issues)
+- Llama 3 70B or Qwen 2.5 72B running locally via existing Docker Compose stack
+- Handles most straightforward operations reliably
+- Ensures producers are never blocked — the chat assistant stays online regardless
+
+### Why Claude as Default
+
+- Tool calling accuracy is significantly higher than local models — fewer misrouted commands, fewer confirmation retries
+- Handles complex multi-step requests out of the box ("create 20 deliverables and assign them based on availability")
+- The cost is trivial relative to the producer time saved
+- Ollama remains valuable as a zero-downtime safety net, not a cost-saving measure
+
+---
+
+## Implementation Steps
+
+### Phase 1: Generate the CLI
+
+1. Install CLI-Anything as a Claude Code plugin:
+   ```bash
+   /plugin marketplace add HKUDS/CLI-Anything
+   /plugin install cli-anything
+   ```
+2. Run CLI generation against the tracker codebase:
+   ```bash
+   /cli-anything ~/Documents/VScode/hp_prod_tracker
+   ```
+3. Review generated commands — ensure they map correctly to existing services:
+   - Project CRUD (create, list, update, archive)
+   - Deliverable CRUD + bulk creation
+   - Stage advancement + status updates
+   - Artist assignment (with skill/capacity awareness)
+   - Revision logging
+   - Workload queries
+   - Excel import/export
+4. Refine coverage for any missing operations:
+   ```bash
+   /cli-anything:refine ~/Documents/VScode/hp_prod_tracker "pipeline dependencies and bulk operations"
+   ```
+5. Run the auto-generated tests and validate against the real database
+6. Install the CLI locally:
+   ```bash
+   cd hp_prod_tracker/agent-harness && pip install -e .
+   ```
+
+### Phase 2: Build the Chat UI Component
+
+1. Create a slide-out chat panel using shadcn/ui (`Sheet` + `ScrollArea` + `Input`)
+2. Add a chat icon/button to the app sidebar or top bar
+3. Store chat history per user (new Prisma model or simple local state for V1)
+4. Pass current context (active project, deliverable) into the chat so producers don't have to specify everything
+
+### Phase 3: Wire the AI Backend (Claude API — Default)
+
+1. Install Anthropic SDK: `npm install @anthropic-ai/sdk`
+2. Create `/api/chat` route in the Next.js app
+3. Configure Claude API client with API key (stored in environment variables)
+4. Define tools from existing Zod validators and service functions:
+   - `create_project`, `list_projects`, `update_project`
+   - `create_deliverable`, `list_deliverables`
+   - `assign_artist`, `remove_assignment`
+   - `advance_stage`, `get_blocked_stages`
+   - `get_workload`, `get_available_artists`
+   - `create_revision`, `list_overdue`
+   - `export_excel`, `import_excel`
+5. Implement tool execution handlers that call the existing service layer
+6. Add confirmation flow: for any mutation, show the user what will happen before executing
+7. After execution, invalidate relevant TanStack Query caches so the UI updates in real time
+8. Test against common producer requests:
+   - "Create a new project for Pavilion 16, high priority, Q3"
+   - "Assign Maria to Model Prep on Spectre x360"
+   - "What's overdue this week?"
+   - "Mark all Catalog Images for HP-2026-Q2 as delivered"
+
+### Phase 4: Ollama Fallback Layer
+
+1. Add a chat-capable Ollama model to `docker-compose.yml` (e.g., `llama3:70b` or `qwen2.5:72b`)
+2. Create a provider abstraction in the chat API route:
+   ```
+   try Claude API → on connection failure → fall back to Ollama
+   ```
+3. Map the same tool definitions to Ollama's function calling format
+4. Add a health check endpoint that monitors Claude API availability
+5. Log all fallback events so we can track how often Ollama is needed
+6. Ensure producers see a subtle indicator when running in fallback mode (e.g., "Running locally — some complex requests may need to be simplified")
+
+### Phase 5: Automation Agents (Scheduled)
+
+1. Create scheduled agent scripts (cron or Next.js API routes triggered by cron):
+   - **Deadline monitor**: Runs daily, flags overdue stages, sends notifications
+   - **Auto-assignment**: When a stage unblocks, suggests or auto-assigns based on skills + capacity
+   - **Stage auto-advance**: When all prerequisites are approved, automatically transition downstream stages
+   - **Status digest**: Weekly summary per project emailed or posted to Slack
+2. Each agent uses the CLI or service layer directly
+3. Add producer override controls — ability to pause/resume automation per project
+
+### Phase 6: Full Pipeline Automation
+
+1. **Intake automation**: Monitor email inbox or Workfront API for new requests → auto-create projects
+2. **Smart assignment**: Factor in historical performance, current workload trends, and skill match scores
+3. **Predictive alerts**: Flag projects likely to miss deadlines before they're actually late
+4. **Self-healing pipeline**: If an artist hasn't started an assigned stage in X days, auto-reassign
+
+---
+
+## Risk Considerations
+
+| Risk | Mitigation |
+|---|---|
+| Claude API outage | Automatic fallback to local Ollama model; producers are never blocked |
+| Claude API costs spike unexpectedly | Monitor usage via Anthropic dashboard; set billing alerts; ~$20–100/mo expected for 10–30 users |
+| Ollama fallback misinterprets a command | Confirmation step before all mutations; undo capability; subtle UI indicator when in fallback mode |
+| Producers don't trust the AI | Start with read-only queries (status checks, reports), add mutations gradually |
+| Wrong artist assigned automatically | Always surface assignments as suggestions first; let producers approve for first 2–4 weeks |
+| Over-automation removes producer oversight | Keep producers in the loop via notifications; require approval for high-impact actions (project creation, bulk operations) |
+
+---
+
+## Success Metrics
+
+- Reduction in time producers spend on manual tracker operations (target: 70%+)
+- Accuracy of AI-driven assignments vs producer overrides (target: 85%+ acceptance rate)
+- Producer adoption of chat interface (target: daily use within 4 weeks)
+- Reduction in overdue stages (target: 30%+ improvement from proactive monitoring)
+
+---
+
+## References
+
+- CLI-Anything: https://github.com/HKUDS/CLI-Anything
+- Claude API Tool Use: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
+- Project codebase: ~/Documents/VScode/hp_prod_tracker
+- Existing implementation plan: ~/Documents/VScode/hp_prod_tracker/IMPLEMENTATION_PLAN.md
+- Upgrade roadmap: ~/Documents/VScode/hp_prod_tracker/UPGRADE_PLAN.md
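
Phase 3, step 6 of the plan calls for a confirmation flow before any mutation. A minimal TypeScript sketch of one way such a gate might work follows. The tool names are taken from the Phase 3 tool list; the `ToolCall` and `Decision` types, the `gateToolCall` helper, and the choice of which tools count as read-only (including `export_excel`) are assumptions made for illustration and are not part of the tracker codebase.

```typescript
// Hypothetical sketch: tool names come from the plan's Phase 3 list;
// the types and the helper below are illustrative assumptions.
type ToolName =
  | "create_project" | "list_projects" | "update_project"
  | "create_deliverable" | "list_deliverables"
  | "assign_artist" | "remove_assignment"
  | "advance_stage" | "get_blocked_stages"
  | "get_workload" | "get_available_artists"
  | "create_revision" | "list_overdue"
  | "export_excel" | "import_excel";

// Tools assumed to only read tracker data, so they may run without a prompt.
// Treating export_excel as read-only is an assumption.
const READ_ONLY_TOOLS: ReadonlySet<ToolName> = new Set<ToolName>([
  "list_projects", "list_deliverables", "get_blocked_stages",
  "get_workload", "get_available_artists", "list_overdue", "export_excel",
]);

interface ToolCall {
  name: ToolName;
  args: Record<string, unknown>;
}

type Decision =
  | { action: "execute"; call: ToolCall }
  | { action: "confirm"; call: ToolCall; preview: string };

// Decide whether a model-requested tool call can run immediately or must
// first be shown to the producer. `confirmed` is true once the producer
// has approved the previewed call.
function gateToolCall(call: ToolCall, confirmed = false): Decision {
  if (READ_ONLY_TOOLS.has(call.name) || confirmed) {
    return { action: "execute", call };
  }
  // Anything not on the read-only list is treated as a mutation.
  const preview = `${call.name}(${JSON.stringify(call.args)})`;
  return { action: "confirm", call, preview };
}
```

In this sketch, `gateToolCall({ name: "assign_artist", args: { artist: "Maria" } })` returns a `confirm` decision whose `preview` the chat panel could display; calling it again with `confirmed` set to `true` after the producer approves returns an `execute` decision. Defaulting unknown or unlisted tools to "needs confirmation" keeps the failure mode on the safe side if new mutating tools are added later.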