

CLI-Anything Integration: HP CG Production Tracker

Executive Summary

CLI-Anything is an open-source framework (MIT licensed) that automatically generates a command-line interface for our web application. This CLI layer enables AI agents to operate the tracker through structured commands, unlocking full automation of the production pipeline, from job intake to delivery.

The goal: shift producers from operators (doing everything manually) to supervisors (approving, overriding, handling exceptions).


Key Findings

What CLI-Anything Does

  • Auto-generates a production-ready CLI from an existing codebase
  • 7-phase pipeline covering: analyze source → design commands → implement → test → document → publish
  • Outputs a pip-installable package with JSON output mode for AI consumption
  • Includes auto-generated tests and documentation
  • Validated across 11 major applications (1,508 passing tests)
  • Repository: https://github.com/HKUDS/CLI-Anything

Why It Fits Our Project

  • Our tracker already has a clean service layer with 26 REST API endpoints
  • Zod validators translate directly to CLI command schemas
  • Dependency engine and business rules are already codified
  • Skills, capacity, and workload data already exist in the system
  • Ollama is already running locally for embeddings (pgvector + nomic-embed-text)
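Every service function already has a Zod validator, so the same field definitions can feed both the CLI commands and the LLM tool definitions used by the chat backend. The sketch below avoids dependencies, so `FieldSpec` is a hypothetical stand-in for what a Zod object schema declares. In the app itself, a converter such as Zod 4's `z.toJSONSchema()` or the `zod-to-json-schema` package would replace the hand-written mapping. The output follows the tool shape of Anthropic's Messages API (`name`, `description`, `input_schema`). The `create_project` fields and priority values are illustrative, not taken from the real validator.

```typescript
// Hypothetical field description standing in for a Zod object schema.
type FieldType = "string" | "number" | "boolean";

interface FieldSpec {
  type: FieldType;
  description: string;
  optional?: boolean;
  enumValues?: string[];
}

// Tool shape accepted by the `tools` parameter of Anthropic's Messages API.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, Record<string, unknown>>;
    required: string[];
  };
}

// Convert a field map into a tool definition; non-optional fields become `required`.
function toToolDefinition(
  name: string,
  description: string,
  fields: Record<string, FieldSpec>,
): ToolDefinition {
  const properties: Record<string, Record<string, unknown>> = {};
  const required: string[] = [];
  for (const [key, spec] of Object.entries(fields)) {
    const prop: Record<string, unknown> = { type: spec.type, description: spec.description };
    if (spec.enumValues) prop.enum = spec.enumValues;
    properties[key] = prop;
    if (!spec.optional) required.push(key);
  }
  return { name, description, input_schema: { type: "object", properties, required } };
}

// Example: fields a hypothetical createProject validator might declare.
const createProjectTool = toToolDefinition("create_project", "Create a new production project.", {
  name: { type: "string", description: "Project name, e.g. 'Pavilion 16'" },
  priority: { type: "string", description: "Priority level", enumValues: ["low", "medium", "high"] },
  quarter: { type: "string", description: "Target quarter, e.g. 'Q3'", optional: true },
});

console.log(JSON.stringify(createProjectTool, null, 2));
```

Generating tools from one source keeps the chat assistant and the CLI from drifting apart when a validator changes.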

Cost

  • CLI-Anything: Free (MIT license)
  • Claude API (default): ~$0.01-0.05 per interaction, highly reliable tool calling
  • Local LLM (Ollama): Free, already in our Docker Compose stack — serves as offline fallback
  • Primary cost is implementation time

What Can Be Automated

Producer Task | Automation Level | How
--- | --- | ---
Job intake / project creation | Fully automatable | AI parses incoming requests (email, brief docs) and creates projects, deliverables, and pipeline stages
Artist assignment | Fully automatable | AI matches skills, capacity, and department data to assign the best available artist
Stage progression | Mostly automatable | Dependency engine auto-advances stages as prerequisites are approved; AI triggers downstream work
Deadline monitoring & escalation | Fully automatable | Scheduled agent flags overdue items, nudges artists, escalates only true blockers to producers
Status reporting | Fully automatable | Agent queries tracker and generates summaries on demand or on schedule
Revision cycles | Partially automatable | Agent logs revisions and reassigns artists; creative review still requires human eyes
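The stage-progression row depends on one rule: a blocked stage becomes workable once every prerequisite is approved. Here is a minimal sketch of that rule. It assumes a hypothetical `Stage` shape with a `dependsOn` list of stage IDs and a simplified status set. The tracker's real dependency engine and status names may differ, and the stage IDs other than Model Prep are illustrative.

```typescript
// Simplified, hypothetical status set; the real tracker may use different names.
type StageStatus = "blocked" | "ready" | "in_progress" | "approved";

interface Stage {
  id: string;
  status: StageStatus;
  dependsOn: string[]; // IDs of prerequisite stages
}

// Return the IDs of blocked stages whose prerequisites are all approved.
// An unknown prerequisite ID counts as "not approved", so the stage stays blocked.
function stagesToUnblock(stages: Stage[]): string[] {
  const statusById = new Map(stages.map((s) => [s.id, s.status] as const));
  return stages
    .filter((s) => s.status === "blocked")
    .filter((s) => s.dependsOn.every((dep) => statusById.get(dep) === "approved"))
    .map((s) => s.id);
}

// Produce an updated copy with the unblocked stages moved to "ready".
function autoAdvance(stages: Stage[]): Stage[] {
  const unblock = new Set(stagesToUnblock(stages));
  return stages.map((s): Stage => (unblock.has(s.id) ? { ...s, status: "ready" } : s));
}

const pipeline: Stage[] = [
  { id: "model-prep", status: "approved", dependsOn: [] },
  { id: "lighting", status: "approved", dependsOn: ["model-prep"] },
  { id: "render", status: "blocked", dependsOn: ["model-prep", "lighting"] },
  { id: "retouch", status: "blocked", dependsOn: ["render"] },
];

console.log(stagesToUnblock(pipeline)); // returns ["render"]
```

Only one level advances per run: `retouch` waits until `render` itself is approved. This matches stages unblocking as approvals arrive.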

Producer Role: Before vs After

Today | With Automation
--- | ---
Create projects manually | Review and approve auto-created projects
Assign artists by memory | Review AI-suggested assignments, override if needed
Monitor every stage daily | Get alerted only on exceptions and blockers
Chase artists for updates | Agent handles nudges and follow-ups
Compile status reports | Reports are generated automatically
Advance stages manually | Pipeline advances stages as prerequisites are approved

Architecture Decision: Chat Interface

Default: Claude API + Tool Use

  • Claude interprets natural language and calls tools mapped to our services
  • High accuracy on tool calling, multi-step operations, and ambiguous requests
  • ~$0.01-0.05 per interaction, a negligible cost at 10-30 producers
  • Estimated monthly cost: ~$20-100, depending on usage volume
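The monthly range follows from the per-interaction range. At $0.01-0.05 per interaction, $20-100 per month corresponds to roughly 2,000 interactions. The usage profile below (20 producers, 5 chat interactions per working day, 20 working days) is an illustrative assumption, not measured data.

```typescript
interface UsageProfile {
  producers: number;
  interactionsPerProducerPerDay: number;
  workingDaysPerMonth: number;
}

// Monthly cost range = interactions per month × per-interaction cost bounds.
function monthlyCostRange(
  usage: UsageProfile,
  costPerInteraction: { low: number; high: number },
): { interactions: number; low: number; high: number } {
  const interactions =
    usage.producers * usage.interactionsPerProducerPerDay * usage.workingDaysPerMonth;
  return {
    interactions,
    low: interactions * costPerInteraction.low,
    high: interactions * costPerInteraction.high,
  };
}

// Illustrative assumption: 20 producers × 5 interactions/day × 20 working days.
const estimate = monthlyCostRange(
  { producers: 20, interactionsPerProducerPerDay: 5, workingDaysPerMonth: 20 },
  { low: 0.01, high: 0.05 },
);
console.log(estimate); // interactions 2000, low ≈ $20, high ≈ $100
```

Both bounds scale linearly with usage. At the top of the 10-30 producer range with the same habits, the estimate rises to about $30-150.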

Fallback: Local LLM via Ollama

  • Activates automatically if the Claude API is unreachable (outage, network issues)
  • Llama 3 70B or Qwen 2.5 72B running locally via existing Docker Compose stack
  • Handles most straightforward operations reliably
  • Ensures producers are never blocked — the chat assistant stays online regardless

Why Claude as Default

  • Tool-calling accuracy is significantly higher than that of local models, which means fewer misrouted commands and fewer confirmation retries
  • Handles complex multi-step requests out of the box ("create 20 deliverables and assign them based on availability")
  • The cost is trivial relative to the producer time saved
  • Ollama remains valuable as a zero-downtime safety net, not a cost-saving measure

Implementation Steps

Phase 1: Generate the CLI

  1. Install CLI-Anything as a Claude Code plugin:
    /plugin marketplace add HKUDS/CLI-Anything
    /plugin install cli-anything
    
  2. Run CLI generation against the tracker codebase:
    /cli-anything ~/Documents/VScode/hp_prod_tracker
    
  3. Review generated commands — ensure they map correctly to existing services:
    • Project CRUD (create, list, update, archive)
    • Deliverable CRUD + bulk creation
    • Stage advancement + status updates
    • Artist assignment (with skill/capacity awareness)
    • Revision logging
    • Workload queries
    • Excel import/export
  4. Refine coverage for any missing operations:
    /cli-anything:refine ~/Documents/VScode/hp_prod_tracker "pipeline dependencies and bulk operations"
    
  5. Run the auto-generated tests and validate against the real database
  6. Install the CLI locally:
    cd hp_prod_tracker/agent-harness && pip install -e .
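Once the CLI is installed, agents and scripts can call it and parse its JSON output instead of scraping text. This sketch makes several assumptions. The generated entry point is taken to be `cli-anything-hp-prod-tracker`, with a global `--json` flag and a `project list` subcommand. Check the generated `agent-harness` package for the actual names. The command runner is injected so the parsing logic can be tested without the CLI installed.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Runs a command and resolves with its stdout; injected so tests can stub it.
type Runner = (cmd: string, args: string[]) => Promise<string>;

const defaultRunner: Runner = async (cmd, args) => {
  const { stdout } = await execFileAsync(cmd, args);
  return stdout;
};

// Hypothetical entry-point name; confirm against the generated package.
const CLI = "cli-anything-hp-prod-tracker";

// Call a subcommand in (assumed) JSON mode and parse the result.
async function runTrackerCli<T>(args: string[], run: Runner = defaultRunner): Promise<T> {
  const stdout = await run(CLI, ["--json", ...args]);
  try {
    return JSON.parse(stdout) as T;
  } catch {
    throw new Error(`Expected JSON from "${CLI} ${args.join(" ")}", got: ${stdout.slice(0, 200)}`);
  }
}
```

Usage: `const projects = await runTrackerCli<{ id: string; name: string }[]>(["project", "list"]);`. Failing loudly on non-JSON output keeps an agent from acting on a half-parsed result.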
    

Phase 2: Build the Chat UI Component

  1. Create a slide-out chat panel using shadcn/ui (Sheet + ScrollArea + Input)
  2. Add a chat icon/button to the app sidebar or top bar
  3. Store chat history per user (new Prisma model or simple local state for V1)
  4. Pass current context (active project, deliverable) into the chat so producers don't have to specify everything
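Passing the active context can be as simple as adding it to the system prompt on each request. This sketch assumes a hypothetical `ChatContext` shape populated from the current page. The field names and prompt wording are illustrative. The `system` plus `messages` split mirrors the Anthropic Messages API request.

```typescript
interface ChatContext {
  userName: string;
  activeProjectId?: string;
  activeProjectName?: string;
  activeDeliverableId?: string;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Build the system prompt and message list that the chat panel sends to /api/chat.
function buildChatRequest(context: ChatContext, history: ChatMessage[], input: string) {
  const lines = [
    "You are the assistant for the HP CG Production Tracker.",
    `Current user: ${context.userName}.`,
  ];
  if (context.activeProjectId) {
    lines.push(`Active project: ${context.activeProjectName ?? "(unnamed)"} (id ${context.activeProjectId}).`);
    lines.push("When the user says 'this project', use the active project.");
  }
  if (context.activeDeliverableId) {
    lines.push(`Active deliverable id: ${context.activeDeliverableId}.`);
  }
  return {
    system: lines.join("\n"),
    messages: [...history, { role: "user" as const, content: input }],
  };
}
```

Context that is not set is left out entirely, so the model is never told about a project the producer is not looking at.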

Phase 3: Wire the AI Backend (Claude API — Default)

  1. Install the Anthropic SDK: npm install @anthropic-ai/sdk
  2. Create an /api/chat route in the Next.js app
  3. Configure the Claude API client with an API key stored in an environment variable (the SDK reads ANTHROPIC_API_KEY by default)
  4. Define tools from existing Zod validators and service functions:
    • create_project, list_projects, update_project
    • create_deliverable, list_deliverables
    • assign_artist, remove_assignment
    • advance_stage, get_blocked_stages
    • get_workload, get_available_artists
    • create_revision, list_overdue
    • export_excel, import_excel
  5. Implement tool execution handlers that call the existing service layer
  6. Add confirmation flow: for any mutation, show the user what will happen before executing
  7. After execution, invalidate the relevant TanStack Query caches on the client so the UI reflects the changes immediately
  8. Test against common producer requests:
    • "Create a new project for Pavilion 16, high priority, Q3"
    • "Assign Maria to Model Prep on Spectre x360"
    • "What's overdue this week?"
    • "Mark all Catalog Images for HP-2026-Q2 as delivered"
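Steps 5-7 can share one dispatcher. Read-only tools run immediately. Mutating tools return a preview and run only after the user confirms. Each successful result lists the TanStack Query keys the client should invalidate. The tool names come from step 4. The `ToolSpec` shape, handler bodies, and query keys are hypothetical placeholders for calls into the existing service layer.

```typescript
type ToolInput = Record<string, unknown>;

interface ToolSpec {
  mutates: boolean;
  invalidates: string[][]; // query keys the client should invalidate after a successful run
  describe: (input: ToolInput) => string;
  run: (input: ToolInput) => Promise<unknown>;
}

type ToolResult =
  | { status: "needs_confirmation"; preview: string }
  | { status: "done"; data: unknown; invalidate: string[][] }
  | { status: "error"; message: string };

async function executeTool(
  registry: Record<string, ToolSpec>,
  name: string,
  input: ToolInput,
  confirmed: boolean,
): Promise<ToolResult> {
  const tool = registry[name];
  if (!tool) return { status: "error", message: `Unknown tool: ${name}` };
  // Mutations never execute without an explicit confirmation from the user.
  if (tool.mutates && !confirmed) {
    return { status: "needs_confirmation", preview: tool.describe(input) };
  }
  try {
    const data = await tool.run(input);
    return { status: "done", data, invalidate: tool.invalidates };
  } catch (err) {
    return { status: "error", message: err instanceof Error ? err.message : String(err) };
  }
}

// Example registry; the run functions stand in for existing service-layer calls.
const registry: Record<string, ToolSpec> = {
  list_overdue: {
    mutates: false,
    invalidates: [],
    describe: () => "List overdue stages",
    run: async () => [{ stage: "Model Prep", project: "Spectre x360" }],
  },
  assign_artist: {
    mutates: true,
    invalidates: [["assignments"], ["workload"]],
    describe: (i) => `Assign ${String(i.artist)} to ${String(i.stage)} on ${String(i.project)}`,
    run: async (i) => ({ assigned: i.artist }),
  },
};
```

On the client, a `done` result would be followed by `queryClient.invalidateQueries({ queryKey })` for each listed key. Errors are returned as data rather than thrown, so the model can explain the failure to the producer.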

Phase 4: Ollama Fallback Layer

  1. Add a chat-capable Ollama model with tool-calling support to the docker-compose.yml setup (e.g., llama3.1:70b or qwen2.5:72b)
  2. Create a provider abstraction in the chat API route:
    try Claude API → on connection failure → fall back to Ollama
    
  3. Map the same tool definitions to Ollama's function calling format
  4. Add a health check endpoint that monitors Claude API availability
  5. Log all fallback events so we can track how often Ollama is needed
  6. Ensure producers see a subtle indicator when running in fallback mode (e.g., "Running locally — some complex requests may need to be simplified")
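The provider abstraction from step 2 can be a small wrapper with three jobs. It tries the primary provider and falls back only on connection-type failures. It logs each fallback event (step 5). It also reports which provider answered, so the UI can show the fallback indicator (step 6). The `ChatProvider` interface and `ConnectionError` class are hypothetical. With the Anthropic SDK, the equivalent check would test for its `APIConnectionError` class, and possibly for overload or 5xx responses, depending on policy.

```typescript
interface ChatProvider {
  name: "claude" | "ollama";
  complete: (prompt: string) => Promise<string>;
}

// Hypothetical marker for "provider unreachable" errors (network failure, timeout).
class ConnectionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConnectionError";
  }
}

function isConnectionError(err: unknown): err is Error {
  return err instanceof Error && err.name === "ConnectionError";
}

interface ChatResult {
  text: string;
  provider: ChatProvider["name"];
  fallback: boolean; // true when the UI should show the "Running locally" indicator
}

async function completeWithFallback(
  prompt: string,
  primary: ChatProvider,
  fallback: ChatProvider,
  logFallback: (reason: string) => void = (r) => console.warn(`[chat] fallback: ${r}`),
): Promise<ChatResult> {
  try {
    return { text: await primary.complete(prompt), provider: primary.name, fallback: false };
  } catch (err) {
    // Fall back only when the primary is unreachable; rethrow every other error unchanged.
    if (!isConnectionError(err)) throw err;
    logFallback(err.message);
    return { text: await fallback.complete(prompt), provider: fallback.name, fallback: true };
  }
}
```

Falling back on every error would hide real bugs, such as a malformed tool schema, that would fail on Ollama as well.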

Phase 5: Automation Agents (Scheduled)

  1. Create scheduled agent scripts (cron or Next.js API routes triggered by cron):
    • Deadline monitor: Runs daily, flags overdue stages, sends notifications
    • Auto-assignment: When a stage unblocks, suggests or auto-assigns based on skills + capacity
    • Stage auto-advance: When all prerequisites are approved, automatically transition downstream stages
    • Status digest: Weekly summary per project emailed or posted to Slack
  2. Each agent uses the CLI or service layer directly
  3. Add producer override controls — ability to pause/resume automation per project
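At its core, the deadline monitor is a filter over open stages plus the per-project pause switch from step 3. This sketch assumes hypothetical `StageRecord` fields (`dueDate`, `status`, `projectId`, `assignee`) and a hypothetical two-day escalation threshold. It leaves notification delivery (email, Slack) out. `now` is passed in so a cron job and a test can share the same code.

```typescript
interface StageRecord {
  id: string;
  projectId: string;
  name: string;
  assignee?: string;
  dueDate: Date;
  status: "not_started" | "in_progress" | "approved" | "delivered";
}

interface OverdueItem {
  stageId: string;
  projectId: string;
  daysOverdue: number;
  escalate: boolean; // true: alert the producer; false: just nudge the artist
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Flag open stages past their due date, skipping projects where automation is paused.
function findOverdue(
  stages: StageRecord[],
  now: Date,
  pausedProjects: Set<string>,
  escalateAfterDays = 2, // hypothetical threshold
): OverdueItem[] {
  return stages
    .filter((s) => s.status !== "approved" && s.status !== "delivered")
    .filter((s) => !pausedProjects.has(s.projectId))
    .filter((s) => s.dueDate.getTime() < now.getTime())
    .map((s) => {
      const daysOverdue = Math.floor((now.getTime() - s.dueDate.getTime()) / MS_PER_DAY);
      return {
        stageId: s.id,
        projectId: s.projectId,
        daysOverdue,
        // Unassigned stages are true blockers: nobody can be nudged.
        escalate: daysOverdue >= escalateAfterDays || !s.assignee,
      };
    })
    .sort((a, b) => b.daysOverdue - a.daysOverdue);
}
```

Sorting by days overdue puts the worst items at the top of the producer's alert list.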

Phase 6: Full Pipeline Automation

  1. Intake automation: Monitor email inbox or Workfront API for new requests → auto-create projects
  2. Smart assignment: Factor in historical performance, current workload trends, and skill match scores
  3. Predictive alerts: Flag projects likely to miss deadlines before they're actually late
  4. Self-healing pipeline: If an artist hasn't started an assigned stage in X days, auto-reassign
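Step 4 leaves the idle threshold as X days, so the sketch keeps it as a parameter. It also shows one way to pick a replacement from the skill and capacity data the tracker already holds. It keeps artists who have the required skill and enough spare capacity, then prefers the lowest current utilization. All type and field names are hypothetical. In line with the risk table, the output is a list of suggestions for a producer to approve, not an automatic change.

```typescript
interface Artist {
  name: string;
  skills: string[];
  capacityHours: number; // weekly capacity
  assignedHours: number; // hours already committed this week
}

interface Assignment {
  stageId: string;
  requiredSkill: string;
  estimatedHours: number;
  artist: string;
  assignedAt: Date;
  startedAt?: Date;
}

interface ReassignSuggestion {
  stageId: string;
  from: string;
  to: string | null; // null when nobody qualifies: escalate to a producer
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Pick the qualified artist with enough spare capacity and the lowest utilization.
function bestAvailable(artists: Artist[], skill: string, hours: number, exclude: string): Artist | null {
  const candidates = artists
    .filter((a) => a.name !== exclude && a.skills.includes(skill))
    .filter((a) => a.capacityHours > 0 && a.capacityHours - a.assignedHours >= hours)
    .sort((a, b) => a.assignedHours / a.capacityHours - b.assignedHours / b.capacityHours);
  return candidates[0] ?? null;
}

// Suggest reassignment for stages not started within `idleDays` of assignment.
// Each suggestion is computed independently; capacity is not reserved between them.
function suggestReassignments(
  assignments: Assignment[],
  artists: Artist[],
  now: Date,
  idleDays: number, // the plan's "X days"; value not yet decided
): ReassignSuggestion[] {
  return assignments
    .filter((a) => !a.startedAt && now.getTime() - a.assignedAt.getTime() >= idleDays * DAY_MS)
    .map((a) => ({
      stageId: a.stageId,
      from: a.artist,
      to: bestAvailable(artists, a.requiredSkill, a.estimatedHours, a.artist)?.name ?? null,
    }));
}
```

The `null` result matters as much as the match. A stale stage with no qualified replacement is exactly the kind of blocker that should reach a producer.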

Risk Considerations

Risk | Mitigation
--- | ---
Claude API outage | Automatic fallback to local Ollama model; producers are never blocked
Claude API costs spike unexpectedly | Monitor usage via Anthropic dashboard; set billing alerts; ~$20-100/mo expected for 10-30 users
Ollama fallback misinterprets a command | Confirmation step before all mutations; undo capability; subtle UI indicator when in fallback mode
Producers don't trust the AI | Start with read-only queries (status checks, reports), add mutations gradually
Wrong artist assigned automatically | Always surface assignments as suggestions first; let producers approve for first 2-4 weeks
Over-automation removes producer oversight | Keep producers in the loop via notifications; require approval for high-impact actions (project creation, bulk operations)

Success Metrics

  • Reduction in time producers spend on manual tracker operations (target: 70%+)
  • Accuracy of AI-driven assignments vs producer overrides (target: 85%+ acceptance rate)
  • Producer adoption of chat interface (target: daily use within 4 weeks)
  • Reduction in overdue stages (target: 30%+ improvement from proactive monitoring)
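Three of the targets are ratios, so it helps to pin down the formulas before measuring. This sketch rests on two assumptions for this plan. Acceptance rate counts AI-suggested assignments a producer approved without changes. Time reduction compares average producer minutes on tracker operations before and after rollout, and the same relative-reduction formula applies to the overdue-stage count. The example numbers are illustrative, not measured.

```typescript
// Share of AI-suggested assignments approved without a producer override.
function acceptanceRate(accepted: number, overridden: number): number {
  const total = accepted + overridden;
  return total === 0 ? 0 : accepted / total;
}

// Relative reduction from a baseline (time spent, or number of overdue stages).
function relativeReduction(baseline: number, current: number): number {
  if (baseline <= 0) throw new Error("baseline must be positive");
  return (baseline - current) / baseline;
}

const TARGETS = { acceptance: 0.85, timeReduction: 0.7, overdueReduction: 0.3 };

// Illustrative numbers only.
const rate = acceptanceRate(88, 12); // 0.88
const saved = relativeReduction(600, 150); // 0.75
console.log({
  acceptanceMet: rate >= TARGETS.acceptance,
  timeReductionMet: saved >= TARGETS.timeReduction,
}); // { acceptanceMet: true, timeReductionMet: true }
```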

References