CLI-Anything Integration: HP CG Production Tracker
Executive Summary
CLI-Anything is an open-source framework (MIT licensed) that automatically generates a command-line interface (CLI) from our web application's codebase. This CLI layer lets AI agents operate the tracker through structured commands, which opens the way to automating most of the production pipeline, from job intake to delivery.
The goal: shift producers from operators (doing everything manually) to supervisors (approving, overriding, handling exceptions).
Key Findings
What CLI-Anything Does
- Auto-generates a production-ready CLI from an existing codebase
- 7-phase pipeline: analyze source → design commands → implement → plan tests → write tests → document → publish
- Outputs a pip-installable package with JSON output mode for AI consumption
- Includes auto-generated tests and documentation
- Validated across 11 major applications (1,508 passing tests)
- Repository: https://github.com/HKUDS/CLI-Anything
Why It Fits Our Project
- Our tracker already has a clean service layer with 26 REST API endpoints
- Zod validators translate directly to CLI command schemas
- Dependency engine and business rules are already codified
- Skills, capacity, and workload data already exist in the system
- Ollama is already running locally for embeddings (pgvector + nomic-embed-text)
Cost
- CLI-Anything: Free (MIT license)
- Claude API (default): ~$0.01–0.05 per interaction, highly reliable tool calling
- Local LLM (Ollama): Free, already in our Docker Compose stack — serves as offline fallback
- Primary cost is implementation time
What Can Be Automated
| Producer Task | Automation Level | How |
|---|---|---|
| Job intake / project creation | Fully automatable | AI parses incoming requests (email, brief docs) and creates projects, deliverables, and pipeline stages |
| Artist assignment | Fully automatable | AI matches skills, capacity, and department data to assign the best available artist |
| Stage progression | Mostly automatable | Dependency engine auto-advances stages as prerequisites are approved; AI triggers downstream work |
| Deadline monitoring & escalation | Fully automatable | Scheduled agent flags overdue items, nudges artists, escalates only true blockers to producers |
| Status reporting | Fully automatable | Agent queries tracker and generates summaries on demand or on schedule |
| Revision cycles | Partially automatable | Agent logs revisions and reassigns artists; creative review still requires human eyes |
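As a rough illustration of the artist-assignment row, the matching could be sketched as a filter-then-rank function. The `Artist` and `StageRequest` shapes, their field names, and the "most free hours first" ranking rule are assumptions made for this example, not the tracker's actual schema or policy.

```typescript
// Hypothetical shapes; the tracker's real models may differ.
interface Artist {
  id: string;
  department: string;
  skills: string[];
  weeklyCapacityHours: number;
  assignedHours: number;
}

interface StageRequest {
  department: string;
  requiredSkill: string;
  estimatedHours: number;
}

// Keeps artists in the right department, with the required skill and enough
// free hours, then orders them by free capacity (most available first).
function rankArtists(artists: Artist[], request: StageRequest): Artist[] {
  const freeHours = (a: Artist): number => a.weeklyCapacityHours - a.assignedHours;
  return artists
    .filter(
      (a) =>
        a.department === request.department &&
        a.skills.some((skill) => skill === request.requiredSkill) &&
        freeHours(a) >= request.estimatedHours,
    )
    .sort((a, b) => freeHours(b) - freeHours(a));
}

// Illustrative data only.
const suggestions = rankArtists(
  [
    { id: "a1", department: "3D", skills: ["model-prep"], weeklyCapacityHours: 40, assignedHours: 36 },
    { id: "a2", department: "3D", skills: ["model-prep", "lighting"], weeklyCapacityHours: 40, assignedHours: 10 },
    { id: "a3", department: "Retouch", skills: ["model-prep"], weeklyCapacityHours: 40, assignedHours: 0 },
  ],
  { department: "3D", requiredSkill: "model-prep", estimatedHours: 8 },
);
console.log(suggestions.map((a) => a.id));
```

Returning a ranked list rather than a single winner keeps the producer in the supervisor role: the top entry can be shown as the suggestion, with the rest available as overrides.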
Producer Role: Before vs After
| Today | With Automation |
|---|---|
| Create projects manually | Review auto-created projects, approve |
| Assign artists by memory | Review AI-suggested assignments, override if needed |
| Monitor every stage daily | Get alerted only on exceptions and blockers |
| Chase artists for updates | Agent handles nudges and follow-ups |
| Compile status reports | Reports generated automatically |
| Advance stages manually | Pipeline advances itself |
Architecture Decision: Chat Interface
Default: Claude API + Tool Use
- Claude interprets natural language and calls tools mapped to our services
- Best-in-class accuracy for tool calling, multi-step operations, and ambiguous requests
- ~$0.01–0.05 per interaction — negligible cost at 10-30 producers
- Estimated monthly cost: ~$20–100 depending on usage volume
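The monthly estimate is simply interaction volume multiplied by per-interaction cost. The sketch below works backwards from the figures above: a volume of about 2,000 interactions per month is an inferred assumption (it is the volume at which $0.01 to $0.05 per interaction yields $20 to $100), not a measured number.

```typescript
// Monthly cost = interactions per month x cost per interaction.
function monthlyCostUsd(interactionsPerMonth: number, costPerInteractionUsd: number): number {
  return interactionsPerMonth * costPerInteractionUsd;
}

// Assumed volume, inferred from the estimate above rather than measured.
const assumedInteractionsPerMonth = 2000;

const lowEstimate = monthlyCostUsd(assumedInteractionsPerMonth, 0.01);
const highEstimate = monthlyCostUsd(assumedInteractionsPerMonth, 0.05);

console.log(`$${lowEstimate.toFixed(0)} to $${highEstimate.toFixed(0)} per month`);
```

Spread across 10 to 30 producers, that assumed volume corresponds to roughly 67 to 200 interactions per producer per month, which is a useful sanity check once real usage data is available.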
Fallback: Local LLM via Ollama
- Activates automatically if the Claude API is unreachable (outage, network issues)
- Llama 3 70B or Qwen 2.5 72B running locally via existing Docker Compose stack
- Handles most straightforward operations reliably
- Ensures producers are never blocked — the chat assistant stays online regardless
Why Claude as Default
- Tool calling accuracy is significantly higher than local models — fewer misrouted commands, fewer confirmation retries
- Handles complex multi-step requests out of the box ("create 20 deliverables and assign them based on availability")
- The cost is trivial relative to the producer time saved
- Ollama remains valuable as a zero-downtime safety net, not a cost-saving measure
Implementation Steps
Phase 1: Generate the CLI
- Install CLI-Anything as a Claude Code plugin:
    /plugin marketplace add HKUDS/CLI-Anything
    /plugin install cli-anything
- Run CLI generation against the tracker codebase:
    /cli-anything ~/Documents/VScode/hp_prod_tracker
- Review generated commands and ensure they map correctly to existing services:
  - Project CRUD (create, list, update, archive)
  - Deliverable CRUD + bulk creation
  - Stage advancement + status updates
  - Artist assignment (with skill/capacity awareness)
  - Revision logging
  - Workload queries
  - Excel import/export
- Refine coverage for any missing operations:
    /cli-anything:refine ~/Documents/VScode/hp_prod_tracker "pipeline dependencies and bulk operations"
- Run the auto-generated tests and validate against the real database
- Install the CLI locally:
    cd hp_prod_tracker/agent-harness && pip install -e .
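Once the CLI is installed, an agent or script would typically invoke it in JSON output mode and parse the result defensively. The sketch below covers only argument building and parsing; the subcommand names (`project list`), the `--json` and `--status` flags, and the output shape are all assumptions, since the real ones depend on what the generator produces for this codebase.

```typescript
// Hypothetical: the generated CLI's subcommands, flags, and JSON shape may differ.
interface ProjectSummary {
  id: string;
  name: string;
  status: string;
}

type CliResult =
  | { ok: true; projects: ProjectSummary[] }
  | { ok: false; error: string };

// Builds the argument list for a JSON-mode "list projects" call.
function buildListProjectsArgs(status?: string): string[] {
  const args = ["project", "list", "--json"];
  if (status !== undefined) args.push("--status", status);
  return args;
}

// Parses stdout and rejects anything that is not an array of well-formed projects,
// so malformed output never reaches the agent as if it were data.
function parseListProjects(stdout: string): CliResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return { ok: false, error: "CLI output was not valid JSON" };
  }
  if (!Array.isArray(parsed)) {
    return { ok: false, error: "Expected a JSON array of projects" };
  }
  const projects: ProjectSummary[] = [];
  for (const item of parsed) {
    if (
      typeof item === "object" && item !== null &&
      typeof item.id === "string" &&
      typeof item.name === "string" &&
      typeof item.status === "string"
    ) {
      projects.push({ id: item.id, name: item.name, status: item.status });
    } else {
      return { ok: false, error: "Project entry is missing id, name, or status" };
    }
  }
  return { ok: true, projects };
}

// Illustrative output, not captured from a real run.
const sampleStdout = '[{"id":"p1","name":"Pavilion 16","status":"active"}]';
console.log(buildListProjectsArgs("active").join(" "));
console.log(parseListProjects(sampleStdout));
```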
Phase 2: Build the Chat UI Component
- Create a slide-out chat panel using shadcn/ui (Sheet + ScrollArea + Input)
- Add a chat icon/button to the app sidebar or top bar
- Store chat history per user (new Prisma model or simple local state for V1)
- Pass current context (active project, deliverable) into the chat so producers don't have to specify everything
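One way to pass the current context is to render it into the system prompt sent with each chat request, so phrases like "this project" resolve without follow-up questions. This is a minimal sketch; the `ChatContext` fields and the prompt wording are hypothetical.

```typescript
// Hypothetical context shape sent from the chat panel with each message.
interface ChatContext {
  userName: string;
  activeProjectId?: string;
  activeProjectName?: string;
  activeDeliverableName?: string;
}

// Renders the UI context as system-prompt text for the assistant.
function buildSystemPrompt(ctx: ChatContext): string {
  const lines: string[] = [
    "You are the production tracker assistant.",
    `The current user is ${ctx.userName}.`,
  ];
  let hasContext = false;
  if (ctx.activeProjectId !== undefined && ctx.activeProjectName !== undefined) {
    lines.push(`Active project: ${ctx.activeProjectName} (id: ${ctx.activeProjectId}).`);
    hasContext = true;
  }
  if (ctx.activeDeliverableName !== undefined) {
    lines.push(`Active deliverable: ${ctx.activeDeliverableName}.`);
    hasContext = true;
  }
  lines.push(
    hasContext
      ? "When the user omits a project or deliverable, assume the active one."
      : "No project is open; ask which project the user means when it is ambiguous.",
  );
  return lines.join("\n");
}

const examplePrompt = buildSystemPrompt({
  userName: "Maria",
  activeProjectId: "HP-2026-Q2",
  activeProjectName: "Pavilion 16",
});
console.log(examplePrompt);
```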
Phase 3: Wire the AI Backend (Claude API — Default)
- Install Anthropic SDK:
    npm install @anthropic-ai/sdk
- Create /api/chat route in the Next.js app
- Configure Claude API client with API key (stored in environment variables)
- Define tools from existing Zod validators and service functions:
  - create_project, list_projects, update_project
  - create_deliverable, list_deliverables
  - assign_artist, remove_assignment
  - advance_stage, get_blocked_stages
  - get_workload, get_available_artists
  - create_revision, list_overdue
  - export_excel, import_excel
- Implement tool execution handlers that call the existing service layer
- Add confirmation flow: for any mutation, show the user what will happen before executing
- After execution, invalidate relevant TanStack Query caches so the UI updates in real time
- Test against common producer requests:
- "Create a new project for Pavilion 16, high priority, Q3"
- "Assign Maria to Model Prep on Spectre x360"
- "What's overdue this week?"
- "Mark all Catalog Images for HP-2026-Q2 as delivered"
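The tool definition and confirmation flow described above might look roughly like the following. The tool object uses the `name` / `description` / `input_schema` shape that Claude's tool-use API expects, and `ToolUse` mirrors a `tool_use` content block. Everything else is an assumption for illustration: the parameter names, the set of tools treated as mutations, and the stand-in handlers, which in the real route would call the existing service layer.

```typescript
// Tool definition: name, description, and a JSON Schema under input_schema.
// The artistId/stageId parameters are hypothetical.
const assignArtistTool = {
  name: "assign_artist",
  description: "Assign an artist to a pipeline stage of a deliverable.",
  input_schema: {
    type: "object",
    properties: {
      artistId: { type: "string" },
      stageId: { type: "string" },
    },
    required: ["artistId", "stageId"],
  },
};

// Mirrors a tool_use content block returned by Claude.
interface ToolUse {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

// Assumed classification: tools that change data and need confirmation first.
const MUTATING_TOOLS = [
  "create_project", "update_project", "create_deliverable", "assign_artist",
  "remove_assignment", "advance_stage", "create_revision", "import_excel",
];

type Handler = (input: Record<string, unknown>) => string;

// Stand-in handlers; real ones would call the tracker's service functions.
const handlers: Record<string, Handler> = {
  assign_artist: (input) => `Assigned ${String(input["artistId"])} to ${String(input["stageId"])}`,
  list_overdue: () => "No overdue stages",
};

type ToolOutcome =
  | { kind: "needs_confirmation"; summary: string }
  | { kind: "tool_result"; tool_use_id: string; content: string };

// Read-only tools run immediately; mutations run only once the user has confirmed.
function handleToolUse(call: ToolUse, confirmed: boolean): ToolOutcome {
  const handler = handlers[call.name];
  if (handler === undefined) {
    return { kind: "tool_result", tool_use_id: call.id, content: `Unknown tool: ${call.name}` };
  }
  if (MUTATING_TOOLS.indexOf(call.name) !== -1 && !confirmed) {
    return { kind: "needs_confirmation", summary: `${call.name} ${JSON.stringify(call.input)}` };
  }
  return { kind: "tool_result", tool_use_id: call.id, content: handler(call.input) };
}

const sampleCall: ToolUse = {
  type: "tool_use",
  id: "toolu_example",
  name: assignArtistTool.name,
  input: { artistId: "maria", stageId: "stage-42" },
};
const preview = handleToolUse(sampleCall, false);
const executed = handleToolUse(sampleCall, true);
console.log(preview, executed);
```

The `tool_result` outcome carries the `tool_use_id` so it can be sent back to Claude as a `tool_result` block in the next message, while `needs_confirmation` is what the chat panel would render as a "this is what will happen" prompt.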
Phase 4: Ollama Fallback Layer
- Add a chat-capable Ollama model to docker-compose.yml (e.g., llama3:70b or qwen2.5:72b)
- Create a provider abstraction in the chat API route:
    try Claude API → on connection failure → fall back to Ollama
- Map the same tool definitions to Ollama's function calling format
- Add a health check endpoint that monitors Claude API availability
- Log all fallback events so we can track how often Ollama is needed
- Ensure producers see a subtle indicator when running in fallback mode (e.g., "Running locally — some complex requests may need to be simplified")
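The provider abstraction could be sketched as a small wrapper that tries Claude first and switches to Ollama only on connection failures. The providers here are stand-in functions rather than real SDK calls, and the convention of marking network failures with `error.name === "ConnectionError"` is a hypothetical one that a real client wrapper would need to implement.

```typescript
type Provider = (prompt: string) => Promise<string>;

interface ChatReply {
  provider: "claude" | "ollama";
  text: string;
}

// Hypothetical convention: the Claude client wrapper tags network failures
// by setting error.name to "ConnectionError".
function isConnectionFailure(err: unknown): boolean {
  return err instanceof Error && err.name === "ConnectionError";
}

// Tries Claude first; falls back to Ollama only when Claude is unreachable.
// Other errors (bad request, auth) are rethrown so they are not masked.
async function chatWithFallback(
  prompt: string,
  claude: Provider,
  ollama: Provider,
  onFallback: (reason: string) => void,
): Promise<ChatReply> {
  try {
    return { provider: "claude", text: await claude(prompt) };
  } catch (err) {
    if (!isConnectionFailure(err)) throw err;
    onFallback(err instanceof Error ? err.message : "unknown");
    return { provider: "ollama", text: await ollama(prompt) };
  }
}

// Stand-in providers simulating an outage.
const unreachableClaude: Provider = async () => {
  const err = new Error("connect ETIMEDOUT");
  err.name = "ConnectionError";
  throw err;
};
const localOllama: Provider = async (prompt) => `local reply to: ${prompt}`;

const fallbackLog: string[] = [];
const pendingReply = chatWithFallback(
  "What's overdue this week?",
  unreachableClaude,
  localOllama,
  (reason) => { fallbackLog.push(reason); },
);
pendingReply.then((reply) => console.log(reply.provider, fallbackLog));
```

The `provider` field on the reply is what the UI would use to show the fallback indicator, and the `onFallback` callback is the natural place to record fallback events.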
Phase 5: Automation Agents (Scheduled)
- Create scheduled agent scripts (cron or Next.js API routes triggered by cron):
- Deadline monitor: Runs daily, flags overdue stages, sends notifications
- Auto-assignment: When a stage unblocks, suggests or auto-assigns based on skills + capacity
- Stage auto-advance: When all prerequisites are approved, automatically transition downstream stages
- Status digest: Weekly summary per project emailed or posted to Slack
- Each agent uses the CLI or service layer directly
- Add producer override controls — ability to pause/resume automation per project
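The core checks behind the deadline monitor and stage auto-advance can be written as pure functions that a cron-triggered script would call. In this sketch the `Stage` shape, the status values, and the list of paused project IDs are assumptions; the tracker's dependency engine already codifies the real rules.

```typescript
// Hypothetical stage shape; field names and statuses are illustrative.
interface Stage {
  id: string;
  projectId: string;
  status: "blocked" | "in_progress" | "approved";
  dueDate: string; // ISO date
  dependsOn: string[]; // ids of prerequisite stages
}

// Deadline monitor: unapproved stages past their due date, skipping paused projects.
function findOverdueStages(stages: Stage[], now: Date, pausedProjects: string[]): Stage[] {
  return stages.filter(
    (s) =>
      s.status !== "approved" &&
      pausedProjects.indexOf(s.projectId) === -1 &&
      new Date(s.dueDate).getTime() < now.getTime(),
  );
}

// Stage auto-advance: blocked stages whose prerequisites are all approved.
function findStagesReadyToAdvance(stages: Stage[], pausedProjects: string[]): Stage[] {
  const approvedIds = stages.filter((s) => s.status === "approved").map((s) => s.id);
  return stages.filter(
    (s) =>
      s.status === "blocked" &&
      pausedProjects.indexOf(s.projectId) === -1 &&
      s.dependsOn.every((dep) => approvedIds.indexOf(dep) !== -1),
  );
}

// Illustrative data only.
const exampleStages: Stage[] = [
  { id: "s1", projectId: "p1", status: "approved", dueDate: "2026-03-01", dependsOn: [] },
  { id: "s2", projectId: "p1", status: "blocked", dueDate: "2026-03-05", dependsOn: ["s1"] },
  { id: "s3", projectId: "p1", status: "blocked", dueDate: "2026-03-20", dependsOn: ["s2"] },
  { id: "s4", projectId: "p2", status: "in_progress", dueDate: "2026-03-02", dependsOn: [] },
];
const checkTime = new Date("2026-03-10T00:00:00Z");
const pausedProjectIds = ["p2"]; // automation paused by a producer

console.log(findOverdueStages(exampleStages, checkTime, pausedProjectIds).map((s) => s.id));
console.log(findStagesReadyToAdvance(exampleStages, pausedProjectIds).map((s) => s.id));
```

Passing the paused-project list into every check is one simple way to honor the per-project pause/resume control: a paused project is invisible to all scheduled agents at once.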
Phase 6: Full Pipeline Automation
- Intake automation: Monitor email inbox or Workfront API for new requests → auto-create projects
- Smart assignment: Factor in historical performance, current workload trends, and skill match scores
- Predictive alerts: Flag projects likely to miss deadlines before they're actually late
- Self-healing pipeline: If an artist hasn't started an assigned stage in X days, auto-reassign
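For the self-healing rule, detection of stalled assignments reduces to a date comparison. The sketch below keeps the idle threshold as a parameter, since the actual value of X is not yet decided; the `Assignment` shape is hypothetical, and the threshold of 3 days in the example call is arbitrary.

```typescript
// Hypothetical assignment shape; field names are illustrative.
interface Assignment {
  stageId: string;
  artistId: string;
  assignedAt: string; // ISO timestamp
  startedAt: string | null; // null until the artist starts work
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns assignments that have not been started within maxIdleDays of being assigned.
function findStaleAssignments(assignments: Assignment[], now: Date, maxIdleDays: number): Assignment[] {
  return assignments.filter(
    (a) =>
      a.startedAt === null &&
      now.getTime() - new Date(a.assignedAt).getTime() > maxIdleDays * MS_PER_DAY,
  );
}

// Illustrative data; the 3-day threshold is an arbitrary example value.
const staleCandidates = findStaleAssignments(
  [
    { stageId: "st1", artistId: "a1", assignedAt: "2026-03-01T00:00:00Z", startedAt: null },
    { stageId: "st2", artistId: "a2", assignedAt: "2026-03-09T00:00:00Z", startedAt: null },
    { stageId: "st3", artistId: "a3", assignedAt: "2026-03-01T00:00:00Z", startedAt: "2026-03-02T09:00:00Z" },
  ],
  new Date("2026-03-10T00:00:00Z"),
  3,
);
console.log(staleCandidates.map((a) => a.stageId));
```

Given the risk of over-automation noted in this plan, the output is probably best treated as a list of reassignment candidates for producer review before any automatic reassignment is enabled.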
Risk Considerations
| Risk | Mitigation |
|---|---|
| Claude API outage | Automatic fallback to local Ollama model; producers are never blocked |
| Claude API costs spike unexpectedly | Monitor usage via Anthropic dashboard; set billing alerts; ~$20-100/mo expected for 10-30 users |
| Ollama fallback misinterprets a command | Confirmation step before all mutations; undo capability; subtle UI indicator when in fallback mode |
| Producers don't trust the AI | Start with read-only queries (status checks, reports), add mutations gradually |
| Wrong artist assigned automatically | Always surface assignments as suggestions first; let producers approve for first 2-4 weeks |
| Over-automation removes producer oversight | Keep producers in the loop via notifications; require approval for high-impact actions (project creation, bulk operations) |
Success Metrics
- Reduction in time producers spend on manual tracker operations (target: 70%+)
- Accuracy of AI-driven assignments vs producer overrides (target: 85%+ acceptance rate)
- Producer adoption of chat interface (target: daily use within 4 weeks)
- Reduction in overdue stages (target: 30%+ improvement from proactive monitoring)
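The assignment acceptance metric can be computed directly from counts of accepted and overridden suggestions. This is a small sketch with made-up counts; where those counts would come from (for example, an audit log of suggestions) is an assumption.

```typescript
// Acceptance rate = suggestions accepted without override / all suggestions.
function acceptanceRate(accepted: number, overridden: number): number {
  const total = accepted + overridden;
  return total === 0 ? 0 : accepted / total;
}

const ACCEPTANCE_TARGET = 0.85;

// Illustrative counts, not real data.
const observedRate = acceptanceRate(172, 28);
console.log(
  `${(observedRate * 100).toFixed(1)}% accepted:`,
  observedRate >= ACCEPTANCE_TARGET ? "meets target" : "below target",
);
```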
References
- CLI-Anything: https://github.com/HKUDS/CLI-Anything
- Claude API Tool Use: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
- Project codebase: ~/Documents/VScode/hp_prod_tracker
- Existing implementation plan: ~/Documents/VScode/hp_prod_tracker/IMPLEMENTATION_PLAN.md
- Upgrade roadmap: ~/Documents/VScode/hp_prod_tracker/UPGRADE_PLAN.md