

OpenAI Responses API Migration Plan - 2025 Transition Strategy

Executive Summary

Following OpenAI's deprecation timeline (Assistants API sunset: mid-2026), we're migrating from the current Make.com workflow using Assistants API to a local backend using the new Responses API. This plan ensures feature parity while future-proofing the system.

Migration Timeline & Context

Current Status (2025):

✅ Responses API released (March 2025) with full tool support
⚠️ Assistants API v1 deprecated (December 2024)
⏰ Assistants API complete sunset: Mid-2026
🎯 Migration Priority: HIGH - 18 months to complete transition

Current Assistants API Usage Analysis

From Make.com Workflow Blueprint:

1. Thread Management (Modules 203, 493)

// Current: OpenAI Threads API
POST https://api.openai.com/v1/threads
{
  "messages": [{
    "role": "user",
    "content": "Please use this tone of voice: [TOV_CONTENT]"
  }]
}

// Thread persistence via thread_id in conversations table
thread_id: "thread_xxx"

2. Assistant Message Processing (Modules 519, 520)

// Current: Assistants API messageAdvanced
{
  "assistantId": "asst_xxx",
  "threadId": "thread_xxx", 
  "role": "user",
  "message": "User input"
}

// Run management with polling for completion

3. Assistant Configuration (Datastore 1607)

{
  "Assistant ID": "asst_xxx",           // OpenAI Assistant ID
  "Name": "Creative Assistant",         // Display name
  "Instructions": "System prompt...",   // Assistant personality
  "Model": "gpt-4-turbo",              // Model configuration
  "Initial Message": "Hello! I'm..."    // Welcome message
}

Responses API Migration Strategy

1. Conversation State Management

From: Thread-based persistence
To: Response-based continuation with server-side memory

// NEW: Responses API with conversation memory
const response = await client.responses.create({
  model: "gpt-4o",
  input: userMessage,
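  store: true,                          // Keep the response server-side so later turns can reference it
  previous_response_id: lastResponseId, // Continue conversation
  instructions: assistantInstructions,  // Assistant personality; the Responses API parameter is `instructions`, not `system`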
  temperature: 0.7
});

// Store response_id for conversation continuation
conversation.last_response_id = response.id;
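
The continuation logic above can be isolated into a small request builder. This is a minimal sketch, not the project's implementation: `buildResponseRequest` is a hypothetical helper name, and it assumes the prompt is passed as `instructions` (the parameter the Responses API documents for system-level guidance) and re-sent on every turn, since instructions are not inherited through `previous_response_id`.

```javascript
// Hypothetical helper: builds the argument object for client.responses.create().
function buildResponseRequest({ model, instructions, userMessage, lastResponseId }) {
  const request = {
    model,
    input: userMessage,
    instructions, // re-sent every turn; not carried over via previous_response_id
    store: true,  // stored responses are what previous_response_id points at
  };
  // Only chain to an earlier response when one exists (the first turn has none).
  if (lastResponseId) {
    request.previous_response_id = lastResponseId;
  }
  return request;
}

const firstTurn = buildResponseRequest({
  model: "gpt-4o",
  instructions: "You are a helpful assistant.",
  userMessage: "Hello",
  lastResponseId: null,
});
const secondTurn = buildResponseRequest({
  model: "gpt-4o",
  instructions: "You are a helpful assistant.",
  userMessage: "Tell me more",
  lastResponseId: "resp_123", // placeholder ID for illustration
});

console.log("previous_response_id" in firstTurn); // false
console.log(secondTurn.previous_response_id);     // resp_123
```

Keeping the builder free of network calls makes the first-turn and follow-up cases easy to unit test before wiring it to the client.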

Key Benefits:

✅ Automatic conversation memory management
✅ No manual thread/run management
✅ Simplified API calls (single endpoint)
✅ Built-in conversation forking capability

2. Assistant Personality System

From: Pre-configured Assistant IDs
To: Dynamic system prompts with response configuration

// NEW: Dynamic assistant configuration
const assistants = {
  "creative_ideation": {
    name: "Creative Ideation Assistant",
    system: `You are a highly creative business ideation assistant with decades of experience helping teams generate innovative solutions. Your responses should be:
    - Imaginative and forward-thinking
    - Practical and implementable
    - Encouraging and enthusiastic
    - Rich with diverse perspectives and examples`,
    model: "gpt-4o",
    temperature: 0.8,
    initial_message: "Hello! I'm here to spark your creativity and help generate amazing business ideas!"
  },
  
  "analytical_advisor": {
    name: "Analytical Business Advisor",
    system: `You are a data-driven business analyst and strategic advisor. Your responses should be:
    - Methodical and evidence-based
    - Structured with clear frameworks
    - Risk-aware and practical
    - Focused on measurable outcomes`,
    model: "gpt-4o", 
    temperature: 0.3,
    initial_message: "Greetings! I'm ready to provide analytical insights and strategic guidance for your business challenges."
  }
};

3. Tone-of-Voice Integration

From: Thread-level TOV injection
To: Dynamic system prompt modification

// NEW: Enhanced system prompt with TOV
function buildSystemPrompt(assistantKey, tovKey) {
  const basePrompt = assistants[assistantKey].system;
  const tovPrompts = {
    "standard": "",
    "pep": "\n\nAdditionally, use an energetic, enthusiastic, and motivational tone in all your responses. Be upbeat, use exclamation points appropriately, and inspire action.",
    "professional": "\n\nMaintain a formal, professional tone throughout. Use clear, concise language appropriate for executive-level communication.",
    "casual": "\n\nUse a friendly, conversational tone. Be approachable and relatable while maintaining helpfulness."
  };
  
  return basePrompt + (tovPrompts[tovKey] || "");
}

// Usage in API call
const systemPrompt = buildSystemPrompt(assistantKey, tovKey);
const response = await client.responses.create({
  model: assistants[assistantKey].model,
  input: userMessage,
  instructions: systemPrompt, // Responses API parameter name for the system prompt
  store: true,
  previous_response_id: lastResponseId
});

4. Content Processing Pipeline

From: External markdown compilation
To: Built-in response processing with enhanced tools

// NEW: Simplified response handling
const response = await client.responses.create({
  model: "gpt-4o",
  input: userMessage,
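  instructions: systemPrompt,
  store: true,
  previous_response_id: lastResponseId,
  
  // Enhanced with built-in tools
  tools: [
    { type: "web_search" },    // Built-in web search
    // Built-in file search; it needs at least one vector store ID.
    // OPENAI_VECTOR_STORE_ID is an assumed environment variable name.
    { type: "file_search", vector_store_ids: [process.env.OPENAI_VECTOR_STORE_ID] }
  ]
});

// The SDK exposes the concatenated assistant text as output_text
// (the Responses API has no choices[] array)
const assistantMessage = response.output_text;
// Content arrives as markdown text, so no separate server-side compilation step is needed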
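
When tools are enabled, the Responses API result carries an `output` array that can mix tool-call items with assistant message items, rather than the `choices` array of Chat Completions. The sketch below shows one way to pull the assistant text out of the raw JSON. `extractOutputText` and the sample payload are illustrative assumptions; the SDK's own `output_text` convenience property covers the common case.

```javascript
// Hypothetical helper: concatenates the text parts of all assistant messages.
function extractOutputText(response) {
  const parts = [];
  for (const item of response.output ?? []) {
    if (item.type !== "message") continue; // skip tool-call items such as web_search_call
    for (const part of item.content ?? []) {
      if (part.type === "output_text") parts.push(part.text);
    }
  }
  return parts.join("");
}

// Hand-written sample shaped like a Responses API result (values are made up).
const sample = {
  id: "resp_123",
  output: [
    { type: "web_search_call", id: "ws_1", status: "completed" },
    {
      type: "message",
      role: "assistant",
      content: [
        { type: "output_text", text: "## Ideas\n", annotations: [] },
        { type: "output_text", text: "- Launch a pilot", annotations: [] },
      ],
    },
  ],
};

console.log(extractOutputText(sample));
```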

Updated Database Schema

Modified Tables for Responses API:

conversations table (updated):

CREATE TABLE conversations (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    title TEXT,
    last_response_id TEXT,              -- NEW: Instead of thread_id
    assistant_key TEXT,
    tov_key TEXT DEFAULT 'standard',
    model TEXT DEFAULT 'gpt-4o',        -- NEW: Per-conversation model tracking
    cost DECIMAL(10,4) DEFAULT 0.0000,
    start_time DATETIME DEFAULT CURRENT_TIMESTAMP,
    end_time DATETIME DEFAULT CURRENT_TIMESTAMP

    -- Removed: thread_id and assistant_id columns
    -- Removed: assistant_id foreign key constraint
);

assistants table (simplified):

CREATE TABLE assistants (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    key TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    system_prompt TEXT NOT NULL,         -- NEW: Full system prompt
    model TEXT DEFAULT 'gpt-4o',
    temperature DECIMAL(3,2) DEFAULT 0.7, -- NEW: Model parameters
    initial_message TEXT,
    deleted BOOLEAN DEFAULT FALSE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP

    -- Removed: assistant_id column (no more OpenAI Assistant IDs)
    -- Removed: instructions column (merged into system_prompt)
);

responses table (new):

CREATE TABLE responses (
    id TEXT PRIMARY KEY,                 -- OpenAI response_id
    conversation_id TEXT NOT NULL,
    parent_response_id TEXT,             -- For conversation threading
    model TEXT NOT NULL,
    system_prompt TEXT,                  -- Snapshot of system prompt used
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cost DECIMAL(10,6) DEFAULT 0.000000,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (conversation_id) REFERENCES conversations (id)
);
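
Because each row in the responses table points at its parent, the lineage of any conversation branch can be rebuilt locally by walking `parent_response_id` backwards from `last_response_id`. The following is a sketch under the assumption that the rows have already been loaded into memory; `responseChain` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns response IDs from the oldest ancestor to lastResponseId.
function responseChain(rows, lastResponseId) {
  const byId = new Map(rows.map((row) => [row.id, row]));
  const chain = [];
  const seen = new Set(); // guards against accidental cycles in the data
  let current = byId.get(lastResponseId);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current.id);
    current = current.parent_response_id
      ? byId.get(current.parent_response_id)
      : undefined;
  }
  return chain.reverse(); // oldest first
}

// Sample rows: resp_c and resp_d are two branches forked from resp_b.
const rows = [
  { id: "resp_a", parent_response_id: null },
  { id: "resp_b", parent_response_id: "resp_a" },
  { id: "resp_c", parent_response_id: "resp_b" },
  { id: "resp_d", parent_response_id: "resp_b" },
];

console.log(responseChain(rows, "resp_c")); // [ 'resp_a', 'resp_b', 'resp_c' ]
console.log(responseChain(rows, "resp_d")); // [ 'resp_a', 'resp_b', 'resp_d' ]
```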

API Implementation Changes

1. Updated Chat Endpoint (`routes/chat.js`):

const express = require('express');
const router = express.Router();
const { OpenAI } = require('openai');
const { v4: uuidv4 } = require('uuid');

// Assumed local modules: the paths and helper module name are placeholders.
// They provide the ORM models and the prompt, cost, and title helpers used below.
const { Assistant, Conversation, Message, Response } = require('../models');
const { buildSystemPrompt, calculateCost, generateTitle } = require('../services/chat-helpers');

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

router.post('/', async (req, res) => {
  try {
    const { user_id } = req.auth;
    // Rename Message on destructuring so it does not shadow the Message model
    const { ConversationID, AssistantKey, TOV_Key, Message: userMessage } = req.body;
    
    // Validate required fields
    if (!AssistantKey || !TOV_Key || !userMessage) {
      return res.status(400).json({ error: 'Missing required fields' });
    }
    
    // Content moderation (still separate API)
    const moderation = await openai.moderations.create({ input: userMessage });
    if (moderation.results[0].flagged) {
      return res.status(400).json({ error: 'Content flagged by moderation' });
    }
    
    // Get assistant configuration
    const assistant = await Assistant.findOne({ 
      where: { key: AssistantKey, deleted: false }
    });
    
    if (!assistant) {
      return res.status(400).json({ error: 'Error: Assistant Not Set' });
    }
    
    let conversation;
    const isNewConversation = !ConversationID;
    let previousResponseId = null;
    
    if (isNewConversation) {
      // Create new conversation
      conversation = await Conversation.create({
        id: uuidv4(),
        user_id,
        assistant_key: AssistantKey,
        tov_key: TOV_Key,
        model: assistant.model
      });
    } else {
      // Get existing conversation
      conversation = await Conversation.findOne({
        where: { id: ConversationID, user_id }
      });
      
      if (!conversation) {
        return res.status(404).json({ error: 'Conversation not found' });
      }
      
      previousResponseId = conversation.last_response_id;
    }
    
    // Build system prompt with TOV
    const systemPrompt = buildSystemPrompt(AssistantKey, TOV_Key);
    
    // Call Responses API
    const response = await openai.responses.create({
      model: assistant.model,
      input: userMessage,
      instructions: systemPrompt,         // Responses API uses `instructions`, not `system`
      temperature: Number(assistant.temperature), // DECIMAL columns may come back as strings
      store: true,                        // Enable conversation memory
      previous_response_id: previousResponseId,
      
      // Built-in tools (if needed)
      tools: [
        { type: "web_search" },
        // file_search needs a vector store; OPENAI_VECTOR_STORE_ID is an assumed env variable
        { type: "file_search", vector_store_ids: [process.env.OPENAI_VECTOR_STORE_ID] }
      ]
    });
    
    // Store user message
    await Message.create({
      conversation_id: conversation.id,
      role: 'user',
      content: userMessage,
      content_plain: userMessage
    });
    
    // Extract assistant response (the Responses API has no choices[] array)
    const assistantMessage = response.output_text;
    
    // Store assistant message
    await Message.create({
      conversation_id: conversation.id,
      role: 'assistant',
      content: assistantMessage,
      content_plain: assistantMessage
    });
    
    // Store response metadata (usage fields are input_tokens / output_tokens)
    await Response.create({
      id: response.id,
      conversation_id: conversation.id,
      parent_response_id: previousResponseId,
      model: assistant.model,
      system_prompt: systemPrompt,
      input_tokens: response.usage.input_tokens,
      output_tokens: response.usage.output_tokens,
      cost: calculateCost(response.usage, assistant.model)
    });
    
    // Update conversation
    await conversation.update({
      last_response_id: response.id,
      end_time: new Date()
    });
    
    // Generate title for new conversations
    if (isNewConversation) {
      const title = await generateTitle(userMessage);
      await conversation.update({ title });
      
      return res.json({
        conversation_id: conversation.id,
        conversation_title: title,
        message: assistantMessage
      });
    }
    
    res.json({
      conversation_id: conversation.id,
      message: assistantMessage
    });
    
  } catch (error) {
    console.error('Chat error:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
});

module.exports = router;
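
The endpoint relies on a `calculateCost` helper that this plan does not define. One possible shape is sketched below, assuming the usage object reports `input_tokens` and `output_tokens` as the Responses API does. The price table is a placeholder assumption and must be replaced with current published rates before any figures are trusted.

```javascript
// PLACEHOLDER prices in USD per 1M tokens (assumed values, not authoritative).
const PRICES_PER_MILLION = {
  "gpt-4o": { input: 2.5, output: 10.0 },
};

// Sketch of calculateCost: returns USD cost rounded to 6 decimals,
// matching the DECIMAL(10,6) cost column in the responses table.
function calculateCost(usage, model) {
  const price = PRICES_PER_MILLION[model];
  if (!price) return 0; // unknown model: record zero rather than guessing
  const cost =
    (usage.input_tokens * price.input + usage.output_tokens * price.output) /
    1_000_000;
  return Number(cost.toFixed(6));
}

// 1000 input + 500 output tokens at the placeholder rates above
console.log(calculateCost({ input_tokens: 1000, output_tokens: 500 }, "gpt-4o")); // 0.0075
```

Returning 0 for an unknown model keeps the request path from failing on a pricing gap; logging such cases would make the gap visible.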

2. Conversation Retrieval (`routes/conversations.js`):

// GET /api/conversations/:id/messages
router.get('/:id/messages', async (req, res) => {
  try {
    const { user_id } = req.auth;
    const { id } = req.params;
    
    // Confirm the conversation belongs to the caller before returning any messages
    const conversation = await Conversation.findOne({
      where: { id, user_id }
    });
    
    if (!conversation) {
      return res.json({ conversation_id: id, messages: [] });
    }
    
    // Option 1 (used here): Retrieve from local database (maintains current UX)
    const messages = await Message.findAll({
      where: { conversation_id: id },
      order: [['timestamp', 'ASC']]
    });
    
    // Option 2 (not used here): Read history back from OpenAI.
    // openai.responses.retrieve(conversation.last_response_id) returns only that
    // single response and has no `messages` array. The items that fed a stored
    // response are listed separately with openai.responses.inputItems.list(responseId),
    // so the local messages table remains the simpler source for the full history.
    
    res.json({
      conversation_id: id,
      messages: messages.map(msg => ({
        role: msg.role,
        content: msg.content
      }))
    });
    
  } catch (error) {
    console.error('Messages retrieval error:', error);
    res.status(500).json({ error: 'Failed to retrieve messages' });
  }
});

3. Enhanced Features with Responses API:

Conversation Forking:

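// Fork conversation at any point
router.post('/:id/fork', async (req, res) => {
  try {
    const { user_id } = req.auth;
    const { response_id, new_message } = req.body;
    
    // Only allow forking a conversation the caller owns
    const source = await Conversation.findOne({
      where: { id: req.params.id, user_id }
    });
    if (!source) {
      return res.status(404).json({ error: 'Conversation not found' });
    }
    
    const forkedResponse = await openai.responses.create({
      model: "gpt-4o",
      input: new_message,
      previous_response_id: response_id, // Fork from this point
      store: true
    });
    
    // Create new conversation branch
    const newConversation = await Conversation.create({
      id: uuidv4(),
      user_id,
      last_response_id: forkedResponse.id,
      // ... other fields
    });
    
    res.json({ conversation_id: newConversation.id });
  } catch (error) {
    console.error('Fork error:', error);
    res.status(500).json({ error: 'Failed to fork conversation' });
  }
});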

Built-in Web Search:

// Automatic web search when relevant
const response = await openai.responses.create({
  model: "gpt-4o",
  input: "What are the latest trends in AI for 2025?",
  tools: [{ type: "web_search" }], // Automatically searches web when needed
  store: true
});
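
When the model uses web search, the sources it cites are attached to the output text as `url_citation` annotations. The sketch below collects them for display next to the answer. `extractUrlCitations` and the sample payload are assumptions for illustration, hand-written in the shape of a Responses API result.

```javascript
// Hypothetical helper: gathers { title, url } pairs from url_citation annotations.
function extractUrlCitations(response) {
  const citations = [];
  for (const item of response.output ?? []) {
    if (item.type !== "message") continue; // tool-call items carry no citations
    for (const part of item.content ?? []) {
      for (const annotation of part.annotations ?? []) {
        if (annotation.type === "url_citation") {
          citations.push({ title: annotation.title, url: annotation.url });
        }
      }
    }
  }
  return citations;
}

// Made-up sample payload; the URL and title are placeholders.
const searchResult = {
  output: [
    { type: "web_search_call", id: "ws_1", status: "completed" },
    {
      type: "message",
      role: "assistant",
      content: [
        {
          type: "output_text",
          text: "Agents are a major 2025 trend.",
          annotations: [
            {
              type: "url_citation",
              title: "Example article",
              url: "https://example.com/ai-trends",
              start_index: 0,
              end_index: 30,
            },
          ],
        },
      ],
    },
  ],
};

console.log(extractUrlCitations(searchResult));
```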

Migration Benefits

1. Simplified Architecture

❌ Remove: Thread management, run polling, message creation
✅ Add: Single API call with automatic memory
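📉 Reduce: an estimated ~60% fewer API calls per conversation (one request replaces the add-message, create-run, poll-run, and list-messages sequence)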

2. Enhanced Capabilities

🌐 Built-in Web Search: No external integration needed
📁 Built-in File Search: Advanced RAG capabilities
🔧 Enhanced Tools: Future-proof tool ecosystem
🧠 Server-side Memory: Automatic conversation management

3. Cost Optimization

💰 Reduced API calls: Single endpoint vs multiple (threads, messages, runs)
⚡ Faster responses: No run polling delays
📊 Better analytics: Built-in usage tracking

4. Developer Experience

🚀 Simpler debugging: Single API call to trace
🔄 Easier testing: Each turn is a single request, so unit tests can stub one call (conversation state still lives server-side via previous_response_id)
📚 Better documentation: Active OpenAI support and examples

Implementation Timeline

Phase 1: Foundation (Week 1)

Set up Responses API client and authentication
Update database schema for response_id tracking
Create assistant configuration system
Test basic Responses API integration

Phase 2: Core Migration (Week 2)

Implement new chat endpoint with Responses API
Update conversation retrieval logic
Migrate tone-of-voice system to dynamic prompts
Test conversation continuity and memory

Phase 3: Enhanced Features (Week 3)

Integrate built-in web search capabilities
Add conversation forking functionality
Implement advanced analytics and cost tracking
Update frontend for new response format

Phase 4: Production Optimization (Week 4)

Performance testing and optimization
Error handling and retry logic
Monitoring and alerting setup
Documentation and deployment guides
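
The retry logic in Phase 4 could take the form of an exponential backoff wrapper. This is a sketch under stated assumptions: `withRetry`, `backoffDelayMs`, and `isRetryable` are hypothetical names, the delay values are arbitrary defaults, and errors are assumed to expose an HTTP `status` property as the OpenAI SDK's API errors do.

```javascript
// Delay before retry number `attempt` (0-based): 500ms, 1s, 2s, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry only rate limits (429) and server-side failures (5xx).
function isRetryable(error) {
  const status = error && error.status;
  return status === 429 || (status >= 500 && status < 600);
}

// Runs fn, retrying retryable failures up to `retries` additional times.
async function withRetry(fn, options = {}) {
  const {
    retries = 3,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= retries || !isRetryable(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

// Usage sketch (not executed here):
// const response = await withRetry(() => openai.responses.create(request));
```

The OpenAI Node SDK already retries some failures on its own (configurable through the client's `maxRetries` option), so a wrapper like this is mainly useful when custom limits or logging are needed.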

Phase 5: Parallel Operation (Week 5)

Run both systems in parallel for validation
Data migration from Assistants to Responses format
User acceptance testing
Gradual cutover strategy
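
For the data migration step, each record in the Make.com assistant datastore (fields as listed under Assistant Configuration) can be mapped onto a row of the simplified assistants table. The sketch below is one way to do that. `toAssistantRow` is a hypothetical helper, and deriving `key` from the display name and defaulting `temperature` to 0.7 are assumptions, since the legacy record stores neither.

```javascript
// Hypothetical mapper: legacy datastore record -> new assistants table row.
function toAssistantRow(record, { temperature = 0.7 } = {}) {
  // Assumption: derive a stable key by slugifying the display name.
  const key = record["Name"]
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
  return {
    key,
    name: record["Name"],
    system_prompt: record["Instructions"], // instructions merge into system_prompt
    model: record["Model"],
    temperature,
    initial_message: record["Initial Message"],
    // "Assistant ID" is intentionally dropped: OpenAI Assistant IDs are no longer used
  };
}

const legacyRecord = {
  "Assistant ID": "asst_xxx",
  "Name": "Creative Assistant",
  "Instructions": "System prompt...",
  "Model": "gpt-4-turbo",
  "Initial Message": "Hello! I'm...",
};

console.log(toAssistantRow(legacyRecord));
```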

Risk Mitigation

1. API Compatibility

Risk: Breaking changes in Responses API
Mitigation: Version pinning, fallback to Chat Completions API
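
A Chat Completions fallback cannot rely on `previous_response_id`, so it has to replay the conversation from the local messages table on every call. The sketch below builds such a request. `toChatCompletionsRequest` is a hypothetical helper, and `history` is assumed to be the stored rows in chronological order.

```javascript
// Hypothetical helper: builds the argument for openai.chat.completions.create().
function toChatCompletionsRequest({ model, instructions, history, userMessage }) {
  return {
    model,
    messages: [
      { role: "system", content: instructions },
      // Replay stored turns, keeping only the fields the API expects
      ...history.map(({ role, content }) => ({ role, content })),
      { role: "user", content: userMessage },
    ],
  };
}

const fallbackRequest = toChatCompletionsRequest({
  model: "gpt-4o",
  instructions: "You are a data-driven business analyst.",
  history: [
    { role: "user", content: "How should we price the pilot?", conversation_id: "c1" },
    { role: "assistant", content: "Start with a cost-plus baseline.", conversation_id: "c1" },
  ],
  userMessage: "What margin is typical?",
});

console.log(fallbackRequest.messages.length); // 4
```

This is one reason to keep writing both sides of every exchange to the local messages table even though the Responses API stores conversation state server-side.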

2. Feature Gaps

Risk: Missing features from Assistants API
Mitigation: Hybrid approach using Chat Completions for gaps

3. Migration Timeline

Risk: Assistants API sunset before migration complete
Mitigation: Aggressive timeline with parallel development

4. Data Loss

Risk: Conversation history lost during migration
Mitigation: Full data export and mapping strategy

Success Metrics

Technical Metrics:

✅ Response Time: <2s average (vs current ~5s with polling)
✅ API Call Reduction: 60% fewer calls per conversation
✅ Error Rate: <1% API errors
✅ Feature Parity: 100% current functionality maintained

Business Metrics:

💰 Cost Reduction: 30-40% OpenAI usage costs
📈 User Satisfaction: Improved response times
🛠 Developer Velocity: Faster feature development
🔮 Future-Proofing: Ready for OpenAI's 2026+ roadmap

This migration plan ensures we transition to the Responses API while maintaining all current functionality and positioning for enhanced capabilities and cost optimization.
