OpenAI Responses API Migration Plan - 2025 Transition Strategy
Executive Summary
Following OpenAI's deprecation timeline (Assistants API sunset: mid-2026), we're migrating from the current Make.com workflow using Assistants API to a local backend using the new Responses API. This plan ensures feature parity while future-proofing the system.
Migration Timeline & Context
Current Status (2025):
- ✅ Responses API released (March 2025) with full tool support
- ⚠️ Assistants API v1 deprecated (December 2024)
- ⏰ Assistants API complete sunset: Mid-2026
- 🎯 Migration Priority: HIGH - roughly 15 months between the Responses API release (March 2025) and the mid-2026 sunset
Current Assistants API Usage Analysis
From Make.com Workflow Blueprint:
1. Thread Management (Modules 203, 493)
// Current: OpenAI Threads API
POST https://api.openai.com/v1/threads
{
"messages": [{
"role": "user",
"content": "Please use this tone of voice: [TOV_CONTENT]"
}]
}
// Thread persistence via thread_id in conversations table
thread_id: "thread_xxx"
2. Assistant Message Processing (Modules 519, 520)
// Current: Assistants API messageAdvanced
{
"assistantId": "asst_xxx",
"threadId": "thread_xxx",
"role": "user",
"message": "User input"
}
// Run management with polling for completion
3. Assistant Configuration (Datastore 1607)
{
"Assistant ID": "asst_xxx", // OpenAI Assistant ID
"Name": "Creative Assistant", // Display name
"Instructions": "System prompt...", // Assistant personality
"Model": "gpt-4-turbo", // Model configuration
"Initial Message": "Hello! I'm..." // Welcome message
}
Responses API Migration Strategy
1. Conversation State Management
From: Thread-based persistence
To: Response-based continuation with server-side memory
// NEW: Responses API with conversation memory
const response = await client.responses.create({
  model: "gpt-4o",
  input: userMessage,
  store: true, // Keep the response server-side so it can be continued
  previous_response_id: lastResponseId, // Continue conversation
  // Assistant personality. The parameter is `instructions` (there is no `system` field),
  // and it is not inherited from earlier responses, so send it on every turn.
  instructions: assistantInstructions,
  temperature: 0.7
});
// Store response_id for conversation continuation
conversation.last_response_id = response.id;
Key Benefits:
- ✅ Automatic conversation memory management
- ✅ No manual thread/run management
- ✅ Simplified API calls (single endpoint)
- ✅ Built-in conversation forking capability
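The continuation flow above can be sketched as a small pure function that assembles the request body for each turn. The helper name `buildTurnRequest` and the shape of its `assistant` argument are assumptions made for illustration; `instructions`, `store`, and `previous_response_id` are Responses API parameters. Because instructions are not carried over when chaining with `previous_response_id`, the sketch re-sends them on every turn.

```javascript
// Hypothetical helper: builds the parameters for client.responses.create().
function buildTurnRequest({ assistant, userMessage, lastResponseId }) {
  const request = {
    model: assistant.model,
    input: userMessage,
    instructions: assistant.system, // re-sent on every turn
    temperature: assistant.temperature,
    store: true // keep the response server-side so it can be chained
  };
  // Only chain when there is an earlier response to continue from.
  if (lastResponseId) {
    request.previous_response_id = lastResponseId;
  }
  return request;
}

// Example: a first turn, then a follow-up turn.
const assistant = { model: "gpt-4o", system: "You are concise.", temperature: 0.7 };
const firstTurn = buildTurnRequest({ assistant, userMessage: "Hi", lastResponseId: null });
const nextTurn = buildTurnRequest({ assistant, userMessage: "More", lastResponseId: "resp_123" });
console.log("previous_response_id" in firstTurn); // false
console.log(nextTurn.previous_response_id);       // resp_123
```

Keeping request assembly separate from the network call makes the chaining rule easy to unit test without touching the API.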
2. Assistant Personality System
From: Pre-configured Assistant IDs
To: Dynamic system prompts with response configuration
// NEW: Dynamic assistant configuration
const assistants = {
"creative_ideation": {
name: "Creative Ideation Assistant",
system: `You are a highly creative business ideation assistant with decades of experience helping teams generate innovative solutions. Your responses should be:
- Imaginative and forward-thinking
- Practical and implementable
- Encouraging and enthusiastic
- Rich with diverse perspectives and examples`,
model: "gpt-4o",
temperature: 0.8,
initial_message: "Hello! I'm here to spark your creativity and help generate amazing business ideas!"
},
"analytical_advisor": {
name: "Analytical Business Advisor",
system: `You are a data-driven business analyst and strategic advisor. Your responses should be:
- Methodical and evidence-based
- Structured with clear frameworks
- Risk-aware and practical
- Focused on measurable outcomes`,
model: "gpt-4o",
temperature: 0.3,
initial_message: "Greetings! I'm ready to provide analytical insights and strategic guidance for your business challenges."
}
};
3. Tone-of-Voice Integration
From: Thread-level TOV injection
To: Dynamic system prompt modification
// NEW: Enhanced system prompt with TOV
function buildSystemPrompt(assistantKey, tovKey) {
const basePrompt = assistants[assistantKey].system;
const tovPrompts = {
"standard": "",
"pep": "\n\nAdditionally, use an energetic, enthusiastic, and motivational tone in all your responses. Be upbeat, use exclamation points appropriately, and inspire action.",
"professional": "\n\nMaintain a formal, professional tone throughout. Use clear, concise language appropriate for executive-level communication.",
"casual": "\n\nUse a friendly, conversational tone. Be approachable and relatable while maintaining helpfulness."
};
return basePrompt + (tovPrompts[tovKey] || "");
}
// Usage in API call
const systemPrompt = buildSystemPrompt(assistantKey, tovKey);
const response = await client.responses.create({
  model: assistants[assistantKey].model,
  input: userMessage,
  instructions: systemPrompt, // The Responses API takes the system prompt as `instructions`
  store: true,
  previous_response_id: lastResponseId
});
4. Content Processing Pipeline
From: External markdown compilation
To: Built-in response processing with enhanced tools
// NEW: Simplified response handling
const response = await client.responses.create({
  model: "gpt-4o",
  input: userMessage,
  instructions: systemPrompt,
  store: true,
  previous_response_id: lastResponseId,
  // Enhanced with built-in tools
  tools: [
    { type: "web_search" }, // Built-in web search (early SDK releases name it "web_search_preview")
    // Built-in file search needs at least one vector store; VECTOR_STORE_ID is a placeholder
    { type: "file_search", vector_store_ids: [VECTOR_STORE_ID] }
  ]
});
// The SDK exposes the concatenated text output as output_text (there is no choices array)
const assistantMessage = response.output_text;
// The text is usually Markdown already; the client still has to render it
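Responses API results arrive as a list of output items rather than a `choices` array, and tool calls appear in that list alongside the assistant message. The sketch below shows a manual equivalent of the SDK's `output_text` convenience property. The function name `extractOutputText` and the sample object are hypothetical; the item and part types (`message`, `output_text`, `web_search_call`) follow the documented response shape as understood here.

```javascript
// Collects the text parts from a Responses API result.
// Assumed shape: response.output is an array of items; assistant text lives in
// items of type "message", inside content parts of type "output_text".
function extractOutputText(response) {
  const parts = [];
  for (const item of response.output || []) {
    if (item.type !== "message") continue; // skip tool-call items
    for (const part of item.content || []) {
      if (part.type === "output_text") parts.push(part.text);
    }
  }
  return parts.join("");
}

// Hand-written sample object standing in for a real API result.
const sampleResponse = {
  id: "resp_example",
  output: [
    { type: "web_search_call", id: "ws_example", status: "completed" },
    {
      type: "message",
      role: "assistant",
      content: [{ type: "output_text", text: "## Trends\n- Agents", annotations: [] }]
    }
  ]
};
console.log(extractOutputText(sampleResponse));
```

Walking the items explicitly is mainly useful when the backend also needs the tool-call entries or annotations; for plain text, `response.output_text` is enough.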
Updated Database Schema
Modified Tables for Responses API:
conversations table (updated):
CREATE TABLE conversations (
id TEXT PRIMARY KEY,
user_id TEXT NOT NULL,
title TEXT,
last_response_id TEXT, -- NEW: Instead of thread_id
assistant_key TEXT,
tov_key TEXT DEFAULT 'standard',
model TEXT DEFAULT 'gpt-4o', -- NEW: Per-conversation model tracking
cost DECIMAL(10,4) DEFAULT 0.0000,
start_time DATETIME DEFAULT CURRENT_TIMESTAMP,
end_time DATETIME DEFAULT CURRENT_TIMESTAMP
-- Removed: thread_id and assistant_id columns
-- Removed: assistant_id foreign key constraint
);
assistants table (simplified):
CREATE TABLE assistants (
id INTEGER PRIMARY KEY AUTOINCREMENT,
key TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
system_prompt TEXT NOT NULL, -- NEW: Full system prompt
model TEXT DEFAULT 'gpt-4o',
temperature DECIMAL(3,2) DEFAULT 0.7, -- NEW: Model parameters
initial_message TEXT,
deleted BOOLEAN DEFAULT FALSE,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
-- Removed: assistant_id column (no more OpenAI Assistant IDs)
-- Removed: instructions column (merged into system_prompt)
);
responses table (new):
CREATE TABLE responses (
id TEXT PRIMARY KEY, -- OpenAI response_id
conversation_id TEXT NOT NULL,
parent_response_id TEXT, -- For conversation threading
model TEXT NOT NULL,
system_prompt TEXT, -- Snapshot of system prompt used
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
cost DECIMAL(10,6) DEFAULT 0.000000,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (conversation_id) REFERENCES conversations (id)
);
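The `cost` column has to be computed from the token counts stored beside it, and the chat endpoint in this plan calls a `calculateCost` helper without defining it. A minimal sketch of such a helper follows. The price table is an illustrative assumption, not a price list, and should be replaced with the currently published rates; the `usage` argument follows the Responses API field names `input_tokens` and `output_tokens`.

```javascript
// Placeholder prices in USD per 1M tokens. These numbers are illustrative
// assumptions; replace them with the current published rates.
const PRICES_PER_MILLION = {
  "gpt-4o": { input: 2.5, output: 10 }
};

// Hypothetical implementation of the calculateCost helper.
// `usage` is expected to look like { input_tokens, output_tokens }.
function calculateCost(usage, model) {
  const price = PRICES_PER_MILLION[model];
  if (!price) {
    throw new Error(`No price configured for model: ${model}`);
  }
  const cost =
    (usage.input_tokens * price.input + usage.output_tokens * price.output) / 1e6;
  // Round to 6 decimals to match the DECIMAL(10,6) cost column.
  return Number(cost.toFixed(6));
}

const exampleCost = calculateCost({ input_tokens: 1000, output_tokens: 500 }, "gpt-4o");
console.log(exampleCost);
```

Throwing on an unknown model is a deliberate choice in this sketch: silently recording a zero cost would hide a missing price entry.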
API Implementation Changes
1. Updated Chat Endpoint (routes/chat.js):
const express = require('express');
const router = express.Router();
const { OpenAI } = require('openai');
const { v4: uuidv4 } = require('uuid');
// Assumption: the ORM models are exported from ../models; adjust the path to the project layout.
// The Message model is aliased so it does not clash with the `Message` request field.
const { Assistant, Conversation, Message: MessageModel, Response } = require('../models');
// buildSystemPrompt, calculateCost, and generateTitle are assumed to be defined elsewhere in the backend.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
router.post('/', async (req, res) => {
  try {
    const { user_id } = req.auth;
    // Rename the request field so it does not shadow the Message model
    const { ConversationID, AssistantKey, TOV_Key, Message: userMessage } = req.body;
    // Validate required fields
    if (!AssistantKey || !TOV_Key || !userMessage) {
      return res.status(400).json({ error: 'Missing required fields' });
    }
    // Content moderation (still separate API)
    const moderation = await openai.moderations.create({ input: userMessage });
    if (moderation.results[0].flagged) {
      return res.status(400).json({ error: 'Content flagged by moderation' });
    }
    // Get assistant configuration
    const assistant = await Assistant.findOne({
      where: { key: AssistantKey, deleted: false }
    });
    if (!assistant) {
      return res.status(400).json({ error: 'Error: Assistant Not Set' });
    }
    let conversation;
    const isNewConversation = !ConversationID;
    let previousResponseId = null;
    if (isNewConversation) {
      // Create new conversation
      conversation = await Conversation.create({
        id: uuidv4(),
        user_id,
        assistant_key: AssistantKey,
        tov_key: TOV_Key,
        model: assistant.model
      });
    } else {
      // Get existing conversation
      conversation = await Conversation.findOne({
        where: { id: ConversationID, user_id }
      });
      if (!conversation) {
        return res.status(404).json({ error: 'Conversation not found' });
      }
      previousResponseId = conversation.last_response_id;
    }
    // Build system prompt with TOV
    const systemPrompt = buildSystemPrompt(AssistantKey, TOV_Key);
    // Call Responses API
    const response = await openai.responses.create({
      model: assistant.model,
      input: userMessage,
      instructions: systemPrompt, // Not inherited from earlier responses, so sent every turn
      temperature: assistant.temperature,
      store: true, // Enable conversation memory
      previous_response_id: previousResponseId,
      // Built-in tools (if needed). file_search also requires a vector_store_ids
      // array, so add it once a vector store exists.
      tools: [
        { type: "web_search" }
      ]
    });
    // Store user message
    await MessageModel.create({
      conversation_id: conversation.id,
      role: 'user',
      content: userMessage,
      content_plain: userMessage
    });
    // Extract assistant response (the Responses API has no choices array)
    const assistantMessage = response.output_text;
    // Store assistant message
    await MessageModel.create({
      conversation_id: conversation.id,
      role: 'assistant',
      content: assistantMessage,
      content_plain: assistantMessage
    });
    // Store response metadata (usage fields are input_tokens / output_tokens)
    await Response.create({
      id: response.id,
      conversation_id: conversation.id,
      parent_response_id: previousResponseId,
      model: assistant.model,
      system_prompt: systemPrompt,
      input_tokens: response.usage.input_tokens,
      output_tokens: response.usage.output_tokens,
      cost: calculateCost(response.usage, assistant.model)
    });
    // Update conversation
    await conversation.update({
      last_response_id: response.id,
      end_time: new Date()
    });
    // Generate title for new conversations
    if (isNewConversation) {
      const title = await generateTitle(userMessage);
      await conversation.update({ title });
      return res.json({
        conversation_id: conversation.id,
        conversation_title: title,
        message: assistantMessage
      });
    }
    res.json({
      conversation_id: conversation.id,
      message: assistantMessage
    });
  } catch (error) {
    console.error('Chat error:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
});
module.exports = router;
2. Conversation Retrieval (routes/conversations.js):
// GET /api/conversations/:id/messages
router.get('/:id/messages', async (req, res) => {
  try {
    const { user_id } = req.auth;
    const { id } = req.params;
    // Confirm the conversation belongs to the caller before returning anything
    const conversation = await Conversation.findOne({
      where: { id, user_id }
    });
    if (!conversation || !conversation.last_response_id) {
      return res.json({ conversation_id: id, messages: [] });
    }
    // Option 1 (used here): retrieve from local database (maintains current UX)
    const messages = await Message.findAll({
      where: { conversation_id: id },
      order: [['timestamp', 'ASC']]
    });
    // Option 2 (alternative): rebuild the history from OpenAI's stored items.
    // openai.responses.retrieve(conversation.last_response_id) returns only that
    // response's own output; it has no `messages` array. Earlier turns are
    // exposed through the input-items listing, e.g.
    // openai.responses.inputItems.list(conversation.last_response_id).
    // Verify that call against the SDK version in use before relying on it.
    res.json({
      conversation_id: id,
      messages: messages.map(msg => ({
        role: msg.role,
        content: msg.content
      }))
    });
  } catch (error) {
    console.error('Messages retrieval error:', error);
    res.status(500).json({ error: 'Failed to retrieve messages' });
  }
});
3. Enhanced Features with Responses API:
Conversation Forking:
// Fork conversation at any point
router.post('/:id/fork', async (req, res) => {
  try {
    const { user_id } = req.auth;
    const { response_id, new_message } = req.body;
    // NOTE: confirm response_id belongs to one of this user's conversations before forking
    const forkedResponse = await openai.responses.create({
      model: "gpt-4o",
      input: new_message,
      previous_response_id: response_id, // Fork from this point
      store: true
    });
    // Create new conversation branch
    const newConversation = await Conversation.create({
      id: uuidv4(),
      user_id,
      last_response_id: forkedResponse.id,
      // ... other fields
    });
    res.json({ conversation_id: newConversation.id });
  } catch (error) {
    console.error('Fork error:', error);
    res.status(500).json({ error: 'Failed to fork conversation' });
  }
});
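Once forks exist, several conversations share a common prefix of responses, and the `parent_response_id` column in the `responses` table is what ties each branch together locally. The following sketch shows one way to reconstruct a single branch by walking parent links from a leaf back to the root. The function name `getBranch` and the in-memory `Map` are assumptions for illustration; a real implementation would read the rows from the database.

```javascript
// Walks parent_response_id links from a leaf back to the root and returns
// the response ids in chronological order (root first).
function getBranch(rowsById, leafId) {
  const branch = [];
  const seen = new Set();
  let currentId = leafId;
  while (currentId) {
    if (seen.has(currentId)) throw new Error("Cycle in response chain");
    seen.add(currentId);
    const row = rowsById.get(currentId);
    if (!row) break; // parent not stored locally; stop at what we have
    branch.unshift(row.id);
    currentId = row.parent_response_id;
  }
  return branch;
}

// In-memory stand-in for rows of the responses table.
const rows = new Map([
  ["resp_a", { id: "resp_a", parent_response_id: null }],
  ["resp_b", { id: "resp_b", parent_response_id: "resp_a" }],
  ["resp_c", { id: "resp_c", parent_response_id: "resp_b" }], // original branch
  ["resp_d", { id: "resp_d", parent_response_id: "resp_b" }]  // fork from resp_b
]);
console.log(getBranch(rows, "resp_d")); // [ 'resp_a', 'resp_b', 'resp_d' ]
```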
Built-in Web Search:
// Automatic web search when relevant
const response = await openai.responses.create({
model: "gpt-4o",
input: "What are the latest trends in AI for 2025?",
tools: [{ type: "web_search" }], // Automatically searches web when needed
store: true
});
Migration Benefits
1. Simplified Architecture
- ❌ Remove: Thread management, run polling, message creation
- ✅ Add: Single API call with automatic memory
- 📉 Reduce: ~60% fewer API calls per conversation
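The reduction figure depends on how many status polls a run needs and on which calls both flows share. A back-of-the-envelope sketch under stated assumptions: one Assistants turn costs one message creation, one run creation, some number of polls, and one message listing, while one Responses turn costs a single call. The function name and the parameter values are illustrative, not measurements from the current workflow.

```javascript
// Counts API calls for one user turn under stated assumptions.
function callsPerTurn({ pollsPerTurn, sharedCalls }) {
  // Assistants flow: add message + create run + N status polls + list messages
  const assistants = 1 + 1 + pollsPerTurn + 1 + sharedCalls;
  // Responses flow: a single responses.create call
  const responses = 1 + sharedCalls;
  return { assistants, responses, reduction: 1 - responses / assistants };
}

// Assumption: one poll per turn and one shared moderation call.
const estimate = callsPerTurn({ pollsPerTurn: 1, sharedCalls: 1 });
console.log(estimate.assistants, estimate.responses, estimate.reduction.toFixed(2));
// prints: 5 2 0.60
```

With more polls per turn the reduction grows, so the ~60% figure is closer to a lower bound under these assumptions.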
2. Enhanced Capabilities
- 🌐 Built-in Web Search: No external integration needed
- 📁 Built-in File Search: Advanced RAG capabilities
- 🔧 Enhanced Tools: Future-proof tool ecosystem
- 🧠 Server-side Memory: Automatic conversation management
3. Cost Optimization
- 💰 Reduced API calls: Single endpoint vs multiple (threads, messages, runs)
- ⚡ Faster responses: No run polling delays
- 📊 Better analytics: Built-in usage tracking
4. Developer Experience
- 🚀 Simpler debugging: Single API call to trace
- 🔄 Easier testing: Stateless requests for unit testing
- 📚 Better documentation: Active OpenAI support and examples
Implementation Timeline
Phase 1: Foundation (Week 1)
- Set up Responses API client and authentication
- Update database schema for response_id tracking
- Create assistant configuration system
- Test basic Responses API integration
Phase 2: Core Migration (Week 2)
- Implement new chat endpoint with Responses API
- Update conversation retrieval logic
- Migrate tone-of-voice system to dynamic prompts
- Test conversation continuity and memory
Phase 3: Enhanced Features (Week 3)
- Integrate built-in web search capabilities
- Add conversation forking functionality
- Implement advanced analytics and cost tracking
- Update frontend for new response format
Phase 4: Production Optimization (Week 4)
- Performance testing and optimization
- Error handling and retry logic
- Monitoring and alerting setup
- Documentation and deployment guides
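For the error handling and retry item in Phase 4, one common approach is exponential backoff around the API call. The sketch below is a generic helper, not part of the OpenAI SDK; the names `withRetry`, `backoffDelay`, and `isRetryable` and the default delays are assumptions. The official Node SDK also retries some failures on its own through its `maxRetries` client option, so a wrapper like this is only needed when finer control is wanted.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseDelayMs = 500, maxDelayMs = 8000) {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

// Treat rate limits and server errors as retryable; `status` is assumed to be
// the HTTP status attached to the thrown error.
function isRetryable(error) {
  return error.status === 429 || (error.status >= 500 && error.status < 600);
}

// Runs `fn`, retrying retryable failures up to `maxRetries` times.
async function withRetry(fn, { maxRetries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      const delay = backoffDelay(attempt, baseDelayMs);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

A call site would look like `await withRetry(() => openai.responses.create(params))`. If this wrapper is used, consider lowering the SDK's own `maxRetries` so the two layers do not multiply the number of attempts.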
Phase 5: Parallel Operation (Week 5)
- Run both systems in parallel for validation
- Data migration from Assistants to Responses format
- User acceptance testing
- Gradual cutover strategy
Risk Mitigation
1. API Compatibility
- Risk: Breaking changes in Responses API
- Mitigation: Version pinning, fallback to Chat Completions API
2. Feature Gaps
- Risk: Missing features from Assistants API
- Mitigation: Hybrid approach using Chat Completions for gaps
3. Migration Timeline
- Risk: Assistants API sunset before migration complete
- Mitigation: Aggressive timeline with parallel development
4. Data Loss
- Risk: Conversation history lost during migration
- Mitigation: Full data export and mapping strategy
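The mapping step can be sketched as a pure conversion from exported thread messages to Responses API input items. The function name `threadMessagesToInput` is hypothetical, and the export shape shown (text parts carrying `text.value`, a numeric `created_at`) is an assumption based on the Assistants message object; non-text parts such as images are dropped in this sketch.

```javascript
// Converts exported Assistants API thread messages into Responses API input items.
// Assumed export shape: { role, created_at, content: [{ type: "text", text: { value } }] }.
function threadMessagesToInput(threadMessages) {
  return [...threadMessages]
    .sort((a, b) => a.created_at - b.created_at) // oldest first
    .map(message => ({
      role: message.role,
      content: message.content
        .filter(part => part.type === "text")
        .map(part => part.text.value)
        .join("\n")
    }))
    .filter(item => item.content.length > 0); // drop messages with no text
}

// Hand-written sample export, deliberately out of order.
const exported = [
  { role: "assistant", created_at: 2, content: [{ type: "text", text: { value: "Hello!" } }] },
  { role: "user", created_at: 1, content: [{ type: "text", text: { value: "Hi" } }] }
];
console.log(threadMessagesToInput(exported));
```

The resulting array can be passed as `input` on the first Responses call of a migrated conversation; later turns then chain through `previous_response_id` as usual.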
Success Metrics
Technical Metrics:
- ✅ Response Time: <2s average (vs current ~5s with polling)
- ✅ API Call Reduction: 60% fewer calls per conversation
- ✅ Error Rate: <1% API errors
- ✅ Feature Parity: 100% current functionality maintained
Business Metrics:
- 💰 Cost Reduction: 30-40% OpenAI usage costs
- 📈 User Satisfaction: Improved response times
- 🛠 Developer Velocity: Faster feature development
- 🔮 Future-Proofing: Ready for OpenAI's 2026+ roadmap
This migration plan ensures we transition to the Responses API while maintaining all current functionality and positioning for enhanced capabilities and cost optimization.