# Semblance Synthetic Society - Application Documentation

## Table of Contents
1. [Application Overview](#application-overview)
2. [Application Architecture](#application-architecture)
3. [User Activity Flow](#user-activity-flow)
4. [Data Model](#data-model)
5. [Technical Details and Specifications](#technical-details-and-specifications)
6. [API Structure](#api-structure)
7. [AI Integration](#ai-integration)
8. [Development and Deployment](#development-and-deployment)
9. [Security Considerations](#security-considerations)
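10. [Key Features](#key-features)
11. [Best Practices for Development Team](#best-practices-for-development-team)
12. [Future Enhancements](#future-enhancements)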

## Application Overview

Semblance Synthetic Society is an AI-powered platform for creating and managing synthetic personas for focus groups and market research. It enables researchers to:

- Create detailed synthetic personas with demographic profiles and personality traits
- Organize personas into focus groups
- Run AI-moderated focus group sessions with autonomous conversations
- Analyze results with real-time theme extraction and reporting
- Export comprehensive insights and recommendations

## Application Architecture

### High-Level Architecture

```mermaid
graph TB
    subgraph "Frontend - React/TypeScript"
        A[React App<br/>Vite + TypeScript] --> B[Components]
        B --> C[Pages]
        B --> D[UI Components<br/>shadcn-ui]
        A --> E[State Management<br/>React Context + Hooks]
        A --> F[API Client<br/>Axios]
    end

    subgraph "Backend - Python/Flask"
        G[Flask API] --> H[Routes]
        H --> I[Services]
        I --> J[Models]
        J --> K[(MongoDB)]
        I --> L[Google Gemini AI<br/>LLM Integration]
    end

    F -->|HTTP/REST| G

    subgraph "Infrastructure"
        M[Static Hosting<br/>Frontend Build]
        N[WSGI Server<br/>Backend API]
        O[MongoDB Instance]
    end
```

### Technology Stack

#### Frontend
- **Framework**: React 18.3.1 with TypeScript
- **Build Tool**: Vite 5.4.1
- **Styling**: Tailwind CSS 3.4.11 with shadcn-ui components
- **Routing**: React Router DOM 6.26.2
- **State Management**: React Context API + Custom Hooks
- **HTTP Client**: Axios 1.6.2
- **Forms**: React Hook Form 7.53.0 with Zod validation
- **Authentication**: JWT with localStorage persistence

#### Backend
- **Framework**: Flask (Python)
- **Database**: MongoDB with PyMongo
- **Authentication**: Flask-JWT-Extended
- **AI Integration**: Google Generative AI (Gemini 2.5 Pro)
- **Server**: Hypercorn (ASGI server); the deployment command in this document uses Gunicorn (WSGI server)
- **CORS**: Flask-CORS for cross-origin requests

## User Activity Flow

### Main User Journey

```mermaid
flowchart TD
    A[Landing Page] --> B{Authenticated?}
    B -->|No| C[Login Page]
    B -->|Yes| D[Dashboard]
    C --> E[Enter Credentials]
    E --> F[JWT Authentication]
    F --> D

    D --> G[Synthetic Users]
    D --> H[Focus Groups]
    D --> I[Overview Stats]

    G --> J[Create Persona]
    G --> K[AI Recruiter]
    G --> L[View/Edit Personas]

    J --> M[Manual Creation Form]
    K --> N[Bulk Generation]

    H --> O[Create Focus Group]
    H --> P[View Groups]

    O --> Q[Select Participants]
    O --> R[Configure Settings]

    P --> S[Focus Group Session]
    S --> T[AI Moderated Discussion]
    S --> U[Real-time Analytics]
    S --> V[Export Results]
```

### Focus Group Session Flow

```mermaid
sequenceDiagram
    participant U as User
    participant F as Frontend
    participant B as Backend
    participant AI as AI Service
    participant DB as MongoDB

    U->>F: Start Focus Group Session
    F->>B: Initialize Session
    B->>DB: Load Focus Group Data
    B->>DB: Load Participant Personas
    B-->>F: Return Session Data

    U->>F: Ask Question
    F->>B: Send Moderator Message
    B->>AI: Generate AI Response
    AI-->>B: AI Moderator Response
    B->>DB: Store Message
    B-->>F: Return Response

    loop Autonomous Mode
        B->>AI: Determine Next Action
        AI-->>B: Next Participant/Question
        B->>AI: Generate Participant Response
        AI-->>B: Persona Response
        B->>DB: Store Message
        B-->>F: Stream Response
    end

    U->>F: Request Themes
    F->>B: Extract Key Themes
    B->>AI: Analyze Conversation
    AI-->>B: Extracted Themes
    B->>DB: Store Themes
    B-->>F: Return Themes
```

## Data Model

### MongoDB Collections

```mermaid
erDiagram
    USERS ||--o{ PERSONAS : creates
    USERS ||--o{ FOCUS_GROUPS : creates
    FOCUS_GROUPS ||--o{ PERSONAS : includes
    FOCUS_GROUPS ||--o{ MESSAGES : contains
    FOCUS_GROUPS ||--o{ THEMES : generates
    FOCUS_GROUPS ||--o{ NOTES : has
    FOCUS_GROUPS ||--o{ REASONING : tracks

    USERS {
        ObjectId _id PK
        string username
        string email
        string password_hash
        string role
        datetime created_at
    }

    PERSONAS {
        ObjectId _id PK
        string name
        string age
        string gender
        string occupation
        string location
        number techSavviness
        string personality
        object oceanTraits
        object thinkFeelDo
        array goals
        array frustrations
        array motivations
        string created_by FK
        datetime created_at
    }

    FOCUS_GROUPS {
        ObjectId _id PK
        string name
        string description
        string objective
        array participants FK
        object discussionGuide
        string status
        string created_by FK
        datetime created_at
    }

    MESSAGES {
        ObjectId _id PK
        string focus_group_id FK
        string text
        string type
        string senderId
        boolean highlighted
        datetime created_at
    }

    THEMES {
        ObjectId _id PK
        string focus_group_id FK
        string title
        string description
        array quotes
        string source
        datetime created_at
    }
```

### Key TypeScript Interfaces

```typescript
// Persona Interface
interface Persona {
  id: string;
  _id?: string;
  name: string;
  age: string;
  gender: string;
  occupation: string;
  location: string;
  techSavviness: number;
  personality: string;
  oceanTraits?: {
    openness: number;
    conscientiousness: number;
    extraversion: number;
    agreeableness: number;
    neuroticism: number;
  };
  thinkFeelDo?: {
    thinks: string[];
    feels: string[];
    does: string[];
  };
  goals?: string[];
  frustrations?: string[];
  motivations?: string[];
  scenarios?: string[];
  // Additional fields...
}

// Focus Group Interface
interface FocusGroup {
  _id: string;
  name: string;
  description: string;
  objective: string;
  participants: string[];
  discussionGuide?: DiscussionGuide;
  status: string;
  created_at: string;
  created_by: string;
}

// Discussion Guide Structure
interface DiscussionGuide {
  introduction: string;
  sections: {
    id: string;
    title: string;
    duration: number;
    items: {
      id: string;
      type: 'question' | 'activity' | 'probe';
      content: string;
      notes?: string;
    }[];
  }[];
  conclusion: string;
}
```
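
The `Persona` interface carries both `id` and an optional `_id`, which suggests that API responses may arrive keyed by MongoDB's `_id`. The following is a minimal sketch of normalizing such a response so that `id` is always set. `RawPersona`, `NormalizedPersona`, and `normalizePersona` are hypothetical names used for illustration, and only a few persona fields are shown.

```typescript
// Hypothetical shape of a persona as returned by the API: MongoDB's `_id`
// may be present while the frontend `id` is missing.
interface RawPersona {
  id?: string;
  _id?: string;
  name: string;
  age: string;
  techSavviness: number;
}

// Trimmed-down persona with a guaranteed `id` (remaining fields omitted).
interface NormalizedPersona extends RawPersona {
  id: string;
}

// Copy `_id` into `id` when `id` is absent; fail loudly if neither exists.
function normalizePersona(raw: RawPersona): NormalizedPersona {
  const id = raw.id ?? raw._id;
  if (id === undefined) {
    throw new Error(`Persona "${raw.name}" has neither id nor _id`);
  }
  return { ...raw, id };
}

// Example values are made up for illustration.
const personaFromApi: RawPersona = {
  _id: "665f1c2e9b1d4a0012ab34cd",
  name: "Sample Persona",
  age: "34",
  techSavviness: 7,
};
const normalizedPersona = normalizePersona(personaFromApi);
```

Normalizing once at the API boundary would let components rely on `id` without checking both fields.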

## Technical Details and Specifications

### Authentication Flow

```mermaid
sequenceDiagram
    participant C as Client
    participant F as Frontend
    participant B as Backend
    participant DB as Database

    C->>F: Enter Credentials
    F->>B: POST /api/auth/login
    B->>DB: Verify User
    alt Valid Credentials
        B-->>F: JWT Token + User Data
        F->>F: Store in localStorage
        F-->>C: Redirect to Dashboard
    else Invalid Credentials
        B-->>F: 401 Unauthorized
        F-->>C: Show Error
    end

    Note over F: All API Requests
    F->>F: Add JWT to Headers
    F->>B: API Request with Bearer Token
    B->>B: Verify JWT
    alt Valid Token
        B-->>F: API Response
    else Invalid Token
        B-->>F: 401 Unauthorized
        F->>F: Clear localStorage
        F-->>C: Redirect to Login
    end
```
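
The "All API Requests" part of the diagram above can be sketched as two small helpers: one that adds the Bearer token to request headers and one that reacts to a 401 response by clearing the stored token. This is an illustration under assumptions, not the application's actual client code: the storage key `"token"` and the names `TokenStore`, `buildAuthHeaders`, `handleStatus`, and `createMemoryStore` are hypothetical.

```typescript
// Minimal key-value store; in the browser, window.localStorage satisfies it.
interface TokenStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const TOKEN_KEY = "token"; // hypothetical storage key

// Build request headers, adding the Bearer token when one is stored.
function buildAuthHeaders(tokenStore: TokenStore): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  const token = tokenStore.getItem(TOKEN_KEY);
  if (token !== null) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return headers;
}

// React to a response status: on 401, clear the token and signal a redirect.
function handleStatus(code: number, tokenStore: TokenStore): "ok" | "redirect-to-login" {
  if (code === 401) {
    tokenStore.removeItem(TOKEN_KEY);
    return "redirect-to-login";
  }
  return "ok";
}

// In-memory store standing in for localStorage outside the browser.
function createMemoryStore(): TokenStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => { data.set(key, value); },
    removeItem: (key) => { data.delete(key); },
  };
}
```

With Axios, logic like this would typically live in request and response interceptors. Keeping it in plain functions makes it testable without a browser.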

### State Management

The application uses the React Context API for global state and keeps everything else local:

1. **AuthContext**: Manages user authentication state, the JWT token, and login/logout functionality
2. **Local State**: Component-level state for UI interactions
3. **API State**: Server data is currently fetched with direct API calls; React Query could be integrated later for server state management
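
As an illustration of the state that `AuthContext` is described as managing, the sketch below models it as a reducer that could back a `useReducer` call inside a context provider. The actual implementation may differ (for example, it may use `useState`). `AuthUser`, `AuthState`, `AuthAction`, and `authReducer` are hypothetical names.

```typescript
// Subset of the user fields listed in the data model.
interface AuthUser {
  username: string;
  role: string;
}

interface AuthState {
  user: AuthUser | null;
  token: string | null;
}

type AuthAction =
  | { type: "login"; user: AuthUser; token: string }
  | { type: "logout" };

const initialAuthState: AuthState = { user: null, token: null };

// Pure state transition: login stores user and token, logout resets both.
function authReducer(state: AuthState, action: AuthAction): AuthState {
  switch (action.type) {
    case "login":
      return { ...state, user: action.user, token: action.token };
    case "logout":
      return initialAuthState;
  }
}
```

Because the reducer is a pure function, the login and logout transitions can be unit tested without rendering any components.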

## API Structure

### Authentication Endpoints
- `POST /api/auth/login` - User login
- `POST /api/auth/register` - User registration
- `GET /api/auth/me` - Get current user profile

### Persona Management
- `GET /api/personas` - List all personas
- `POST /api/personas` - Create new persona
- `GET /api/personas/:id` - Get persona details
- `PUT /api/personas/:id` - Update persona
- `DELETE /api/personas/:id` - Delete persona

### Focus Group Management
- `GET /api/focus-groups` - List all focus groups
- `POST /api/focus-groups` - Create new focus group
- `GET /api/focus-groups/:id` - Get focus group details
- `PUT /api/focus-groups/:id` - Update focus group
- `DELETE /api/focus-groups/:id` - Delete focus group

### AI Operations
- `POST /api/ai-personas/generate` - Generate synthetic personas
- `POST /api/focus-group-ai/:id/start` - Start AI moderation
- `POST /api/focus-group-ai/:id/stop` - Stop AI moderation
- `POST /api/focus-group-ai/:id/message` - Send message to AI moderator
- `GET /api/focus-group-ai/:id/status` - Get moderator status
- `POST /api/focus-group-ai/:id/themes` - Extract key themes
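
One way a frontend client could keep these routes in a single place is a small set of URL builders. The sketch below assumes that the `/api` prefix in the list above corresponds to the end of the documented `VITE_API_BASE_URL` value (`/semblance_back/api`). `apiUrl` and `endpoints` are hypothetical names, and only a few routes are shown.

```typescript
// Base URL taken from the documented frontend .env value.
const API_BASE_URL = "/semblance_back/api";

// Join the base URL and a relative path with exactly one slash between them.
function apiUrl(path: string, base: string = API_BASE_URL): string {
  return `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

// URL builders; IDs are URI-encoded so unusual characters cannot alter the route.
const endpoints = {
  login: () => apiUrl("auth/login"),
  personas: () => apiUrl("personas"),
  persona: (id: string) => apiUrl(`personas/${encodeURIComponent(id)}`),
  focusGroup: (id: string) => apiUrl(`focus-groups/${encodeURIComponent(id)}`),
  startModeration: (id: string) =>
    apiUrl(`focus-group-ai/${encodeURIComponent(id)}/start`),
  extractThemes: (id: string) =>
    apiUrl(`focus-group-ai/${encodeURIComponent(id)}/themes`),
};
```

For example, `endpoints.persona("abc123")` yields `/semblance_back/api/personas/abc123`.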

## AI Integration

### LLM Service Architecture

```mermaid
graph TD
    A[API Routes] --> B[Service Layer]
    B --> C[LLM Service]
    C --> D[Google Gemini API]

    B --> E[Prompt Templates]
    E --> F[persona-generation.md]
    E --> G[focus-group-response.md]
    E --> H[theme-extraction.md]
    E --> I[moderator-system.md]

    C --> J[Response Processing]
    J --> K[JSON Extraction]
    J --> L[Error Handling]
    J --> M[Rate Limiting]
```
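
The JSON Extraction step in the diagram above deals with model output that may wrap JSON in prose or a fenced block. The backend is written in Python, so the TypeScript below is only a sketch of the technique under assumptions, not the service's actual code. `extractJson` is a hypothetical name.

```typescript
// Three backticks, built at runtime so this example's own fence stays intact.
const FENCE = "`".repeat(3);

// Pull the first JSON object or array out of free-form model output.
function extractJson(text: string): unknown {
  // Prefer the contents of a fenced block when one is present.
  const fenced = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE).exec(text);
  const candidate = fenced?.[1] ?? text;
  // Take the span from the first opening bracket to the last closing bracket.
  const start = candidate.search(/[\[{]/);
  const end = Math.max(candidate.lastIndexOf("}"), candidate.lastIndexOf("]"));
  if (start === -1 || end < start) {
    throw new Error("No JSON found in model response");
  }
  // JSON.parse throws a SyntaxError if the extracted span is not valid JSON.
  return JSON.parse(candidate.slice(start, end + 1));
}
```

The function returns `unknown` on purpose: callers should validate the parsed value against the expected shape before using it.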

### Key AI Features

1. **Persona Generation**: Uses Gemini to create detailed, realistic personas based on demographic parameters
2. **AI Moderation**: Autonomous focus group moderation with context awareness
3. **Response Generation**: Persona-specific responses based on personality profiles
4. **Theme Extraction**: Real-time analysis of conversation to identify key themes
5. **Conversation Flow**: AI determines next speakers and follow-up questions

### Prompt Engineering

The system uses structured prompts stored in `/backend/prompts/`:
- System prompts define AI behavior and constraints
- Template prompts use variable substitution for dynamic content
- Chain-of-thought prompting is used for complex decisions
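
Variable substitution in a template prompt can be sketched as follows. The `{{name}}` placeholder syntax is an assumption, since the format used by the files in `/backend/prompts/` is not shown here, and `renderPrompt` is a hypothetical helper. The backend itself is Python, so this TypeScript version only illustrates the idea.

```typescript
// Replace each {{name}} placeholder with its value. Unknown names throw so a
// half-filled prompt is never sent to the model.
function renderPrompt(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => {
    // Own-property check avoids matching inherited keys such as "constructor".
    const value: string | undefined = Object.prototype.hasOwnProperty.call(variables, key)
      ? variables[key]
      : undefined;
    if (value === undefined) {
      throw new Error(`Missing prompt variable: ${key}`);
    }
    return value;
  });
}

// Example template and values are made up for illustration.
const personaTemplate =
  "You are {{ name }}, a {{age}}-year-old {{occupation}}. Answer: {{question}}";
const renderedPrompt = renderPrompt(personaTemplate, {
  name: "Maya",
  age: "34",
  occupation: "nurse",
  question: "What frustrates you about banking apps?",
});
```

Failing on a missing variable is a deliberate choice in this sketch: a prompt with a leftover placeholder tends to produce confusing model output that is hard to trace back to its cause.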

## Development and Deployment

### Local Development Setup

```bash
# Frontend
npm install
npm run dev  # Development server at http://localhost:5173

# Backend
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python run.py  # API server at http://localhost:5137

# MongoDB
# Ensure MongoDB is running on localhost:27017
```

### Build and Deployment

```bash
# Frontend Build
npm run build  # Creates /dist folder

# Backend Deployment
cd backend
gunicorn -w 4 "app:create_app()"
```

### Environment Variables

Frontend (`.env`):
```
VITE_API_BASE_URL=/semblance_back/api
```

Backend:
```
GEMINI_API_KEY=your_api_key_here
MONGODB_URI=mongodb://localhost:27017/synthetic_society
JWT_SECRET_KEY=your_secret_key
```

## Security Considerations

1. **Authentication**: JWT-based authentication with token expiration
2. **Authorization**: Route-level protection with the `@jwt_required()` decorator
3. **Data Validation**: Input validation on both frontend (Zod) and backend
4. **CORS**: Configured for specific origins in production
5. **API Keys**: Environment variables for sensitive configuration
6. **Password Security**: Bcrypt hashing for password storage
7. **Session Management**: Automatic logout on token expiration

## Key Features

### Persona Management
- **Manual Creation**: Detailed form with 50+ attributes
- **AI Generation**: Bulk creation with customizable parameters
- **Persona Profiles**: Comprehensive view with attitudinal profile methodology
- **Folder Organization**: Group personas for easy management
- **Export/Import**: Download personas for backup or sharing

### Focus Group Sessions
- **Discussion Guide Editor**: Structured session planning
- **AI Moderation**: Autonomous or semi-autonomous modes
- **Real-time Participation**: Live conversation with AI personas
- **Theme Extraction**: Automatic identification of key insights
- **Note Taking**: Time-stamped notes linked to messages
- **Analytics Dashboard**: Visual representation of participation and sentiment

### Data Analysis
- **Export Options**: PDF reports, CSV data, JSON backups
- **Theme Management**: Manual and AI-generated theme tracking
- **Conversation History**: Full transcript with highlighting
- **Reasoning Transparency**: View AI decision-making process

## Best Practices for Development Team

1. **Code Organization**: Follow the existing pattern of separating concerns (components, services, types)
2. **Type Safety**: Maintain TypeScript types for all data structures
3. **Error Handling**: Use try-catch blocks with user-friendly toast notifications
4. **API Consistency**: Follow RESTful conventions for new endpoints
5. **Component Reusability**: Utilize shadcn-ui components and create custom wrappers
6. **State Management**: Keep state as local as possible, lift only when necessary
7. **Performance**: Implement pagination for large datasets
8. **Testing**: Add unit tests for critical business logic
9. **Documentation**: Update API documentation when adding new endpoints
10. **Security**: Always validate user input and sanitize data

## Future Enhancements

Potential areas for enhancement include:

1. **Real-time Updates**: WebSocket integration for live session updates
2. **Advanced Analytics**: More detailed sentiment analysis and reporting
3. **Multi-language Support**: Internationalization for global research
4. **Team Collaboration**: Multiple users per focus group session
5. **Template Library**: Pre-built discussion guides and persona archetypes
6. **API Rate Limiting**: Implement rate limiting for AI endpoints
7. **Caching Layer**: Redis for frequently accessed data
8. **Audit Logging**: Track all user actions for compliance
9. **Backup System**: Automated database backups
10. **Performance Monitoring**: Integration with monitoring tools