updated readme

michael 2025-09-18 14:31:35 -05:00
parent 3008d8f8fc
commit ba0391bbc6

269
README.md

@@ -1,142 +1,243 @@
# Video Query Tool
This application processes videos using Google's Gemini AI model, allowing users to:
A full-stack web application that processes videos using Google's Gemini AI model, allowing users to upload videos and receive AI-generated content based on customizable prompts. The application features Azure AD B2C authentication, chunked file uploads for large videos, PDF generation with Mermaid diagram support, and comprehensive usage tracking.
1. Upload videos (MP4, AVI, MOV, etc.)
2. Choose from preset processing modes or use custom prompts
3. Get AI-generated markdown content based on the video content
## Features
## Important Notes
### Core Functionality
- **Video Processing**: Upload and analyze videos using Google Gemini 2.5 Pro AI model
- **Multiple Processing Modes**:
- Meeting Summary
- Process/Tool Documentation
- Process Documentation with Mermaid Charts
- Custom Prompts
- **Large File Support**: Chunked upload system supporting files up to 5GB
- **PDF Generation**: Convert results to PDF with embedded Mermaid diagrams
- **Authentication**: Azure AD B2C integration with both popup and redirect flows
- **Video Length Limitation**: The Gemini AI model can only process videos up to 55 minutes in length.
- **File Size**: The application supports uploads up to 5GB.
### Technical Features
- **Drag & Drop Upload**: Modern file upload interface with progress tracking
- **Real-time Processing**: Live status updates during video analysis
- **Error Handling**: Comprehensive error handling and user feedback
- **Usage Analytics**: Automated tracking via webhook integration
- **Production Ready**: Systemd service configuration and deployment scripts
## Limitations
- **Video Length**: The Gemini model can process videos of at most 55 minutes
- **File Size**: Application supports uploads up to 5GB
- **Supported Formats**: MP4, AVI, MOV, WMV, MKV, WEBM
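
The limits above can be checked before an upload starts. The following is a minimal sketch, not the project's actual validation code: `check_upload` is a hypothetical helper, and reading "5GB" as 5 × 1024³ bytes is an assumption about how the limit is counted.

```python
from pathlib import Path

# Values come from the Limitations list; the byte interpretation of "5GB" is assumed.
MAX_UPLOAD_BYTES = 5 * 1024**3
MAX_DURATION_SECONDS = 55 * 60
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".wmv", ".mkv", ".webm"}


def check_upload(filename, size_bytes, duration_seconds=None):
    """Return a list of problems; an empty list means the file passes all checks."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 5GB upload limit")
    # Duration is optional because it is usually unknown until the file is probed.
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append("video is longer than 55 minutes")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem to the user at once.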
## Project Structure
```
video_query/
├── backend/ # Flask/Hypercorn server
│ ├── app.py # Main Flask application
│ ├── video_processor.py # Video processing logic
│ └── run.py # Hypercorn server script
└── frontend/ # React frontend
├── public/ # Static assets
└── src/ # React source code
├── backend/ # Flask/Hypercorn API server
│ ├── app.py # Main Flask application with PDF generation
│ ├── video_processor.py # Gemini API integration and video processing
│ ├── auth.py # Azure AD B2C authentication handlers
│ ├── chunked_upload.py # Chunked file upload Blueprint
│ ├── run.py # Hypercorn production server
│ ├── requirements.txt # Python dependencies
│ └── test_*.py # API testing utilities
├── frontend/ # React SPA
│ ├── src/
│ │ ├── components/ # React components
│ │ │ ├── VideoUpload.js # Drag & drop file upload
│ │ │ ├── PromptSelector.js # Mode selection and prompt editing
│ │ │ ├── ResultDisplay.js # Results with PDF generation
│ │ │ ├── AuthenticatedContent.js # Main application interface
│ │ │ └── Login.js # Authentication interface
│ │ ├── auth/ # Authentication utilities
│ │ │ ├── authConfig.js # Azure AD B2C configuration
│ │ │ ├── AuthProvider.js # MSAL React provider
│ │ │ └── authApiClient.js # Authenticated API client
│ │ └── utils/
│ │ └── chunkedUploader.js # Large file upload handler
│ ├── package.json # Node.js dependencies
│ └── build/ # Production build output
├── DEPLOYMENT.md # Production deployment instructions
├── LOG_EXTRACTION_README.md # Usage analytics documentation
├── restart.sh # Development restart script
├── quick_extract.sh # Log extraction utility
├── extract_user_logs*.sh # Advanced log processing
└── requirements.txt # Root Python dependencies (legacy)
```
## Dependencies
### Backend Dependencies
- **Flask 3.1.0**: Web framework
- **google-generativeai 0.8.5**: Gemini AI API client
- **Hypercorn 0.17.3**: ASGI production server
- **python-jose**: JWT token validation for Azure AD
- **flask-cors 5.0.1**: Cross-origin resource sharing
- **pdfkit 1.0.0**: PDF generation from HTML
- **cairosvg 2.8.0**: SVG to PNG conversion for diagrams
- **Pillow 11.2.1**: Image processing
- **python-dotenv 1.1.0**: Environment variable management
### Frontend Dependencies
- **React 18.2.0**: UI framework
- **@azure/msal-react 3.0.12**: Microsoft Authentication Library
- **axios 1.6.0**: HTTP client
- **bootstrap 5.3.2**: UI components and styling
- **mermaid 11.6.0**: Diagram generation
- **react-dropzone 14.2.3**: File upload interface
- **showdown 2.1.0**: Markdown to HTML conversion
## Setup Instructions
### Prerequisites
- Python 3.8+
- Node.js 16+
- Google Cloud API key with Gemini access
- Azure AD B2C tenant (for authentication)
- wkhtmltopdf (for PDF generation)
### Backend Setup
1. Create and activate a virtual environment:
```
1. **Create and activate virtual environment**:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. Install backend dependencies:
```
pip install -r requirements.txt
2. **Install dependencies**:
```bash
pip install -r backend/requirements.txt
```
3. Set your Google API key:
```
export GOOGLE_API_KEY=your_api_key_here
3. **Set up environment variables**:
```bash
export GOOGLE_API_KEY="your_gemini_api_key_here"
```
4. Run the development server:
4. **Install system dependencies for PDF generation**:
```bash
# Ubuntu/Debian:
sudo apt-get install wkhtmltopdf python3-cairo libcairo2-dev
# macOS:
brew install cairo wkhtmltopdf
```
5. **Start development server**:
```bash
cd backend
python run.py
python run.py --host 0.0.0.0 --port 5010
```
### Frontend Setup
1. Install Node.js dependencies:
```
1. **Install Node.js dependencies**:
```bash
cd frontend
npm install
```
2. Start the development server:
```
2. **Configure authentication** (edit `src/auth/authConfig.js`):
- Update Azure AD B2C tenant ID
- Update client ID
- Update redirect URIs
3. **Start development server**:
```bash
npm start
```
## Deployment
## Production Deployment
### Backend Deployment with Systemd
### System Requirements
- Ubuntu/CentOS server
- Apache/Nginx web server
- Python 3.8+ with virtual environment
- wkhtmltopdf system package
- Node.js for building frontend
1. Update the systemd service file (`backend/video-query.service`):
- Update paths to match your server
- Add your GOOGLE_API_KEY
- Place in `/etc/systemd/system/`
### Backend Deployment
2. Enable and start the service:
1. **Install system packages**:
```bash
sudo apt-get update
sudo apt-get install -y wkhtmltopdf python3-cairo python3-pil libcairo2-dev
```
2. **Create production service** (see `DEPLOYMENT.md` for systemd configuration):
```bash
sudo systemctl enable video-query
sudo systemctl start video-query
```
3. Check the service status:
```
sudo systemctl status video-query
```
### Frontend Deployment
### Frontend Deployment with Apache
1. Build the React frontend:
```
1. **Build for production**:
```bash
cd frontend
npm run build
PUBLIC_URL=/video_query npm run build
```
2. Copy the build directory to your Apache document root:
```
2. **Deploy to web server**:
```bash
cp -r build/* /var/www/html/video-query/
```
3. Configure Apache to serve the React app, adding the following to your Apache configuration:
```
<VirtualHost *:80>
ServerName yourdomain.com
DocumentRoot /var/www/html/video-query
<Directory "/var/www/html/video-query">
AllowOverride All
Require all granted
# Redirect all requests to index.html for React routing
RewriteEngine On
RewriteBase /
RewriteRule ^index\.html$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]
</Directory>
# Proxy API requests to the backend
ProxyPass /api http://localhost:5010/api
ProxyPassReverse /api http://localhost:5010/api
</VirtualHost>
```
4. Restart Apache:
```
sudo systemctl restart apache2
```
## API Reference
The backend API exposes a single endpoint:
### Authentication Endpoints
- **GET /api/auth-test**: Verify authentication status
- **POST /api/process**: Processes an uploaded video with the specified prompt
- Form parameters:
- `video`: The video file
- `prompt`: The prompt text to process the video with
- Returns:
- Success: `{ "success": true, "content": "markdown content..." }`
- Error: `{ "success": false, "message": "error message..." }`
### Video Processing Endpoints
- **POST /api/process**: Main video processing endpoint
- Accepts both direct uploads and chunked upload references
- Form data: `video` file, `prompt` text
- JSON data: `file_path`, `filename`, `prompt` (for chunked uploads)
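
As an illustration of the JSON variant, the sketch below assembles a request body from the three documented field names. `build_chunked_process_body` is a hypothetical helper, the empty-prompt check is an assumption rather than documented server behavior, and nothing is assumed about the response format.

```python
import json


def build_chunked_process_body(file_path, filename, prompt):
    """Serialize the JSON body for /api/process when the video was uploaded in chunks."""
    # Rejecting blank prompts client-side is an assumption, not documented behavior.
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return json.dumps(
        {"file_path": file_path, "filename": filename, "prompt": prompt}
    )
```

A direct upload would instead send multipart form data with a `video` file part and a `prompt` text field.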
### Chunked Upload Endpoints
- **POST /api/init-upload**: Initialize chunked upload session
- **POST /api/upload-chunk/<upload_id>**: Upload file chunk
- **POST /api/complete-upload/<upload_id>**: Mark upload complete
- **POST /api/cancel-upload/<upload_id>**: Cancel upload
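
The client side of a chunked upload comes down to splitting the file into fixed-size pieces and sending them in order. The sketch below shows only the splitting step. The 5 MiB chunk size and both helper names are assumptions, and the request fields expected by the endpoints are not documented here, so no HTTP calls are shown.

```python
import io
import math

CHUNK_SIZE = 5 * 1024 * 1024  # assumed value; the real chunk size is set in the project code


def chunk_count(total_bytes, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to cover total_bytes."""
    return math.ceil(total_bytes / chunk_size)


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, data) pairs until the binary stream is exhausted."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1


if __name__ == "__main__":
    # Tiny in-memory demo: a 10-byte "file" split into 4-byte chunks.
    for index, data in iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4):
        print(index, len(data))
```

In a real client, each yielded chunk would be posted to `/api/upload-chunk/<upload_id>` after `/api/init-upload` returns, followed by `/api/complete-upload/<upload_id>` once the iterator is exhausted, or `/api/cancel-upload/<upload_id>` on failure.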
### PDF Generation Endpoints
- **POST /api/generate-pdf**: Generate PDF from HTML with Mermaid diagrams
- JSON data: `html`, `textDiagrams`, `svgDiagrams`, `diagramPngs`
## Usage Analytics
The application has built-in usage tracking that sends the following data to a webhook endpoint for analytics:
- User email addresses
- Processing timestamps
- Prompts used
- Model information
Log extraction utilities are provided in `extract_user_logs*.sh` scripts.
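
A usage event carrying the four tracked fields might be assembled as follows. This is a sketch under stated assumptions: the JSON key names, the ISO 8601 timestamp format, and the `build_usage_event` helper are all hypothetical, since the actual webhook payload is not documented here.

```python
import json
from datetime import datetime, timezone


def build_usage_event(email, prompt, model, when=None):
    """Serialize one usage record; key names are illustrative, not the real schema."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "email": email,
            "timestamp": when.isoformat(),
            "prompt": prompt,
            "model": model,
        }
    )
```

Accepting `when` as a parameter keeps the function deterministic in tests while defaulting to the current UTC time in normal use.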
## Configuration Files
### Key Configuration Files
- **CLAUDE.md**: Development guidelines and build commands
- **.gitignore**: Comprehensive exclusion patterns
- **backend/requirements.txt**: Production Python dependencies
- **frontend/package.json**: Node.js dependencies and build scripts
### Environment Variables
- `GOOGLE_API_KEY`: Required for Gemini API access
- Azure AD B2C settings (tenant ID, client ID, redirect URIs) are configured in `frontend/src/auth/authConfig.js` rather than through environment variables
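
Failing fast on a missing key gives a clearer error than a failed Gemini call later on. The sketch below is one way to do that. `require_api_key` is a hypothetical helper, not the project's code, and it reads only the process environment (since `python-dotenv` is a dependency, the project may also load a `.env` file).

```python
import os


def require_api_key(env=os.environ):
    """Return GOOGLE_API_KEY, or raise a clear error if it is missing or blank."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; export it before starting the backend"
        )
    return key
```

Passing the environment as a parameter makes the check easy to exercise with a plain dictionary instead of mutating `os.environ`.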
## Development Utilities
- **restart.sh**: Quick development environment restart
- **backend/test_*.py**: API testing and validation scripts
- **backend/run.py**: Production server with optimized settings for large uploads
## Security Features
- Azure AD B2C integration with JWT validation
- CORS protection with specific origin allowlisting
- Secure file upload validation
- Temporary file cleanup
- Token expiration handling
## License