json-parser-twist-metaserver/INSTALL_GUIDE.md
Dave Porter af8acbd986 Add all project files including previous versions and documentation
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
2026-02-06 11:07:22 -05:00

JSON Workflow Processor - Installation Guide

A comprehensive guide to installing and configuring the JSON Workflow Processor on your production system.

📋 Prerequisites

  • Python 3.7+ installed on the system
  • pip (Python package manager)
  • Root/sudo access for directory creation
  • Network access to Mailgun SMTP servers
  • Valid Mailgun account with SMTP credentials

📦 Available Versions

Choose the version that best fits your needs:

| Version | File | Description |
|---|---|---|
| Single-threaded | json_workflow_processor-local.py | Basic version; processes one file at a time |
| Batch Processing | json_workflow_processor-batch.py | High-performance; processes 5-10 files concurrently |
| Reporting | json_workflow_processor-reporting.py | Batch processing plus daily reports |
| Startup Scan | json_workflow_processor-startup-scan.py | Reporting plus processing of existing files on startup |
| Hybrid | json_workflow_processor-hybrid.py | Startup scan plus periodic backup scanning |
| Protected Hybrid | json_workflow_processor-hybrid-protected.py | RECOMMENDED: most robust production version |

🚀 Quick Installation

1. Create Installation Directory

# Create installation directory
sudo mkdir -p /opt/json-workflow-processor
cd /opt/json-workflow-processor

# Copy files from your development environment:
# - json_workflow_processor-hybrid-protected.py (RECOMMENDED)
# - requirements.txt
# - test_email.py (used in the Testing section)

2. Install Python Dependencies

# Install required packages
pip3 install -r requirements.txt

# Or install manually (quotes stop the shell from treating > as a redirection):
pip3 install "watchdog>=3.0.0" "schedule>=1.2.0"
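To confirm the packages landed in the same interpreter that will run the processor, a quick check like the one below can help. This is a sketch rather than part of the project: check_requirements, at_least and version_tuple are hypothetical helper names, the script assumes Python 3.8+ (importlib.metadata is not available in 3.7), and it compares only the leading numeric parts of each version, so pre-release suffixes are ignored.

```python
from importlib import metadata
import re

# Minimum versions taken from the manual install command above
REQUIREMENTS = {"watchdog": "3.0.0", "schedule": "1.2.0"}

def version_tuple(version):
    """Turn '3.0.0' into (3, 0, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def at_least(installed, minimum):
    """Compare versions after padding with zeros, so '3.0' counts as '3.0.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))

def check_requirements(requirements):
    """Return a list of problems; an empty list means every package is OK."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if not at_least(installed, minimum):
            problems.append(f"{name}: {installed} installed, need >= {minimum}")
    return problems

if __name__ == "__main__":
    issues = check_requirements(REQUIREMENTS)
    print("\n".join(issues) if issues else "All dependencies satisfied")
```

Run it with the same python3 the service will use (for example /usr/bin/python3), since packages installed for one interpreter are invisible to another.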

3. Create Required Directories

# Create production directories
sudo mkdir -p /data/PRODUCTION/JSON
sudo mkdir -p /data/PRODUCTION/JSON_STORE
sudo mkdir -p /data/PRODUCTION/JSON_FAILED
sudo mkdir -p /data/PRODUCTION/SYNC/MAKE
sudo mkdir -p /PRODUCTION/JSON_PARSER_LOGS

# Create destination folders
sudo mkdir -p "/data/PRODUCTION/SYNC/MAKE/Celtra - Create_Rename - Project_Design File"
sudo mkdir -p "/data/PRODUCTION/SYNC/MAKE/Monday RB"
sudo mkdir -p "/data/PRODUCTION/SYNC/MAKE/Monday Rank"

# Set permissions to 777 for production use (if running as root)
sudo chmod -R 777 /data/PRODUCTION/
sudo chmod -R 777 /PRODUCTION/JSON_PARSER_LOGS/

# Note: The processor automatically sets 777 permissions on processed files
# when running as root to ensure proper access across the system
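Before starting the processor, it can be worth checking that every directory above exists and is writable by the account that will run it. The sketch below is a hypothetical helper, not part of the project: the path list mirrors this step, and because os.access checks the real user ID, it should be run as the same user the service runs as (running it as root will report nearly everything as writable).

```python
import os
from pathlib import Path

# Directories from step 3; adjust if you change the paths in the Config class
REQUIRED_DIRS = [
    "/data/PRODUCTION/JSON",
    "/data/PRODUCTION/JSON_STORE",
    "/data/PRODUCTION/JSON_FAILED",
    "/data/PRODUCTION/SYNC/MAKE",
    "/data/PRODUCTION/SYNC/MAKE/Celtra - Create_Rename - Project_Design File",
    "/data/PRODUCTION/SYNC/MAKE/Monday RB",
    "/data/PRODUCTION/SYNC/MAKE/Monday Rank",
    "/PRODUCTION/JSON_PARSER_LOGS",
]

def verify_directories(paths):
    """Return {path: problem} for each directory that is missing or not writable."""
    problems = {}
    for raw in paths:
        path = Path(raw)
        if not path.is_dir():
            problems[raw] = "missing"
        # Creating files in a directory needs both write and execute (search) permission
        elif not os.access(path, os.W_OK | os.X_OK):
            problems[raw] = "not writable by this user"
    return problems

if __name__ == "__main__":
    for path, problem in verify_directories(REQUIRED_DIRS).items():
        print(f"{problem}: {path}")
```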

4. Configure Client Hot Folders

# Create client hot folders (examples - add your actual clients)
sudo mkdir -p /data/PRODUCTION/JSON/RANK
sudo mkdir -p /data/PRODUCTION/JSON/RECKITTBENCKISER
sudo mkdir -p /data/PRODUCTION/JSON/ADIDAS
sudo mkdir -p /data/PRODUCTION/JSON/CIBC
sudo mkdir -p /data/PRODUCTION/JSON/OLIVER
sudo mkdir -p /data/PRODUCTION/JSON/PAYPAL
sudo mkdir -p /data/PRODUCTION/JSON/BAYER
sudo mkdir -p /data/PRODUCTION/JSON/3M

# Add any additional client folders as needed
# sudo mkdir -p /data/PRODUCTION/JSON/YOUR_CLIENT_NAME
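Log lines such as "Processing: RANK/test.json" suggest that the processor treats the first folder below the hot folder as the client name, so the folder name you create here is the name that appears in the logs. A minimal sketch of that mapping, assuming files sit directly in HOT_FOLDER/[CLIENT]/ (client_for and log_label are hypothetical names, not functions from the processor):

```python
from pathlib import Path

HOT_FOLDER = Path("/data/PRODUCTION/JSON")

def client_for(json_path, hot_folder=HOT_FOLDER):
    """Return the client folder name for a file under the hot folder.

    Raises ValueError if the file is outside the hot folder or not in a client subfolder.
    """
    relative = Path(json_path).relative_to(hot_folder)
    if len(relative.parts) < 2:
        raise ValueError(f"{json_path} is not inside a client folder")
    return relative.parts[0]

def log_label(json_path, hot_folder=HOT_FOLDER):
    """Build the 'CLIENT/filename.json' label seen in log messages."""
    return f"{client_for(json_path, hot_folder)}/{Path(json_path).name}"
```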

⚙️ Configuration

1. Update File Paths (if needed)

Edit json_workflow_processor-hybrid-protected.py and verify the paths in the Config class:

from pathlib import Path

class Config:
    # Paths
    HOT_FOLDER = Path("/data/PRODUCTION/JSON")
    JSON_STORE = Path("/data/PRODUCTION/JSON_STORE/")
    JSON_FAILED = Path("/data/PRODUCTION/JSON_FAILED/")
    SYNC_BASE = Path("/data/PRODUCTION/SYNC/MAKE")
    REPORTS_DIR = Path("/PRODUCTION/JSON_PARSER_LOGS")
    
    # Monitoring settings
    PERIODIC_SCAN_INTERVAL = 60    # Scan every 60 seconds for missed files
    PERIODIC_SCAN_TIMEOUT = 120    # Max scan duration before reset
    SLOW_SCAN_THRESHOLD = 30       # Log warning if scan exceeds this
    
    # File permissions (automatically set to 777 when running as root)
    # This ensures proper access across the production system
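The monitoring settings describe a "protected" periodic scan: scans run every PERIODIC_SCAN_INTERVAL seconds, a new scan should not overlap one still in progress, and scans slower than SLOW_SCAN_THRESHOLD are logged as warnings. The sketch below illustrates that pattern with a non-blocking lock; it is an assumption about how such protection can work, not the processor's actual code. ProtectedScanner and its methods are hypothetical, and the PERIODIC_SCAN_TIMEOUT reset is not modeled.

```python
import logging
import threading
import time

class ProtectedScanner:
    """Runs scan_fn, refusing to start a run while another is still active."""

    def __init__(self, scan_fn, slow_threshold=30.0):
        self._scan_fn = scan_fn                # callable that performs one scan
        self._slow_threshold = slow_threshold  # seconds, like SLOW_SCAN_THRESHOLD
        self._lock = threading.Lock()
        self.skipped = 0                       # overlapping runs that were refused

    def run_once(self):
        # Non-blocking acquire: if a scan is already running, skip instead of queueing
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            logging.info("Periodic scan skipped: previous scan still running")
            return None
        try:
            started = time.monotonic()
            result = self._scan_fn()
            elapsed = time.monotonic() - started
            if elapsed > self._slow_threshold:
                logging.warning("Periodic scan took %.1fs (threshold %.1fs)",
                                elapsed, self._slow_threshold)
            return result
        finally:
            self._lock.release()

    def run_periodically(self, interval, stop_event):
        # Event.wait returns True as soon as stop_event is set, which ends the loop
        while not stop_event.wait(interval):
            self.run_once()
```

The lock only matters when scans can be triggered from more than one thread (for example, a startup scan and the periodic timer); a single loop calling run_once cannot overlap with itself.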

2. Configure Email Recipients

Add email addresses to receive daily reports:

# Email settings
REPORT_EMAILS = [
    "daveporter@oliver.agency",
    "additional@email.com",  # Add more recipients
    "manager@company.com"
]

3. Add Celtra-Eligible Clients

Update the list of clients that can have Celtra projects:

CELTRA_ELIGIBLE_CLIENTS = {
    "CIBC", "OLIVER", "ADIDAS", "PAYPAL", 
    "RECKITTBENCKISER", "BAYER", "3M", "RANK",
    # Add new clients here:
    # "NEW_CLIENT_NAME"
}
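Per the expected behavior in the Testing section, a RANK file with JobCategory "Celtra" is copied to both Monday Rank/ and the Celtra folder. The sketch below shows one way such routing could be expressed, mainly to make the consequence of this set concrete: a client missing from CELTRA_ELIGIBLE_CLIENTS gets no Celtra copy even when its JobCategory is "Celtra". The client-to-Monday-folder mapping is an assumption inferred from the folder names in step 3, and destinations is a hypothetical function; the real rules live in json_workflow_processor-hybrid-protected.py.

```python
CELTRA_ELIGIBLE_CLIENTS = {
    "CIBC", "OLIVER", "ADIDAS", "PAYPAL",
    "RECKITTBENCKISER", "BAYER", "3M", "RANK",
}

CELTRA_FOLDER = "Celtra - Create_Rename - Project_Design File"

# Assumed mapping, inferred from the "Monday Rank" and "Monday RB" folder names
MONDAY_FOLDERS = {"RANK": "Monday Rank", "RECKITTBENCKISER": "Monday RB"}

def destinations(client, job_category):
    """Return the SYNC/MAKE subfolders a job would be copied to."""
    targets = []
    if client in MONDAY_FOLDERS:
        targets.append(MONDAY_FOLDERS[client])
    # Celtra routing requires both the category and membership in the eligible set
    if job_category == "Celtra" and client in CELTRA_ELIGIBLE_CLIENTS:
        targets.append(CELTRA_FOLDER)
    return targets
```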

🔧 Testing

1. Test Email Configuration

# Test Mailgun email delivery
python3 test_email.py

Expected output:

✅ Email sent successfully!
Check daveporter@oliver.agency for the test message.
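The contents of test_email.py are not reproduced in this guide. If you need to repeat a delivery check by hand, the sketch below builds a plain-text message and sends it over SMTP with STARTTLS using only the standard library. build_report_email and send_via_smtp are hypothetical names, and the host, port and credentials are placeholders: Mailgun commonly uses smtp.mailgun.org on port 587, but confirm the values shown for your domain in the Mailgun dashboard.

```python
import smtplib
from email.message import EmailMessage

def build_report_email(sender, recipients, subject, body):
    """Create a plain-text message addressed to every recipient."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_via_smtp(msg, host, port, username, password):
    """Send msg over SMTP, upgrading to TLS first; raises smtplib.SMTPException on failure."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```

For example: send_via_smtp(build_report_email("reports@your-domain", ["daveporter@oliver.agency"], "Test", "Hello"), "smtp.mailgun.org", 587, "SMTP_LOGIN", "SMTP_PASSWORD"), with the login and password taken from your Mailgun SMTP credentials.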

2. Test File Processing

# Create a test JSON file
cat > /data/PRODUCTION/JSON/RANK/test.json << 'EOF'
{
  "JobSpecification": {
    "JobDetails": {
      "JobCategory": "Celtra",
      "StudioCode": "RANK_STUDIO",
      "Title": "Test Project",
      "ClientCode": "RANK"
    }
  }
}
EOF

# Run the processor (it will process the file and continue monitoring)
python3 json_workflow_processor-hybrid-protected.py
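To sanity-check a job file before dropping it into a hot folder, the fields used in the sample above can be read back with a few lines of Python. This is a hypothetical helper (read_job_details); the guide does not say whether the processor takes the client from ClientCode or from the hot folder name, so treat the returned client code as informational.

```python
import json
from pathlib import Path

def read_job_details(json_path):
    """Return (client_code, job_category, title) from a job file.

    Raises json.JSONDecodeError for invalid JSON and KeyError if the
    JobSpecification/JobDetails structure or a field is missing.
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    details = data["JobSpecification"]["JobDetails"]
    return details["ClientCode"], details["JobCategory"], details["Title"]
```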

Expected behavior:

  • Startup scan: Finds and processes the test file
  • File routing: Copied to both Monday Rank/ and Celtra - Create_Rename - Project_Design File/
  • Storage: Stored in JSON_STORE/RANK/
  • Cleanup: Original file deleted
  • Monitoring: Continues monitoring for new files with 60s periodic scans
  • Logging: Activity logged to /PRODUCTION/JSON_PARSER_LOGS/json_workflow_hybrid_protected.log
  • Client tracking: Log entries show client folder names (e.g., "Processing: RANK/test.json")
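The routing, storage and cleanup steps listed above can be pictured as a copy-then-delete sequence. The sketch below is an assumption about ordering, not the processor's code: process_file is hypothetical, destination folders are passed in rather than derived, and failure handling (moving problem files to JSON_FAILED) is left out. Deleting the original only after every copy succeeds means a crash partway through leaves the file in the hot folder, where the startup or periodic scan can pick it up again.

```python
import shutil
from pathlib import Path

def process_file(src, client, destinations, store_root):
    """Copy src into each destination folder and store_root/<client>/, then delete it.

    Returns the path of the stored copy.
    """
    src = Path(src)
    for folder in destinations:
        Path(folder).mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, Path(folder) / src.name)  # copy2 also keeps timestamps
    store_dir = Path(store_root) / client
    store_dir.mkdir(parents=True, exist_ok=True)
    stored = store_dir / src.name
    shutil.copy2(src, stored)
    # Only remove the original once every copy above has succeeded
    src.unlink()
    return stored
```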

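🔄 Service Setup (Production)

Option 1: systemd Service (Recommended for Production)

Create a systemd service file. The unit below runs the processor as the production user and group, so that account must exist (or change User= and Group=). Note that the automatic 777 permission setting on processed files applies only when the processor runs as root, so it will not happen under this unit.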

sudo tee /etc/systemd/system/json-workflow.service > /dev/null << 'EOF'
[Unit]
Description=JSON Workflow Processor
After=network.target

[Service]
Type=simple
User=production
Group=production
WorkingDirectory=/opt/json-workflow-processor
ExecStart=/usr/bin/python3 json_workflow_processor-hybrid-protected.py
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd and start service
sudo systemctl daemon-reload
sudo systemctl enable json-workflow.service
sudo systemctl start json-workflow.service

# Check status
sudo systemctl status json-workflow.service

Option 2: Screen Session (For Testing)

# Start in screen session
screen -S json-workflow
python3 json_workflow_processor-hybrid-protected.py

# Detach with Ctrl+A, D
# Reattach with: screen -r json-workflow

📊 Monitoring & Logs

Real-time Monitoring

# Watch application logs
tail -f /PRODUCTION/JSON_PARSER_LOGS/json_workflow_hybrid_protected.log

# Watch systemd logs (if using systemd)
sudo journalctl -u json-workflow.service -f

Daily Reports

Reports are automatically:

  • Generated at midnight (00:00)
  • Saved to /PRODUCTION/JSON_PARSER_LOGS/daily_report_YYYY-MM-DD.txt
  • Emailed to configured recipients
  • Old reports cleaned up after 30 days
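Assuming report age is judged from the date in the file name (the guide does not say whether the processor uses the name or the file's modification time), the naming scheme and 30-day cleanup can be sketched as follows. report_path and cleanup_old_reports are hypothetical helpers.

```python
import re
from datetime import date, timedelta
from pathlib import Path

REPORT_NAME = re.compile(r"^daily_report_(\d{4}-\d{2}-\d{2})\.txt$")

def report_path(reports_dir, day):
    """Build the daily_report_YYYY-MM-DD.txt path for a given date."""
    return Path(reports_dir) / f"daily_report_{day.isoformat()}.txt"

def cleanup_old_reports(reports_dir, today, keep_days=30):
    """Delete reports dated more than keep_days before today; return deleted names."""
    cutoff = today - timedelta(days=keep_days)
    deleted = []
    for path in Path(reports_dir).glob("daily_report_*.txt"):
        match = REPORT_NAME.match(path.name)
        if not match:
            continue  # ignore files that only look like reports
        if date.fromisoformat(match.group(1)) < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```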

Performance Statistics

The processor logs comprehensive statistics every minute:

Stats: Today: 247 processed (startup: 50) (periodic: 5) (failed: 2) (scans: 145/3), 2 errors, 99.2% success
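To feed these numbers into an external monitor or alert, the line can be parsed with a regular expression. The pattern below is built from the single example line shown here, so treat it as an assumption about the format; parse_stats is a hypothetical helper, and because this guide does not explain the two numbers in "scans: 145/3", they are returned under the neutral names scans_a and scans_b.

```python
import re

# Built from the example stats line; search() tolerates a timestamp prefix
STATS_PATTERN = re.compile(
    r"Today: (?P<processed>\d+) processed"
    r" \(startup: (?P<startup>\d+)\)"
    r" \(periodic: (?P<periodic>\d+)\)"
    r" \(failed: (?P<failed>\d+)\)"
    r" \(scans: (?P<scans_a>\d+)/(?P<scans_b>\d+)\),"
    r" (?P<errors>\d+) errors,"
    r" (?P<success>[\d.]+)% success"
)

def parse_stats(line):
    """Return a dict of counters from a stats line, or None if it does not match."""
    match = STATS_PATTERN.search(line)
    if match is None:
        return None
    return {key: float(value) if key == "success" else int(value)
            for key, value in match.groupdict().items()}
```

For example, a cron job could take the latest "Stats:" line from the log and alert when parse_stats(line)["success"] drops below an agreed threshold.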

Key Features:

  • Startup scan: Processes existing files on restart (crash recovery)
  • Periodic scanning: Backup 60s scans catch missed files
  • Failed file handling: Problematic files moved to /data/PRODUCTION/JSON_FAILED/
  • Protected scanning: Prevents overlapping scans and monitors performance
  • File versioning: Handles updated files correctly (same name, different content)
  • Client folder logging: Log messages include client folder names (e.g., "Processing: RANK/5903771.json")
  • Enhanced monitoring: Scan duration tracking with alerts for slow scans
  • Permission handling: Automatic 777 permission setting for files when running as root
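The guide says updated files (same name, different content) are handled correctly but does not say how. One common approach, shown here purely as an illustration, is to remember a content hash per file and treat a changed hash as a new version. ProcessedTracker is a hypothetical class, and its in-memory dictionary would be lost on restart.

```python
import hashlib

class ProcessedTracker:
    """Tracks the last processed content hash per file key (e.g. "RANK/test.json")."""

    def __init__(self):
        self._seen = {}  # key -> SHA-256 hex digest of the last processed content

    @staticmethod
    def digest(path):
        sha = hashlib.sha256()
        with open(path, "rb") as handle:
            # Read in chunks so large files do not have to fit in memory
            for chunk in iter(lambda: handle.read(65536), b""):
                sha.update(chunk)
        return sha.hexdigest()

    def is_new_version(self, key, path):
        """Return True and record the hash if this content was not seen under key."""
        current = self.digest(path)
        if self._seen.get(key) == current:
            return False
        self._seen[key] = current
        return True
```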

🔧 Maintenance

Adding New Clients

  1. Create hot folder:

    sudo mkdir -p /data/PRODUCTION/JSON/NEW_CLIENT
    
  2. If client can have Celtra projects, update config:

    CELTRA_ELIGIBLE_CLIENTS = {
        "CIBC", "OLIVER", "ADIDAS", "PAYPAL", 
        "RECKITTBENCKISER", "BAYER", "3M", "RANK",
        "NEW_CLIENT"  # Add here
    }
    
  3. Restart service:

    sudo systemctl restart json-workflow.service
    

Log Rotation

Set up log rotation to prevent disk space issues. The service unit above defines no reload command, so a postrotate "systemctl reload" would fail; copytruncate rotates the log in place without the processor having to reopen it (a few lines written during the copy can be lost):

sudo tee /etc/logrotate.d/json-workflow > /dev/null << 'EOF'
/PRODUCTION/JSON_PARSER_LOGS/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
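How the processor opens its log file is not shown in this guide. If it uses Python's plain logging.FileHandler, the process keeps writing to the old file after logrotate renames it. If you can edit the script, the standard library's logging.handlers.WatchedFileHandler (intended for POSIX systems, not Windows) checks the path before each write and reopens the file when it has been moved. The sketch below assumes that change; build_logger and the logger name json_workflow are hypothetical.

```python
import logging
import logging.handlers

def build_logger(log_path, name="json_workflow"):
    """Return a logger that writes to log_path and reopens it after rotation."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Drop handlers from earlier calls so lines are not written twice
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.handlers.WatchedFileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```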

🚨 Troubleshooting

Common Issues

Email not sending:

# Test email configuration
python3 test_email.py

# Check Mailgun credentials in config

Files not processing:

# Check directory permissions
ls -la /data/PRODUCTION/JSON/

# Check application logs
tail -n 50 /PRODUCTION/JSON_PARSER_LOGS/json_workflow_hybrid_protected.log

# Verify service is running
sudo systemctl status json-workflow.service

High memory usage:

# Reduce batch size in config
BATCH_SIZE = 5  # Instead of 10
MAX_WORKERS = 5  # Instead of 10

Performance Tuning

For high-volume environments:

# Increase batch processing
BATCH_SIZE = 20
MAX_WORKERS = 15
BATCH_TIMEOUT = 15  # Process faster

# Reduce delays
POLL_INTERVAL = 1
WAIT_DELAY = 2

For low-volume environments:

# Conservative settings
BATCH_SIZE = 5
MAX_WORKERS = 5
POLL_INTERVAL = 5
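The tuning knobs above trade throughput for memory: BATCH_SIZE caps how many files are handled per batch, and MAX_WORKERS caps how many are processed at once. The sketch below illustrates that interaction with a thread pool; it is an assumption about how the batch versions behave, not their code. process_in_batches and batches are hypothetical helpers, and BATCH_TIMEOUT, POLL_INTERVAL and WAIT_DELAY are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 5
MAX_WORKERS = 5

def batches(items, size):
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def process_in_batches(paths, handle_one, batch_size=BATCH_SIZE, max_workers=MAX_WORKERS):
    """Apply handle_one to every path, one batch at a time; results keep input order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batches(list(paths), batch_size):
            # map() preserves order; extend() waits for the whole batch before the next starts
            results.extend(pool.map(handle_one, batch))
    return results
```

Smaller batches bound how much work is in flight at once, which is why lowering BATCH_SIZE and MAX_WORKERS is the suggested fix for high memory usage.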

📞 Support

Logs Location:

  • Application: /PRODUCTION/JSON_PARSER_LOGS/json_workflow_hybrid_protected.log
  • Daily Reports: /PRODUCTION/JSON_PARSER_LOGS/daily_report_*.txt
  • Failed Files: /data/PRODUCTION/JSON_FAILED/[CLIENT]/
  • System: sudo journalctl -u json-workflow.service

Common Commands:

# Restart service
sudo systemctl restart json-workflow.service

# View recent logs
tail -n 100 /PRODUCTION/JSON_PARSER_LOGS/json_workflow_hybrid_protected.log

# Test email
python3 test_email.py

# Check disk space
df -h /data/PRODUCTION/ /PRODUCTION/

# Check failed files
ls -la /data/PRODUCTION/JSON_FAILED/*/

# Monitor periodic scan performance
grep "Periodic scan" /PRODUCTION/JSON_PARSER_LOGS/json_workflow_hybrid_protected.log

# View client-specific processing
grep "Processing:" /PRODUCTION/JSON_PARSER_LOGS/json_workflow_hybrid_protected.log | tail -20
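For a per-client summary rather than the last 20 lines, the "Processing:" entries can be counted by client folder. This sketch assumes each relevant log line contains "Processing: CLIENT/file.json" as in the examples in this guide; the rest of the line format is unknown, and count_by_client is a hypothetical helper.

```python
import re
from collections import Counter

# Matches "Processing: RANK/test.json" anywhere in a log line
PROCESSING = re.compile(r"Processing: (?P<client>[^/\s]+)/\S+")

def count_by_client(lines):
    """Return a Counter of processed files per client folder."""
    counts = Counter()
    for line in lines:
        match = PROCESSING.search(line)
        if match:
            counts[match.group("client")] += 1
    return counts
```

To use it, open /PRODUCTION/JSON_PARSER_LOGS/json_workflow_hybrid_protected.log and pass the file object to count_by_client, then print counts.most_common().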

Installation Checklist

  • Python 3.7+ installed
  • Dependencies installed (pip3 install -r requirements.txt)
  • All directories created with correct permissions
  • Email configuration tested successfully
  • Test file processed correctly
  • Service configured and running
  • Monitoring setup (logs accessible)
  • Client hot folders created
  • Log rotation configured

🎉 Installation Complete!

Your JSON Workflow Processor is ready to handle production workloads with automated daily reporting!