- Fixed Python 3.14 compatibility (switched to Python 3.12)
- Upgraded Anthropic SDK to v0.75.0
- Updated Claude model to Sonnet 4.5 (claude-sonnet-4-5-20250929)
- Fixed Google Sheets hyperlink extraction using Sheets API v4
- Extracts rich text hyperlinks from cells correctly
- Fixed WeasyPrint PDF generation (upgraded to v67.0)
- Fixed Jinja2 template naming collision (items -> articles)
- Added extract_hyperlinks.py module for Sheets API v4 integration
- Dual-track scraping: Firecrawl for regular URLs, Apify for social media
- Newsletter-style PDF with Montserrat font
- Complete documentation and setup guides
- All components tested and working
Quick Setup Guide
Implementation Complete!
Your Newsroom Daily Report Generator is ready to use. Here's what's been built:
What It Does
- Reads today's URLs from your Google Sheet (dates are matched in column D)
- Classifies URLs (social media vs regular websites)
- Scrapes content:
  - Firecrawl for news articles/blogs
  - Apify for Twitter/X, Instagram, TikTok, LinkedIn
- Summarizes with Claude API (title + 2-3 bullets per article)
- Generates a newsletter-style PDF set in the Montserrat font
Categories Supported
- HARD NEWS
- POP CULTURE
- PRODUCT SPOTTING
- INTERNET CULTURE
- INDUSTRY NEWS
- SOCIAL UPDATES
- INSPIRATION
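If you want to see how articles could be bucketed into these categories before rendering, here is a minimal sketch. The `group_by_category` helper and the `category` key on each article are hypothetical, and using the list order above as the section order is an assumption rather than something the generator is confirmed to do.

```python
from collections import defaultdict

# Category names copied from the list above; using this order for the
# PDF sections is an assumption.
CATEGORIES = [
    "HARD NEWS",
    "POP CULTURE",
    "PRODUCT SPOTTING",
    "INTERNET CULTURE",
    "INDUSTRY NEWS",
    "SOCIAL UPDATES",
    "INSPIRATION",
]


def group_by_category(articles):
    """Group article dicts by their 'category' key (hypothetical field name).

    Returns a dict ordered like CATEGORIES; empty categories are left out
    and articles with an unknown category are dropped.
    """
    grouped = defaultdict(list)
    for article in articles:
        grouped[article["category"].strip().upper()].append(article)
    return {name: grouped[name] for name in CATEGORIES if grouped[name]}
```

Normalizing with `strip().upper()` keeps a sheet entry like "pop culture " from ending up in its own bucket.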
Next Steps to Get Running
1. Install System Dependencies
macOS (you're on this):
brew install cairo pango gdk-pixbuf libffi
2. Set Up Google Sheets Access
You need to create a Google Service Account:
- Go to Google Cloud Console
- Create/select project
- Enable "Google Sheets API"
- Create Service Account
- Download JSON key and save as service_account.json in this directory
- Important: Share your Google Sheet with the service account email
Detailed instructions in README.md
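To find the address you need to share the sheet with, you can read it straight out of the downloaded key. This is a small sketch, not part of the project: `service_account_email` is a hypothetical helper, and it relies only on the `client_email` field that Google service account key files contain.

```python
import json
from pathlib import Path


def service_account_email(key_path: str = "service_account.json") -> str:
    """Return the client_email stored in a Google service account key file."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    return data["client_email"]


# Usage, from the project directory:
#   print(service_account_email())
```

Share the Google Sheet with the printed address, the same way you would share it with a person.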
3. Add Your Anthropic API Key
Edit the .env file:
nano .env
Replace your-anthropic-api-key-here with your actual API key.
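A quick way to confirm the placeholder is really gone is to parse the file yourself. The sketch below is illustrative only: the variable name `ANTHROPIC_API_KEY` is an assumption (check .env for the actual name), and both helpers are hypothetical rather than part of config.py.

```python
from typing import Optional

PLACEHOLDER = "your-anthropic-api-key-here"  # placeholder text named in this guide


def read_env_value(env_text: str, name: str) -> Optional[str]:
    """Return the value for `name` from .env-style text, or None if absent."""
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip('"').strip("'")
    return None


def key_is_configured(env_text: str, name: str = "ANTHROPIC_API_KEY") -> bool:
    """True when the variable exists, is non-empty, and is not the placeholder."""
    value = read_env_value(env_text, name)
    return bool(value) and value != PLACEHOLDER


# Usage, from the project directory:
#   from pathlib import Path
#   print(key_is_configured(Path(".env").read_text()))
```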
4. Install Python Dependencies
# Activate virtual environment (already created)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
5. Test Run
# Make sure venv is activated
source venv/bin/activate
# Run the generator
python newsroom_report.py
What You Have
Files Created
- newsroom_report.py - Main script (run this daily)
- config.py - Configuration management
- scraper.py - Firecrawl integration
- apify_scrapers.py - Social media scraping
- content_processor.py - URL classification
- summarizer.py - Claude AI summarization
- pdf_generator.py - PDF generation
- templates/ - HTML/CSS for newsletter design
- static/montserrat/ - Montserrat font files
- requirements.txt - Python dependencies
- .env - Configuration (edit this with your Anthropic key)
- README.md - Complete documentation
Git Repository
Already initialized and ready to push:
# Push to Bitbucket when ready (using SSH key djp1971)
git push -u origin master
Costs
Monthly estimates for 20 URLs/day (600/month):
- Firecrawl: ~$20-30
- Apify: ~$15-50
- Claude API: ~$10-20
- Google Sheets: Free
- Total: ~$45-100/month
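The total is just the sum of the per-service ranges. The snippet below redoes that arithmetic and derives a rough per-URL figure; the numbers are the estimates listed above, not measured costs.

```python
# (low, high) monthly estimates in USD, copied from the list above
ESTIMATES = {
    "Firecrawl": (20, 30),
    "Apify": (15, 50),
    "Claude API": (10, 20),
    "Google Sheets": (0, 0),
}
URLS_PER_MONTH = 20 * 30  # 20 URLs/day over a 30-day month

low = sum(lo for lo, _ in ESTIMATES.values())
high = sum(hi for _, hi in ESTIMATES.values())

print(f"Total: ${low}-{high}/month")  # Total: $45-100/month
print(f"Per URL: ${low / URLS_PER_MONTH:.3f}-{high / URLS_PER_MONTH:.3f}")  # Per URL: $0.075-0.167
```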
Support Resources
- README.md - Full documentation
- Troubleshooting section - Common issues and solutions
- Test commands - Verify each component works
Daily Usage
Once set up, just run:
cd /Users/daveporter/Desktop/CODING-2024/newsroom-reporter
source venv/bin/activate
python newsroom_report.py
PDF will be saved to: reports/Newsroom_Report_YYYY-MM-DD.pdf
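If you want to locate a given day's report from another script, the naming pattern can be reproduced like this. `report_path` is a hypothetical helper; it assumes the date part is the ISO date of the run, as the YYYY-MM-DD pattern above suggests.

```python
from datetime import date
from pathlib import Path


def report_path(day: date, reports_dir: str = "reports") -> Path:
    """Build reports/Newsroom_Report_YYYY-MM-DD.pdf for the given day."""
    return Path(reports_dir) / f"Newsroom_Report_{day.isoformat()}.pdf"


# Usage:
#   report_path(date.today()).exists()
```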
Key Configuration
Already set in .env.example and copied to .env:
- Google Sheet ID: 1vGSZIST0ruKdYRGSgNz1W8AueQGFHHbZ7D6zXFVNKeA
- Firecrawl API Key: fc-3dfbb10dca12469998ad9e0db490d622
- Apify API Key: apify_api_61KN8cz07owBqcFAcfcSdPWMwAJEAm3julCF
- You need to add: Anthropic API Key
Notes
- Date format in sheet: "Tuesday, January 6" (no year)
- Dates should be in Column D
- Script automatically finds today's date
- Social media URLs (Twitter/X, Instagram, TikTok, LinkedIn) are scraped via Apify
- Falls back gracefully if any URL fails to scrape
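Because the sheet stores dates without a year, matching "today" comes down to building the same label and comparing strings. A minimal sketch, assuming rows arrive as lists of cell strings with column D at index 3; both function names are hypothetical and the real script may match dates differently.

```python
from datetime import date


def sheet_date_label(day: date) -> str:
    """Format a date the way the sheet writes it, e.g. 'Tuesday, January 6'."""
    # day.day avoids the platform-specific %-d / %#d strftime flags.
    # %A and %B give English names under Python's default C locale.
    return f"{day.strftime('%A, %B')} {day.day}"


def rows_for_day(rows, day: date, date_col: int = 3):
    """Return the rows whose column D (index 3) matches the label for `day`."""
    label = sheet_date_label(day)
    return [
        row for row in rows
        if len(row) > date_col and row[date_col].strip() == label
    ]
```

One consequence of the year-less format: a row dated "Tuesday, January 6" from a previous year will also match whenever January 6 falls on a Tuesday again.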
Architecture Highlights
Smart URL Classification
Automatically detects and routes URLs to the appropriate scraper (social vs regular).
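The routing idea can be sketched as a hostname check. This is not the code in content_processor.py: `classify_url` and the domain set are assumptions based on the platforms this guide lists for Apify.

```python
from urllib.parse import urlparse

# Domains treated as social media; an assumed list derived from the
# platforms named in this guide (Twitter/X, Instagram, TikTok, LinkedIn).
SOCIAL_DOMAINS = {"twitter.com", "x.com", "instagram.com", "tiktok.com", "linkedin.com"}


def classify_url(url: str) -> str:
    """Return 'social' for known social platforms, otherwise 'regular'."""
    host = (urlparse(url).hostname or "").lower()
    for domain in SOCIAL_DOMAINS:
        # Match the bare domain or any subdomain (www., m., etc.)
        if host == domain or host.endswith("." + domain):
            return "social"
    return "regular"
```

Matching on the parsed hostname rather than a substring keeps a URL like `https://notx.com/` from being sent to the social scraper.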
Parallel Processing
Scrapes multiple URLs efficiently using batch operations where possible.
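One common way to scrape several URLs at once is a thread pool, shown below. This illustrates the general technique only; the project may instead rely on the scraping services' own batch endpoints, and `scrape_all` and `scrape_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def scrape_all(
    urls: List[str],
    scrape_one: Callable[[str], str],
    max_workers: int = 5,
) -> List[str]:
    """Run scrape_one over urls concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `urls`; an exception raised by
        # scrape_one is re-raised here when its result is consumed.
        return list(pool.map(scrape_one, urls))
```

Threads suit this workload because each scrape spends most of its time waiting on the network.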
Error Handling
Graceful fallbacks for failed scrapes - continues processing other URLs.
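The "keep going" behaviour amounts to catching failures per URL instead of around the whole batch. A minimal sketch with hypothetical names; the actual script may log or retry differently.

```python
from typing import Callable, Dict, List, Tuple


def scrape_with_fallback(
    urls: List[str],
    scrape_one: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Scrape each URL independently; one failure never stops the rest.

    Returns (results, failures): scraped content keyed by URL, and error
    messages keyed by the URLs that failed.
    """
    results: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for url in urls:
        try:
            results[url] = scrape_one(url)
        except Exception as exc:  # broad on purpose: any scraper error is non-fatal
            failures[url] = str(exc)
    return results, failures
```

Returning the failures alongside the results makes it easy to list skipped URLs at the end of a run.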
Newsletter Design
Professional PDF with:
- Gradient header
- Category sections with colored borders
- Article cards with bullet points
- Clean typography with Montserrat font
- Source links for each article
Enjoy your automated newsroom reports!