The /cancel route used to 409 with "No running pipeline" when the child
handle wasn't in this server process — which happens any time the server
restarts (deploy, OOM, manual restart) while a run is still listed as
non-terminal in the DB. Three reports stuck "running for 11h+" on prod
this morning had no recoverable handle and no UI path to clear them.
Cancel now distinguishes three cases:
- Live child: SIGTERM the process group + mark failed "Cancelled by ...".
- Orphan (no live child): mark failed "Marked failed by ... (no running
process — likely orphaned by a server restart)".
- Already terminal: 409 (unchanged).
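A minimal sketch of the three-way branch above, assuming the server keeps live child handles in an in-memory map keyed by report id. Everything here is hypothetical: `cancelReport`, `Report`, the status names, the injectable `kill` parameter, the 404 for unknown ids and the 200 on success are illustrations, not the code in this commit, and the message strings only approximate the real ones.

```typescript
// Hypothetical types: the real schema and status names are not in this commit.
type Status = "queued" | "running" | "succeeded" | "failed";

interface Report {
  id: string;
  status: Status;
  error?: string;
}

interface CancelResult {
  httpStatus: 200 | 404 | 409;
  body: string;
}

const TERMINAL: ReadonlySet<Status> = new Set<Status>(["succeeded", "failed"]);

// POSIX only: a negative pid signals the whole process group, which assumes the
// child was spawned with `detached: true` so that it leads its own group.
function killGroup(pid: number): void {
  process.kill(-pid, "SIGTERM");
}

// Hypothetical handler body for /cancel. `children` holds the live handles owned
// by THIS server process; after a restart it starts out empty.
function cancelReport(
  reports: Map<string, Report>,
  children: Map<string, { pid: number }>,
  reportId: string,
  actor: string,
  kill: (pid: number) => void = killGroup,
): CancelResult {
  const report = reports.get(reportId);
  if (!report) return { httpStatus: 404, body: "No such report" }; // assumed

  // Already terminal: 409, as before.
  if (TERMINAL.has(report.status)) {
    return { httpStatus: 409, body: `Report is already ${report.status}` };
  }

  const child = children.get(reportId);
  let message: string;
  if (child) {
    // Live child: stop the pipeline, drop the handle, record who cancelled it.
    kill(child.pid);
    children.delete(reportId);
    message = `Cancelled by ${actor}`;
  } else {
    // Orphan: nothing to signal, so just clear the stale row.
    message = `Marked failed by ${actor} (no running process, likely orphaned by a server restart)`;
  }
  report.status = "failed";
  report.error = message;
  return { httpStatus: 200, body: message }; // 200 on success is assumed
}
```

Making `kill` injectable keeps the branch logic testable without spawning real processes.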
Plus a boot sweep in server/index.ts that marks every non-terminal
report failed on startup with "Orphaned by server restart". This is the
right default: a freshly started server holds no child handles, so no
child it spawned can be producing artefacts for any non-terminal row.
Saves the user from clicking Cancel on each stale row after every deploy.
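A minimal sketch of the boot sweep, run once before the server starts accepting requests. `sweepOrphansOnBoot`, `Report` and the non-terminal status names are hypothetical; against a real database the same idea would more likely be a single UPDATE that sets status and error WHERE status is one of the non-terminal values.

```typescript
// Hypothetical types: the real schema and status names are not in this commit.
type Status = "queued" | "running" | "succeeded" | "failed";

interface Report {
  id: string;
  status: Status;
  error?: string;
}

const NON_TERMINAL: ReadonlySet<Status> = new Set<Status>(["queued", "running"]);

// Called once at startup. At that point this process owns no child handles,
// so every non-terminal row is treated as orphaned. Returns how many rows it failed.
function sweepOrphansOnBoot(reports: Iterable<Report>): number {
  let swept = 0;
  for (const report of reports) {
    if (NON_TERMINAL.has(report.status)) {
      report.status = "failed";
      report.error = "Orphaned by server restart";
      swept++;
    }
  }
  return swept;
}
```

Terminal rows are left alone, so running the sweep on every boot is idempotent.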
Also adds v2/README.md with an architecture ASCII diagram, a 10-stage
pipeline ASCII diagram, the auth/tenancy model, ops commands, common
pitfalls (UK->GB, APIFY_LIVE_APPROVED, budget caps, cost-event
overwrite, compose-name policy), and the three deliberate design
choices behind V2's shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>