mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2025-12-25 11:20:18 +00:00
* interrupts: simplify interrupt behavior: - SIGTERM/SIGINT behave same way, trigger an graceful shutdown after page load improvements of remote state / parallel crawlers (for browsertrix-cloud): - SIGUSR1 before SIGINT/SIGTERM ensures data is saved, mark crawler as done - for use with graceful stopping crawl - SIGUSR2 before SIGINT/SIGTERM ensures data is saved, does not mark crawler as done - for use with scaling down a single crawler * scope check: check scope of URL retrieved from queue (in case scoping rules changed), urls matching seed automatically in scope!