Page Resource Block Rules Avoid Duplicate Handlers + Ignore top-level pages + README update (0.4.4) (#81)

* blockrules improvements:
- await the continue/abort calls so errors are caught; each is now called in only one place.
- avoid adding multiple interception handlers for the same page, which caused 'request already handled' errors
- disallow blocking full (top-level) pages via blockRules, which should be handled via scope exclusion instead, and print a warning
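The three block-rule changes can be illustrated with a small model. It is a minimal, synchronous sketch written under stated assumptions and does not reproduce the crawler's actual code. `Request`, `BlockRules`, `attach` and `handle_request` are hypothetical names. `Request.resolve` stands in for the browser's continue/abort and raises if it is called twice, mimicking the "request already handled" error.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical stand-in for an intercepted browser request."""
    url: str
    is_navigation: bool = False     # True for a top-level page load
    handled: Optional[str] = None   # "continue" or "abort" once resolved

    def resolve(self, action: str) -> None:
        # Resolving twice mimics the browser's "request already handled" error.
        if self.handled is not None:
            raise RuntimeError("request already handled")
        self.handled = action


class BlockRules:
    """Hypothetical block-rule matcher; not the crawler's real API."""

    def __init__(self, patterns):
        self.patterns = [re.compile(p) for p in patterns]
        self._attached = set()  # ids of pages that already have a handler
        self.warnings = []

    def attach(self, page_id) -> bool:
        """Register at most one interception handler per page."""
        if page_id in self._attached:
            return False
        self._attached.add(page_id)
        return True

    def handle_request(self, req: Request) -> None:
        """Pick one action, then resolve the request in exactly one place."""
        action = "continue"
        if any(p.search(req.url) for p in self.patterns):
            if req.is_navigation:
                # Top-level pages are never blocked here; use scope exclusions.
                self.warnings.append(f"not blocking top-level page: {req.url}")
            else:
                action = "abort"
        req.resolve(action)
```

Calling `attach` a second time for the same page returns `False`, so no second handler can try to resolve the same request. A matching sub-resource is aborted. A matching top-level navigation is allowed through and only produces a warning.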

* setup: ensure the 'cwd' for the crawl output exists on startup, in case a custom cwd was set.
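The startup check amounts to creating the output directory if it is missing. Here is a Python illustration of the idea. The function name `ensure_output_dir` is hypothetical and is not taken from the crawler.

```python
import os


def ensure_output_dir(cwd: str) -> str:
    """Create the crawl output directory (and any parents) if it is missing.

    exist_ok=True makes this a no-op when the directory already exists,
    so it is safe to call on every startup.
    """
    os.makedirs(cwd, exist_ok=True)
    return os.path.abspath(cwd)
```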

* scopeType rename:
- rename 'page' -> 'page-spa' to indicate support for hash-based (#) navigation, as intended for single-page apps
- rename 'none' -> 'page' to indicate the default crawl of a single page only
- messaging: adjust the error message that lists the valid scopeTypes
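The rename can be summarized as a mapping from pre-0.4.4 names to 0.4.4 names, together with a validation step whose error lists the accepted values. This is a hypothetical sketch. Only 'page' and 'page-spa' come from these notes, so `VALID_SCOPE_TYPES` is deliberately an incomplete subset, and the function names are illustrative. Note that 'page' changes meaning, so the legacy mapping is only safe for configs known to predate the rename.

```python
# Hypothetical subset: the crawler accepts other scope types not listed here.
VALID_SCOPE_TYPES = ("page", "page-spa")

# Pre-0.4.4 name -> 0.4.4 name. "page" changes meaning, so apply this only
# to configs known to predate the rename.
LEGACY_RENAMES = {"page": "page-spa", "none": "page"}


def migrate_legacy_scope_type(old: str) -> str:
    """Translate a pre-0.4.4 scopeType; unknown names pass through unchanged."""
    return LEGACY_RENAMES.get(old, old)


def validate_scope_type(scope_type: str) -> str:
    """Return scope_type if valid, else raise with the list of valid values."""
    if scope_type not in VALID_SCOPE_TYPES:
        raise ValueError(
            f"Invalid scopeType {scope_type!r}; valid values: "
            + ", ".join(VALID_SCOPE_TYPES)
        )
    return scope_type
```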

* README: add more examples for scope rules, update the scopeType param, and explain the difference between scope rules and block rules, to better address the confusion reported in #80

bump to 0.4.4
Ilya Kreymer
2021-08-17 20:54:18 -07:00
committed by GitHub
parent 4033c52693
commit c5494be653
8 changed files with 181 additions and 73 deletions


@@ -1,3 +1,3 @@
-pywb>=2.6.0b4
+pywb>=2.6.0
 uwsgi
 wacz>=0.3.1