Commit Graph

60 Commits

Author SHA1 Message Date
Sebastian Nagel
9d577dac57 Extract links from all frames attached to a page, fixes #45 (#48) 2021-04-30 08:41:00 -07:00
Ilya Kreymer
eff4c61270 misc typos/fixes for 0.3.0:
- update README with latest params
- ensure capture dir includes seconds
- bump behaviors to 0.1.1
2021-04-13 18:17:44 -07:00
Ilya Kreymer
b59788ea04 Profiles: Support for running with existing profiles + saving profile after a login (#34)
Support for profiles via a mounted .tar.gz and --profile option + improved docs #18

* support creating profiles via the 'create-login-profile' command, with options for where to save the profile, username/password, and debug screenshot output. Support entering the username and password (hidden) on the command line if omitted.

* use patched pywb for fix

* bump browsertrix-behaviors to 0.1.0

* README: updates to include better getting started, behaviors and profile reference/examples

* bump version to 0.3.0!
2021-04-10 13:08:22 -07:00
Emma Dickson
24e2c4ddf8 Create --combineWARC flag that combines generated WARCs into a single WARC up to the rollover size (#33)
* generates combined WARCs in the collection root directory with suffixes `_0.warc`, `_1.warc`, etc.
* each combined WARC is limited to the size given by `--rolloverSize`; if that limit is exceeded, a new WARC is created, otherwise data is appended to the previous WARC.
* add test for --combineWARC flag
* add improved lint rules

Co-authored-by: Emma Dickson <emmadickson@Emmas-MacBook-Pro.local>
2021-03-31 10:41:27 -07:00
Emma Dickson
748b0399e9 add text extraction (#28)
* add text extraction via --text flag

* update readme with --text and --generateWACZ flags

Co-authored-by: Emma Dickson <emmadickson@Emmas-MacBook-Pro.local>
2021-02-23 13:52:54 -08:00
raffaele messuti
5bf64be018 minor fixes (#1)
* Update README.md - fix incomplete `docker run` command for pywb

* Update crawler.js - fix generateCDX
2020-11-03 13:33:19 -08:00
Ilya Kreymer
8f740d4e24 support custom crawl directory with --cwd flag, default to /crawls
update README
2020-11-02 15:28:19 +00:00
Ilya Kreymer
e2bce2f30d README tweaks 2020-11-01 21:43:52 -08:00
Ilya Kreymer
a875aa90d3 Dockerfile: switch to CMD 'crawl' instead of ENTRYPOINT, so that 'pywb' can also be run
update README with docker-compose and docker run examples, update command-line example
default output to './crawls' subdirectory
2020-11-01 21:35:00 -08:00
Ilya Kreymer
ded83b52b3 initial commit after split from zimit 2020-10-31 13:16:37 -07:00