browsertrix-crawler/tests/dedup-basic.test.js at e0244391f12234f569f76a159b4d668fc3d4edbd

mirror of https://github.com/webrecorder/browsertrix-crawler.git synced 2025-12-25 11:20:18 +00:00

Ilya Kreymer e0244391f1 update to new data model:

- hashes stored in separate crawl specific entries, h:<crawlid>
- wacz files stored in crawl specific list, c:<crawlid>:wacz
- hashes committed to 'alldupes' hashset when crawl is complete, crawls added to 'allcrawls' set
- store filename, crawlId in related.requires list entries for each wacz

2025-12-11 10:43:57 -08:00

4.6 KiB
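The commit message above describes a key layout for the deduplication index. Here is a minimal sketch of that layout. In-memory JavaScript Maps, Sets and arrays stand in for the store's hashes, sets and lists; the naming suggests Redis-style types, but that is an assumption.

Only the key names (`h:<crawlid>`, `c:<crawlid>:wacz`, `alldupes`, `allcrawls`) and the `{ filename, crawlId }` shape of the `related.requires` entries come from the commit message. Everything else is hypothetical and is not taken from the test file or the crawler source, including:

- the class `DedupIndexSketch` and its method names,
- storing the crawl ID as each hash's value,
- keeping the first crawl when a hash is already committed.

```javascript
// Hypothetical in-memory sketch of the data model from commit e0244391f1.
// Maps, Sets and arrays stand in for store hashes, sets and lists; the class,
// method names and stored values are illustrative assumptions only.
class DedupIndexSketch {
  constructor() {
    this.hashes = new Map();    // "h:<crawlid>" -> Map(hash -> crawlId)
    this.waczLists = new Map(); // "c:<crawlid>:wacz" -> [filename, ...]
    this.alldupes = new Map();  // committed hash -> crawlId
    this.allcrawls = new Set(); // IDs of completed crawls
  }

  // Record a payload hash in the crawl-specific entry while the crawl runs.
  addHash(crawlId, hash) {
    const key = `h:${crawlId}`;
    if (!this.hashes.has(key)) this.hashes.set(key, new Map());
    this.hashes.get(key).set(hash, crawlId);
  }

  // Append a WACZ filename to the crawl-specific list.
  addWacz(crawlId, filename) {
    const key = `c:${crawlId}:wacz`;
    if (!this.waczLists.has(key)) this.waczLists.set(key, []);
    this.waczLists.get(key).push(filename);
  }

  // On crawl completion: merge h:<crawlid> into alldupes and register the
  // crawl in allcrawls. Keeping the earliest crawl for a hash that is
  // already committed is an assumption (similar to HSETNX semantics).
  commitCrawl(crawlId) {
    const entries = this.hashes.get(`h:${crawlId}`) ?? new Map();
    for (const [hash, value] of entries) {
      if (!this.alldupes.has(hash)) this.alldupes.set(hash, value);
    }
    this.allcrawls.add(crawlId);
  }

  // Build related.requires-style entries ({ filename, crawlId }) pointing
  // at the WACZ files of the committed crawl that already holds `hash`.
  requiresFor(hash) {
    const crawlId = this.alldupes.get(hash);
    if (crawlId === undefined) return [];
    const files = this.waczLists.get(`c:${crawlId}:wacz`) ?? [];
    return files.map((filename) => ({ filename, crawlId }));
  }
}

// Usage: a hash is only visible for dedup after its crawl is committed.
const idx = new DedupIndexSketch();
idx.addHash("crawl-a", "sha256:abc");
idx.addWacz("crawl-a", "crawl-a-0.wacz");
console.log(idx.requiresFor("sha256:abc")); // → []
idx.commitCrawl("crawl-a");
console.log(idx.requiresFor("sha256:abc"));
// → [{ filename: "crawl-a-0.wacz", crawlId: "crawl-a" }]
```

In this sketch, per-crawl hashes stay in `h:<crawlid>` until `commitCrawl` runs. An unfinished crawl's hashes therefore never appear in `alldupes`, which is why `requiresFor` returns an empty list before the commit.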
