browsertrix-crawler/Dockerfile
Ilya Kreymer 5ee05985b1 Use VNC for headful profile creation (#197)
* profiles: use vnc for automatic profile creation (fixes #194):
- add x11vnc and serve via vnc when not headless, keep existing screencast for headless mode
- use @novnc/novnc to serve vnc JS library
- add novnc_lite.html to serve the content from an iframe
- optimization: don't show initial blank page / don't wait for initial page in puppeteer

* more vnc work:
- set position of browser at 0,0, avoid needing offset to fit
- add /vncpass endpoint to query vnc password (for use with browsertrix-cloud)
- remove websockify, x11vnc now supports ws connections directly!
- vnc_lite: support reconnecting ws if gracefully disconnected

* x11vnc cleanup: just pass password via cmdline to simplify setup

* make interactive profile creation the default; automated mode is enabled only if the --automated or --username / --password flags are specified
README updates:
- mention new VNC-based streaming
- mention new --automated flag, move automated info below interactive

* README: adjust auto-login example to use Mastodon instead of Twitter, which works more consistently
2023-01-09 23:56:53 -08:00

ARG BROWSER_IMAGE_BASE=webrecorder/browsertrix-browser-base
ARG BROWSER_VERSION=105
FROM ${BROWSER_IMAGE_BASE}:${BROWSER_VERSION}
# TODO: Move this into base image
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -qqy jq x11vnc && rm -rf /var/lib/apt/lists/*
# needed to add args to main build stage
ARG BROWSER_VERSION
ENV PROXY_HOST=localhost \
PROXY_PORT=8080 \
PROXY_CA_URL=http://wsgiprox/download/pem \
PROXY_CA_FILE=/tmp/proxy-ca.pem \
DISPLAY=:99 \
GEOMETRY=1360x1020x16 \
BROWSER_VERSION=${BROWSER_VERSION} \
BROWSER_BIN=google-chrome \
OPENSSL_CONF=/app/openssl.conf \
VNC_PASS=vncpassw0rd!
WORKDIR /app
ADD requirements.txt /app/
RUN pip install 'uwsgi==2.0.20'
# chain with && so a failed setuptools upgrade fails the build
RUN pip install -U setuptools && pip install -r requirements.txt
ADD package.json /app/
# to allow forcing rebuilds from this stage
ARG REBUILD
# Download and format ad host blocklist as JSON
RUN mkdir -p /tmp/ads && cd /tmp/ads && \
curl -vs -o ad-hosts.txt https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts && \
grep '^0.0.0.0 ' ad-hosts.txt | awk '{ print $2; }' | grep -v '0.0.0.0' | jq --raw-input --slurp 'split("\n") | map(select(length > 0))' > /app/ad-hosts.json && \
rm /tmp/ads/ad-hosts.txt
RUN yarn install
ADD *.js /app/
ADD util/*.js /app/util/
ADD config/ /app/
ADD html/ /app/html/
# chain with && so a failed symlink fails the build
RUN ln -s /app/main.js /usr/bin/crawl && ln -s /app/create-login-profile.js /usr/bin/create-login-profile
WORKDIR /crawls
ADD docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["crawl"]
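
The ad-host blocklist step in the Dockerfile above can be tried outside the image build. The following is a minimal sketch, assuming `jq` is installed locally. The function names (`extract_hosts`, `hosts_to_json`, `sample_hosts`) and the sample entries are made up for illustration and are not part of the repository. It runs the same grep/awk chain as the RUN step on a small hosts-format input. The `map(select(length > 0))` stage drops the empty string that `split("\n")` would otherwise leave after the trailing newline.

```shell
#!/bin/sh

# Keep only "0.0.0.0 <host>" lines, print the host column,
# then drop the "0.0.0.0 0.0.0.0" self-entry (same chain as the RUN step).
extract_hosts() {
  grep '^0.0.0.0 ' | awk '{ print $2; }' | grep -v '0.0.0.0'
}

# Convert newline-separated hosts on stdin into a JSON array (requires jq).
hosts_to_json() {
  jq --raw-input --slurp 'split("\n") | map(select(length > 0))'
}

# Hypothetical sample in hosts-file format; not real blocklist data.
sample_hosts() {
  printf '%s\n' \
    '# comment line' \
    '127.0.0.1 localhost' \
    '0.0.0.0 0.0.0.0' \
    '0.0.0.0 ads.example.com' \
    '0.0.0.0 tracker.example.net'
}
```

Running `sample_hosts | extract_hosts | hosts_to_json` should print a JSON array holding the two example hosts, which is the shape written to /app/ad-hosts.json during the build.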