vo_gpu: implement error diffusion for dithering

This is a straightforward parallel implementation of error diffusion
algorithms in compute shader. Basically we use single work group with
maximal possible size to process the whole image. After a shift
mapping we are able to process all pixels column by column.

A large ring buffer are allocated in shared memory to speed things up.
However the size of required shared memory depends linearly on the
height of video window (or screen height in fullscreen mode). In case
there is no enough shared memory, it will fallback to `--dither=fruit`.

The maximal allowed work group size is hardcoded as 1024. Ideally we
could query `GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS`. But for whatever
reason, it seems most high end card from nvidia and amd support only
the minimal required value, so I guess we can stick to it for now.
This commit is contained in:
Bin Jin
2019-03-16 11:19:51 +00:00
committed by sfan5
parent 6aecd10eba
commit ca2f193671
6 changed files with 454 additions and 1 deletions

View File

@@ -4346,10 +4346,16 @@ The following video options are currently all specific to ``--vo=gpu`` and
Used in ``--dither=fruit`` mode only.
``--dither=<fruit|ordered|no>``
``--dither=<fruit|ordered|error-diffusion|no>``
Select dithering algorithm (default: fruit). (Normally, the
``--dither-depth`` option controls whether dithering is enabled.)
The ``error-diffusion`` option requires compute shader support. It also
requires large amount of shared memory to run, the size of which depends on
both the kernel (see ``--error-diffusion`` option below) and the height of
video window. It will fallback to ``fruit`` dithering if there is no enough
shared memory to run the shader.
``--temporal-dither``
Enable temporal dithering. (Only active if dithering is enabled in
general.) This changes between 8 different dithering patterns on each frame
@@ -4362,6 +4368,29 @@ The following video options are currently all specific to ``--vo=gpu`` and
``--temporal-dither`` is in use. 1 (the default) will update on every video
frame, 2 on every other frame, etc.
``--error-diffusion=<kernel>``
The error diffusion kernel to use when ``--dither=error-diffusion`` is set.
``simple``
Propagate error to only two adjacent pixels. Fastest but low quality.
``sierra-lite``
Fast with reasonable quality. This is the default.
``floyd-steinberg``
Most notable error diffusion kernel.
``atkinson``
Looks different from other kernels because only fraction of errors will
be propagated during dithering. A typical use case of this kernel is
saving dithered screenshot (in window mode). This kernel produces
slightly smaller file, with still reasonable dithering quality.
There are other kernels (use ``--error-diffusion=help`` to list) but most of
them are much slower and demanding even larger amount of shared memory.
Among these kernels, ``burkes`` achieves a good balance between performance
and quality, and probably is the one you want to try first.
``--gpu-debug``
Enables GPU debugging. What this means depends on the API type. For OpenGL,
it calls ``glGetError()``, and requests a debug context. For Vulkan, it