Instead of threads, use overlapped (asynchronous) I/O to read from both
stdout and stderr. Like in d0643fa, stdout and stderr could be closed at
different times, so a sparse_wait function is added to wrap
WaitForMultipleObjects and skip NULL handles.