Multiprocessing Tips and Tricks

Here are some tips related to multiprocessing/multithreading that may significantly speed up or automate your operations on Linux.

Compressing or Decompressing an XZ File

For tarballs that are compressed with xz, both compression and decompression can be accelerated by passing the -T option to xz. For example (the archive and directory names below are placeholders),

XZ_DEFAULTS="-T ${NPROC_OR_0}" tar cJvf archive.tar.xz some_dir/  # or xJvf archive.tar.xz to decompress

-T 0 or -T $(nproc) uses all the available cores, which may overload your machine if other jobs are also running.
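
In that case, a middle ground is to leave a couple of cores free. A minimal sketch, assuming GNU coreutils' nproc and a placeholder archive name:

# decompress with all but two cores so the machine stays responsive
XZ_DEFAULTS="-T $(nproc --ignore=2)" tar xJvf archive.tar.xz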

Using GNU Parallel

GNU Parallel is a tool for running multiple commands in parallel. For instance, to extract all the zip files in subfolders (each next to its own archive), run

find . -name "*.zip" | parallel -j ${NPROC} 'cd {//} && unzip -o {/}'
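
Here {//} and {/} are GNU Parallel's replacement strings for the directory part and the file name of each input path, respectively. The same pattern works for other batch jobs; as another sketch, to gzip-compress every .log file using all cores:

find . -name "*.log" | parallel -j "$(nproc)" gzip {}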

For Shell Scripts, Spawn Processes in the Background and Wait for Them

In a shell script (we use bash here as an example), if there are multiple processes that can run in parallel, you can spawn them in the background and wait for them to finish. The following snippet shows how to do that to run one inference job per GPU:

#!/usr/bin/env bash

# this line kills all the child processes when the script is terminated (e.g. by Ctrl-C)
trap 'trap - SIGTERM && kill -- -$$' SIGINT SIGTERM

# CONFIGS, INFERENCE_CONFIGS, NGPUS and CODEBASE are assumed to be defined elsewhere
PIDS=()
current_gpu=0
for config_name in "${CONFIGS[@]}"; do
    CUDA_VISIBLE_DEVICES=$current_gpu \
    python "$CODEBASE/tools/test.py" "${INFERENCE_CONFIGS[$config_name]}" ... &
    PIDS+=($!)
    ((current_gpu++))
    if [ ${#PIDS[@]} -gt $((NGPUS-1)) ]; then
        wait "${PIDS[@]}"
        PIDS=()
        current_gpu=0
    fi
done
# wait for jobs from the last (possibly partial) batch
wait "${PIDS[@]}"
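
One drawback of the batching above is that all NGPUS jobs in a batch must finish before the next batch starts, so a GPU whose job ends early sits idle. If your bash is 5.1 or newer, wait -n -p reports which background job has finished, so the freed GPU can be reused immediately. A sketch of this variant, under the same assumed variables as above:

# refill each GPU as soon as its job exits (requires bash >= 5.1 for "wait -n -p")
declare -A GPU_OF_PID                      # maps child PID -> GPU id
FREE_GPUS=($(seq 0 $((NGPUS - 1))))
for config_name in "${CONFIGS[@]}"; do
    if [ ${#FREE_GPUS[@]} -eq 0 ]; then
        wait -n -p finished_pid            # block until any one job exits
        FREE_GPUS+=("${GPU_OF_PID[$finished_pid]}")
        unset "GPU_OF_PID[$finished_pid]"
    fi
    gpu=${FREE_GPUS[0]}
    FREE_GPUS=("${FREE_GPUS[@]:1}")
    CUDA_VISIBLE_DEVICES=$gpu \
    python "$CODEBASE/tools/test.py" "${INFERENCE_CONFIGS[$config_name]}" ... &
    GPU_OF_PID[$!]=$gpu
done
wait                                       # wait for the remaining jobs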

Python Multiprocessing

A number of use cases call for a multiprocessing version of an existing Python script, e.g. the downloading script of ScanNet. While it is preferable to extract all the download links first and then download them with wget (see this link for a discussion of the inefficiency of urlretrieve), one can also achieve this in a simple yet hacky way with the apply_async method of a multiprocessing thread pool:

import multiprocessing as mp
import os
import time
import urllib.request
from multiprocessing.pool import ThreadPool

# downloading is I/O-bound, so a thread pool is enough here
global_thread_pool = ThreadPool(mp.cpu_count())
async_results = set()  # use a global variable to store the pending results

def _download_single_url(url, out_file_tmp, out_file):
    urllib.request.urlretrieve(url, out_file_tmp)
    os.rename(out_file_tmp, out_file)

# inside the loop over the (url, out_file_tmp, out_file) tuples to download:
async_results.add(
    global_thread_pool.apply_async(
        _download_single_url, (url, out_file_tmp, out_file)
    )
)
if len(async_results) > 4 * mp.cpu_count():  # tune this number
    while len(async_results) > mp.cpu_count() // 2:  # and this
        for async_result in async_results.copy():
            if async_result.ready():
                async_result.get()  # re-raises any exception from the worker
                async_results.remove(async_result)
        print(f"\r{len(async_results)} tasks remaining...", end="")
        time.sleep(2)
    print()
# after the loop, drain the remaining tasks:
# global_thread_pool.close(); global_thread_pool.join()
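
For the preferred route mentioned above, once the download links have been dumped to a text file, the downloads themselves can be parallelized with GNU Parallel as before. A sketch, assuming the links sit in a placeholder file urls.txt, one per line:

# download every URL in urls.txt, resuming partially downloaded files on re-runs
parallel -j "$(nproc)" wget -c {} < urls.txt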

Other Tips

  • For make, pass -j$(nproc) to use all the available cores, as in the sketch below.
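
    The sketch also adds GNU make's -l flag, which avoids starting new jobs when the load average is already high:

    make -j"$(nproc)" -l"$(nproc)"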