Upscale and interpolate video using STARnet super-resolution

Increase video resolution with an open-source machine-learning algorithm for upscaling and interpolating video frames, using an automated command-line script.

Bringing machine learning algorithms a step closer to usability.

Given a low-resolution video file, this script uses a machine-learning algorithm to increase (upscale) each frame’s resolution and optionally add (interpolate) an additional frame using information from the next frame. The workhorse is the Space-Time-Aware Multi-Resolution Video Enhancement (STARnet) algorithm from Haris, Shakhnarovich, & Ukita (2020) – all credit goes to the authors.

Synopsis:

To upscale a full video file:
>upscale input.vid

To upscale part of a video file, specify a time span, a frame range, or a mixture of both. Use “-” to indicate start and end points, or “+” to indicate start and span. For example, assuming a 10fps video, the following are all equivalent (to find a video’s frame rate, see the ffprobe example below):
>upscale input.vid --scene=0:01:00-0:01:10
>upscale input.vid --scene=0:01:00+0:10
>upscale input.vid --scene=600-700
>upscale input.vid --scene=600+100
>upscale input.vid --scene=1:00.00+100
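
Converting between times and frame numbers requires knowing the video’s frame rate. Since ffmpeg is already a dependency for video files, its companion tool ffprobe can report the rate (this is standard ffprobe usage, independent of the script):
>ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of csv=p=0 input.vid
The rate is printed as a fraction – e.g., 10/1 for a 10fps video.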

As part of the upscale process, an intermediate frame is generated between every pair of frames to help map information from one frame to the next, improving the results. These intermediate frames can optionally be saved as well, interpolating the video (doubling its frame rate):
>upscale input.vid --interpolate
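
Note that if you later reassemble the interpolated frames into a video yourself, the doubled frame rate must be specified explicitly. A minimal sketch with ffmpeg, assuming the 10fps example from above and frames numbered 000001.png, 000002.png, … (the naming pattern is an assumption – adjust it to match your output directory):
>ffmpeg -framerate 20 -i output-frames/%06d.png -c:v libx264 -pix_fmt yuv420p output.mp4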

If the upscale process is interrupted, it is automatically resumed by re-running the command. To prevent this behaviour (and clobber any existing output frames), use:
>upscale input.vid --no-resume

Frames are extracted from video files using ffmpeg, so any video file format supported by ffmpeg will work. However, if you would like to use a different program (e.g., VLC), or prefer to handle the extraction manually, then just place the frames as a sequence of .png files in a directory:
>upscale input-frames/
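
For example, to handle the extraction manually with ffmpeg (the zero-padded %06d naming here is just one reasonable choice – the script only needs a sequence of .png files):
>mkdir input-frames
>ffmpeg -i input.vid input-frames/%06d.png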

The output directory name is auto-generated (e.g., .input.vid.out/ for input.vid), or can be specified manually:
>upscale input.vid output-frames/
>upscale input-frames/ output-frames/

The input directory name is also auto-generated – as .input.vid.inp/ in the example below. It is also possible to do all processing in a single directory; in this case, output frames clobber input frames, so the process cannot be resumed if interrupted:
>upscale input.vid .input.vid.inp/
>upscale input-frames/ input-frames/

Experimental: An output video file can be specified instead of an output directory. In this case, the script will attempt to generate a new video file with the original frames replaced by the upscaled ones. Preservation of the original video’s quality and data is limited by what ffmpeg does by default:
>upscale input.vid output.vid
This feature is experimental – use it at your own risk.
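
If the default reassembly is not satisfactory, it can also be done manually. A sketch, again assuming a 10fps video and %06d.png frame naming (both assumptions), that encodes the upscaled frames as the video stream and copies the audio stream from the original file unchanged:
>ffmpeg -framerate 10 -i output-frames/%06d.png -i input.vid -map 0:v -map 1:a -c:v libx264 -pix_fmt yuv420p -c:a copy output.mp4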

Installation:

Dependencies:
This script runs on Linux, and uses Python, PyTorch, and PyFlow, plus optionally ffmpeg for extracting frames from, and writing frames back to, video files. Compiling PyFlow also requires C++ and Python development libraries (these can safely be removed after installation). To ensure that all dependencies exist, use the setup script included in the project.
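
The names of the development libraries depend on your distribution. On Debian-based systems, for example, they can typically be installed with (package names will differ elsewhere):
>sudo apt install build-essential python3-dev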

To download and install the project from GitHub:
>wget -O - https://github.com/arnon-weinberg/Upscale-interpolate-STARnet/archive/master.tar.gz | tar xz
>cd Upscale-interpolate-STARnet-master
>setup
>source pytorch/bin/activate
>upscale --help

The setup script does not require root, and all changes are local, so uninstalling the project is a simple matter of removing its directory.

Details:

I built this script to test out the STARnet algorithm on practical examples. Now you can use it too, on your own videos!

Frame generation is very processing-intensive. On my old laptop (no GPU), it can take 5 minutes per frame to process a small 256×138 resolution video (sample image below), and over 1 hour per frame for 720×480. Processing time benefits greatly from the availability of a GPU, but if you prefer not to use one, GPU support can be turned off with the --no-cuda option.
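
To check whether PyTorch can see a GPU at all, run this inside the project’s virtual environment (torch.cuda.is_available() is standard PyTorch):
>python3 -c 'import torch; print(torch.cuda.is_available())'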

The STARnet model only offers 4× zoom, and uses information from only one neighbouring frame, but nonetheless, thanks to the generation of intermediate frames, it performs about as well as, or slightly better than, the RBPN algorithm that it is based on. Some videos are greatly enhanced by super-resolution, while others are unaffected – presumably resolution is not the only problem with video quality, so this is not a universal solution for improving any low-quality video.

Bottom line: While this machine-learning algorithm may not be ready for prime time, it is still a fun project to play with, and perhaps you’ll find it useful for enhancing your own videos.

Command used: >upscale 'Lilies - S1E1.mp4' --scene=1:43+10
