Upscale video super-resolution using RSTT

Increase video resolution with an open-source machine-learning algorithm that upscales video frames, driven by an automated command-line script.

Bringing machine learning algorithms a step closer to usability.

Given a low-resolution video file, this script uses a machine-learning algorithm to increase (upscale) each frame’s resolution using information from neighbouring frames. The workhorse is the Real-time Spatial Temporal Transformer (RSTT) for Space-Time Video Super-Resolution (STVSR) algorithm from Geng, Liang, Ding, & Zharkov (2022) – all credit goes to the authors.


To upscale a full video file:
>upscale input.vid

To upscale part of a video file, specify a time span, a frame range, or a mixture of both. Use “-” to indicate start and (optionally) end points, or “+” to indicate start and (optionally) span. For example, assuming a 10 fps video, the following are all equivalent:
>upscale input.vid --scene=0:01:00-0:01:10
>upscale input.vid --scene=0:01:00+0:10
>upscale input.vid --scene=600-700
>upscale input.vid --scene=600+100
>upscale input.vid --scene=1:00.00+100
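The equivalences above can be reproduced with a small parser. This is a minimal illustration of the grammar implied by the examples, not the script's actual code; the helper names and the exact rules accepted by --scene are assumptions:

```python
import re

def to_frames(point, fps):
    """Convert a time point like '0:01:00' or '1:00.00', or a bare
    frame number like '600', into a frame index (assumed semantics:
    anything containing ':' is a time, otherwise it is a frame count)."""
    if re.fullmatch(r"\d+", point):
        return int(point)                  # plain frame number
    seconds = 0.0
    for part in point.split(":"):          # h:mm:ss, m:ss, or m:ss.ff
        seconds = seconds * 60 + float(part)
    return round(seconds * fps)

def parse_scene(spec, fps):
    """Split a --scene spec into (start_frame, end_frame).
    '-' separates start and end; '+' separates start and span."""
    sep = "+" if "+" in spec else "-"
    start, _, rest = spec.partition(sep)
    begin = to_frames(start, fps)
    extent = to_frames(rest, fps)
    return (begin, begin + extent) if sep == "+" else (begin, extent)
```

At 10 fps, every spec from the examples above resolves to frames 600 through 700.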

If the upscale process is interrupted, re-running the command automatically resumes it. To prevent this behaviour (and clobber any existing output frames) use:
>upscale input.vid --no-resume
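The resume behaviour presumably amounts to skipping input frames whose output file already exists. A sketch of that logic, assuming one .png per frame with matching names in the input and output directories (an assumption, not the script's actual code):

```python
import os

def frames_to_process(in_dir, out_dir, resume=True):
    """List input frames that still need upscaling. With resume=True,
    frames whose output .png already exists are skipped; with
    resume=False everything is (re)processed, clobbering old output."""
    todo = []
    for name in sorted(os.listdir(in_dir)):
        if not name.endswith(".png"):
            continue
        if resume and os.path.exists(os.path.join(out_dir, name)):
            continue
        todo.append(name)
    return todo
```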

Frames are extracted from video files using ffmpeg, so any video file format supported by ffmpeg will work. However, if you would like to use a different program (e.g., VLC), or prefer to handle the extraction manually, just place the frames as a sequence of .png files in a directory:
>upscale input-frames/
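Under the hood, the extraction step likely boils down to a single ffmpeg invocation that dumps every frame as a numbered PNG. A sketch that builds (but does not run) such a command; the %06d filename pattern is an assumption:

```python
def extract_cmd(video, out_dir, pattern="%06d.png"):
    """Build an ffmpeg command that extracts every frame of `video`
    as a numbered PNG sequence in `out_dir`."""
    return ["ffmpeg", "-i", video, f"{out_dir}/{pattern}"]
```

The resulting list can be passed straight to subprocess.run() once out_dir exists.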

The output directory name is auto-generated from the input name – e.g., .input.vid.out/ – or can be specified manually:
>upscale input.vid output-frames/
>upscale input-frames/ output-frames/

The input directory name is also auto-generated – e.g., as .input.vid.inp/. It is also possible to do all processing in a single directory. In this case, output frames clobber input frames, so the process cannot be resumed if interrupted:
>upscale input.vid .input.vid.inp/
>upscale input-frames/ input-frames/
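The naming convention for these work directories appears to be a dot-prefixed copy of the input name plus a suffix. A sketch of that derivation (the exact convention is an assumption inferred from the examples above):

```python
import os

def auto_dir(path, suffix):
    """Derive a hidden work-directory name next to the input file,
    e.g. 'input.vid' -> '.input.vid.inp/' or '.input.vid.out/'."""
    base = os.path.basename(path)
    return os.path.join(os.path.dirname(path), f".{base}.{suffix}") + os.sep
```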

Experimental: An output video file can be specified instead of an output directory. In this case, the script will attempt to generate a new video file with the original frames replaced by the upscaled ones. How much of the original video’s quality and metadata is preserved is limited by what ffmpeg does by default:
>upscale input.vid output.vid
This feature is experimental – use it at your own risk.
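Reassembly presumably mirrors the extraction step: a single ffmpeg invocation that encodes the PNG sequence back into a video. A sketch that builds such a command; the frame rate and filename pattern here are assumptions, not the script's actual defaults:

```python
def assemble_cmd(frames_dir, output, fps=10, pattern="%06d.png"):
    """Build an ffmpeg command that encodes a numbered PNG sequence
    back into a video file at the given frame rate."""
    return ["ffmpeg", "-framerate", str(fps),
            "-i", f"{frames_dir}/{pattern}", output]
```

Note that this simple form drops the original audio and container metadata, which matches the caveat above about ffmpeg's defaults.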


This script runs on Linux and uses Python and PyTorch, plus (optionally) ffmpeg for extracting frames from, and writing them back to, video files. To ensure that all dependencies are present, use the setup script included in the project.

To download and install the project from GitHub:
>wget -O - | tar xz
>cd Upscale-video-RSTT-main
>source pytorch/bin/activate
>upscale --help

The setup script does not require root, and all changes are local, so uninstalling the project is a simple matter of removing its directory.


I built this script to test out the RSTT algorithm on practical examples. Now you can use it too, on your own videos!

RSTT offers three model variants with successively more parameters (higher-quality output) at the cost of slower performance: RSTT-S (the default), RSTT-M, and RSTT-L. You can try each model using the option --model.

Frame generation is processing-intensive, but all three variants of this algorithm are remarkably efficient. On my old laptop (no GPU), it can take as little as 5 seconds per frame to process a small 256×138-resolution video (sample image below), and nearly 1 minute per frame at 720×480. Processing time benefits greatly from the availability of a GPU, but if you prefer not to use one, GPU support can be turned off with the option --no-cuda.

The RSTT models only offer ×4 upscaling, and use information from neighbouring frames processed in batches of 7. Some videos are greatly enhanced by super-resolution, while others are unaffected – presumably resolution is not the only factor in video quality, so this is not a universal solution for upgrading any low-quality video.
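The "batches of 7" above suggests each frame is upscaled using a window of 7 consecutive neighbours. A sketch of such windowing, clamped at the ends of the video so every frame still gets a full-sized neighbourhood (the exact windowing scheme RSTT uses is an assumption here):

```python
def windows(num_frames, size=7):
    """Yield, for each frame index, a window of `size` consecutive
    frame indices roughly centred on it, clamped at the boundaries."""
    half = size // 2
    for i in range(num_frames):
        lo = max(0, min(i - half, num_frames - size))
        yield list(range(lo, lo + size))
```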

Bottom line: This machine-learning algorithm is very practical to use and a fun project to play with – and perhaps you’ll find it useful for enhancing your own videos.

Command used: >upscale 'Lilies - S1E1.mp4' --scene=1:43+10
