Summary
In this internship at the London AI Video Lab, the objective is to design computationally efficient video decoders in an AI-based video compression codec. Current AI-based video compression models outperform conventional codecs, like HEVC, VVC and AV1. However, this comes at the cost of impractical compute requirements: at decode, current AI-based video compression decoders are several orders of magnitude more complex than conventional video compression decoders. The goal of the internship is to design efficient AI-based decoders that leverage spatial sparsity to reduce their computational complexity.
This work will be seen as one step forward toward the deployment of end-to-end trained AI-based video compression models.
The goal will be to find and review potential existing methods of spatial sparsity in AI-based video models. In a second step, spatially sparse AI-based decoders will be designed, implemented and integrated into the London AI Video Lab's end-to-end trained video compression model. The performance of the proposed solution will be evaluated and compared to existing models.
The internship will take place in the London AI Video Lab. The intern will be mentored by scientists and will be part of a research project developing end-to-end trained AI-based video compression models.
Duration: 5-6 months, starting January-April 2026
Responsibilities
State-of-the-art and analysis of existing solutions
Implementation of a computationally efficient AI-based video decoder
Evaluation and reporting of results
Related work
Graham, Benjamin, and Laurens Van der Maaten. "Submanifold sparse convolutional networks." arXiv preprint arXiv:17 (2017).
Jia, Zhaoyang, et al. "Towards practical real-time neural video compression." Proceedings of the Computer Vision and Pattern Recognition Conference. 2025.
Parger, Mathias, et al. "Deltacnn: End-to-end cnn inference of sparse frame differences in videos." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
Keywords: computer vision, video compression, machine learning (deep learning), real-time video processing
Expected Outcomes:
Apart from the expected outcome that corresponds to the spatially sparse video model and its evaluation, this internship will be expected to generate patents and publications.
Qualifications
MSc in Computer Science, Machine Learning, Mathematics, Physics or a related field
Deep learning, computer vision, Python, PyTorch