Anno.Ai’s Favorite Papers from CVPR 2021
CVPR21 Favorites
IEEE’s CVPR conference is one of the most popular in the computer vision community. More than 1,600 papers were accepted to CVPR21, showcasing a number of impressive advances over the past year. Below we highlight eight exceptional papers to read, each with code to run, when your busy schedule permits.
Chris’ Picks
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
- Paper / Code
- Niemeyer and Geiger use deep generative models to not only generate but also control realistic scenes at the object level. In contrast to previous approaches, GIRAFFE uses a 3D-aware GAN, which disentangles distinct objects (learned from raw image data) in the latent space, making it easier to modify a scene object by object.
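The compositional idea can be sketched numerically. GIRAFFE composes per-object neural feature fields by summing densities and taking a density-weighted mean of features at each sampled 3D point; the toy function below (a simplified sketch, not the authors' implementation) illustrates that operator:

```python
import numpy as np

def compose(feats, densities):
    # Sketch of GIRAFFE's composition operator: at each sampled 3D point,
    # the scene density is the sum of the per-object densities, and the
    # scene feature is the density-weighted mean of the per-object features.
    # feats: (n_objects, n_points, dim); densities: (n_objects, n_points)
    sigma = densities.sum(axis=0)                          # (n_points,)
    weighted = (densities[..., None] * feats).sum(axis=0)  # (n_points, dim)
    f = weighted / np.maximum(sigma[..., None], 1e-8)      # avoid divide-by-zero
    return f, sigma
```

Because each object contributes through its own density, zeroing one object's density removes it from the composite scene, which is what makes object-level edits possible.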
Towards Open World Object Detection (ORE)
- Paper / Code
- Object detection is usually treated as a closed-set problem, i.e., a model is trained on a finite set of object classes and no new classes appear at test time. Joseph et al. instead treat object detection as an open-set problem, a more realistic formulation for real-world deployment. Using Faster R-CNN as the base model, the authors develop ORE, which employs contrastive clustering, an auto-labelling region proposal network, and other elements to incrementally learn new classes without forgetting previously observed ones.
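The contrastive clustering idea can be sketched in a few lines. The hinge-style loss below (a simplified sketch with made-up margin values, not the paper's exact formulation) pulls a region feature toward its class prototype and pushes it away from all other prototypes, which is what separates known classes from unknowns in latent space:

```python
import numpy as np

def contrastive_clustering_loss(feat, label, prototypes, margin=10.0):
    # Pull the region feature toward its own class prototype and push it
    # at least `margin` away from every other prototype (hinge on distance).
    # feat: (dim,); prototypes: (n_classes, dim); label: int class index.
    d = np.linalg.norm(prototypes - feat, axis=1)
    pull = d[label]
    others = np.delete(d, label)
    push = np.maximum(0.0, margin - others).sum()
    return pull + push
```

A feature sitting on its own prototype, far from the others, incurs zero loss; features drifting toward a wrong prototype are penalized.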
HOTR: End-to-End Human-Object Interaction Detection with Transformers
- Paper / Code
- The goal of human-object interaction (HOI) detection is to learn the relationships between people and objects in images. HOI is challenging because it couples the tasks of object detection and interaction inference. Building on the object DEtection TRansformer (DETR), Kim et al. develop a model that jointly learns <human, object, interaction> triplets. This approach demonstrates improved AP and a 5x speedup over the previous state of the art on the V-COCO dataset.
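The triplet decoding can be illustrated with a toy version of HOTR's "HOI pointer" idea: each interaction query emits a human vector and an object vector, and each is pointed at the most similar instance-query embedding via argmax, so triplets are read off directly instead of through heuristic post-matching. The function below is a simplified sketch under that description, not the authors' code:

```python
import numpy as np

def decode_hoi_pointers(h_vecs, o_vecs, inst_embs):
    # Each interaction query contributes one row to h_vecs and o_vecs;
    # its human/object boxes come from the instance queries whose
    # embeddings score highest under a dot-product similarity.
    h_idx = (h_vecs @ inst_embs.T).argmax(axis=1)
    o_idx = (o_vecs @ inst_embs.T).argmax(axis=1)
    return h_idx, o_idx
```

With one-hot embeddings the pointers simply select the matching instance, which makes the mechanism easy to sanity-check.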
Exploring Simple Siamese Representation Learning
- Paper / Code
- Chen and He explore the application of siamese networks to unsupervised visual representation learning and find evidence contradicting previously assumed requirements, e.g., that siamese networks need negative sample pairs to learn useful representations. In their SimSiam model, the authors use a stop-gradient operation to prevent convergence to a trivial, collapsed solution, a commonly cited failure mode of siamese networks.
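The loss is simple enough to sketch directly. Below is a numpy version of the symmetrized negative-cosine loss with the stop-gradient applied to the projections (numerically a no-op here, but in an autodiff framework it detaches the tensor, which is the ingredient that prevents collapse):

```python
import numpy as np

def stop_gradient(z):
    # Numerically the identity; in an autodiff framework this detaches
    # z from the computation graph (e.g. z.detach() in PyTorch).
    return np.array(z, copy=True)

def neg_cosine(p, z):
    # Negative cosine similarity, averaged over the batch.
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return float(-(p * z).sum(axis=-1).mean())

def simsiam_loss(p1, z1, p2, z2):
    # Symmetrized SimSiam loss: predictor outputs p are compared against
    # the *detached* projections z of the other augmented view.
    return 0.5 * neg_cosine(p1, stop_gradient(z2)) + \
           0.5 * neg_cosine(p2, stop_gradient(z1))
```

When both views produce identical representations the loss reaches its minimum of -1; without the stop-gradient, gradient descent can reach that minimum degenerately by outputting a constant.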
Hanna’s Picks
Deep Convolutional Dictionary Learning for Image Denoising
- Paper / Code
- This paper proposes deep convolutional dictionary learning (DCDicL), a framework that demonstrates leading denoising performance in terms of both quantitative metrics (e.g., PSNR, SSIM) and visual quality. In particular, it reproduces subtle image structures and textures that many existing denoising DNNs fail to recover.
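The underlying signal model is convolutional sparse coding: the clean image is expressed as a sum of dictionary atoms, each convolved with a sparse coefficient map, and the network learns both. A toy 1-D reconstruction (a sketch of the model, not the paper's 2-D implementation) looks like:

```python
import numpy as np

def reconstruct(atoms, code_maps):
    # Convolutional dictionary model: the clean signal is the sum of each
    # dictionary atom convolved with its (ideally sparse) coefficient map.
    return sum(np.convolve(c, a, mode="same") for a, c in zip(atoms, code_maps))
```

With a trivial single-sample atom, reconstruction just reproduces the code map; richer atoms let a few sparse coefficients express repeated textures, which is why the model preserves fine structure.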
Iterative Filter Adaptive Network for Single Image Defocus Deblurring
- Paper / Code
- This paper proposes an end-to-end network built around an Iterative Filter Adaptive Network (IFAN) for single-image defocus deblurring. IFAN is specifically designed to handle spatially varying and large defocus blur, something many other methods fail to achieve.
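The core operation IFAN's predicted filters feed into is spatially varying filtering: every pixel gets its own small kernel rather than one kernel shared across the image, as in ordinary convolution. The function below is a simplified sketch of that operation (1-D horizontal kernels only; the paper's filters are separable and applied iteratively):

```python
import numpy as np

def apply_adaptive_filters(img, filters):
    # Per-pixel filtering: out[y, x] is the dot product of the pixel's own
    # length-k horizontal kernel with its padded neighborhood.
    # img: (H, W); filters: (H, W, k)
    H, W, k = filters.shape
    pad = k // 2
    padded = np.pad(img, ((0, 0), (pad, pad)), mode="edge")
    out = np.empty((H, W), dtype=float)
    for y in range(H):
        for x in range(W):
            out[y, x] = float((padded[y, x:x + k] * filters[y, x]).sum())
    return out
```

Because defocus blur varies with scene depth, a single shared deblurring kernel cannot fit the whole image; per-pixel kernels can.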
ReDet: A Rotation-Equivariant Detector for Aerial Object Detection
- Paper / Code
- Many detectors struggle to find objects in arbitrary orientations without training on large amounts of data. This paper addresses the problem with ReDet, a model that encodes rotation equivariance into the network and extracts rotation-invariant features for detection.
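The payoff of rotation invariance can be shown with a toy example (a discrete 90-degree sketch; ReDet builds the continuous, learned analogue into its backbone and RoI features): if we correlate a feature map with all four rotations of a template and keep the best response, rotating the input only permutes the four responses, so the score is unchanged.

```python
import numpy as np

def rotation_invariant_score(feat, template):
    # Best response over the four 90-degree rotations of the template;
    # invariant to 90-degree rotations of `feat` by construction.
    return max(float((feat * np.rot90(template, k)).sum())
               for k in range(4))
```

A detector with this property needs no rotation augmentation to recognize a rotated object, which is exactly the data-efficiency argument the paper makes.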
Improving Multiple Object Tracking With Single Object Tracking
- Paper
- This paper proposes a novel method that combines multiple object tracking (MOT) and single object tracking (SOT) in one architecture, letting an MOT task benefit from the strong discriminative power of SOT methods in an effective and efficient way.
Conclusions
A number of computer vision advances were showcased at CVPR21. We highlighted eight papers addressing a number of active research areas, including object detection, denoising, deblurring, and scene representation.
About the Authors
Hanna Diamond is a Data Scientist at Anno.Ai, where she focuses on computer vision applications and development of deep learning models. Hanna has a background in image processing, computer vision, and robotics; she holds a BA in Computer Science from Oberlin College. Hanna leads Anno’s Zwift virtual bike team and enjoys building up gaming workstations in her free time.
Christopher Farah is a Senior Data Scientist at Anno.Ai. Chris has over 14 years of experience conducting spatial data mining research in the healthcare and national security sectors. Chris holds a Bachelor's in Chemical Engineering from The Cooper Union, an MA in Mathematics from St. Louis University, and a PhD in Spatial Information Science and Engineering from the University of Maine.