**Notes from ICCV 2023 Conference in Paris: My Top Picks**
**Top Picks:**
1. **Towards understanding the connection between generative and discriminative learning**
2. **Pre-pretraining: Combining visual self-supervised training with natural language supervision**
3. **Adapting a pre-trained model by refocusing its attention**
4. **Image and video segmentation using discrete diffusion generative models**
5. **Diffusion models for stochastic segmentation**
6. **Diffusion models: replacing the commonly-used U-Net with transformers**
7. **Diffusion Models as (Soft) Masked Autoencoders**
8. **Denoising Diffusion Autoencoders as Self-supervised Learners**
9. **Leveraging DINO attention masks to the maximum**
10. **Generative learning on images: can’t we do better than FID?**
**Other Key Picks:**
– **Sigmoid Loss for Language Image Pre-Training**
– **Distilling Large Vision-Language Model with Out-of-Distribution Generalizability**
– **Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?**
– **Unified Visual Relationship Detection with Vision and Language Models**
– **An Empirical Investigation of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration**
– **Discovering prototypes for dataset comparison**
– **Understanding the Feature Norm for Out-of-Distribution Detection**
– **Benchmarking Low-Shot Robustness to Natural Distribution Shifts**
– **Distilling from Similar Tasks for Transfer Learning on a Budget**
– **Leveraging Visual Attention for out-of-distribution Detection**
I was lucky and privileged enough to attend the ICCV 2023 conference in Paris. After going through the papers I collected and the notes I took, I decided to share my favourites here, along with their key ideas. If you like my notes below, share them on social media!
### Towards understanding the connection between generative and discriminative learning
Key idea: An emerging trend that I am extremely excited about is the connection between generative and discriminative modeling. Do the two paradigms learn a shared representation?
The authors demonstrate the existence of matching neurons (“Rosetta neurons”) across different models that express a shared concept (such as object contours, object parts, and colors). These concepts emerge without any supervision or manual annotation.
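To make the matching idea concrete, here is a minimal sketch of how one could mine such neuron pairs (my own illustration, not the paper's exact procedure): run the same images through two models, collect per-channel activation maps, and pair channels whose maps correlate strongly.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def match_neurons(acts_a: torch.Tensor, acts_b: torch.Tensor):
    """Pair channels of two models by activation-map correlation.

    acts_a: (N, C_a, H_a, W_a) feature maps of model A on N shared images.
    acts_b: (N, C_b, H_b, W_b) feature maps of model B on the same images.
    Returns the best-correlated channel of B for every channel of A.
    """
    # Resize both maps to a common spatial grid so they are comparable.
    size = (min(acts_a.shape[-2], acts_b.shape[-2]),
            min(acts_a.shape[-1], acts_b.shape[-1]))
    a = F.interpolate(acts_a, size=size, mode="bilinear", align_corners=False)
    b = F.interpolate(acts_b, size=size, mode="bilinear", align_corners=False)

    # Flatten to (C, N*H*W) and standardize each channel.
    a = a.permute(1, 0, 2, 3).reshape(a.shape[1], -1)
    b = b.permute(1, 0, 2, 3).reshape(b.shape[1], -1)
    a = (a - a.mean(1, keepdim=True)) / (a.std(1, keepdim=True) + 1e-8)
    b = (b - b.mean(1, keepdim=True)) / (b.std(1, keepdim=True) + 1e-8)

    # Pearson correlation between every channel pair: (C_a, C_b).
    corr = (a @ b.T) / a.shape[1]
    best_corr, best_idx = corr.max(dim=1)
    return best_idx, best_corr  # highly correlated pairs ~ shared concepts
```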
### Pre-pretraining: Combining visual self-supervised training with natural language supervision
Motivation: The masked autoencoder (MAE) randomly masks 75% of an image’s patches and trains the model to reconstruct them by minimizing the pixel reconstruction error. However, MAE has only been shown to scale with model size on ImageNet.
Key idea: While MAE thrives in dense vision tasks like segmentation, weakly supervised learning (WSL) learns abstract features and shows remarkable zero-shot performance. Can we find a way to get the best of both worlds?
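As a refresher on the masking recipe described above, here is a minimal MAE-style training step (my sketch, not the paper's code); `encoder`, `decoder`, and `patchify` are placeholder components.

```python
import torch

def mae_step(images, encoder, decoder, patchify, mask_ratio=0.75):
    """One MAE-style step: mask 75% of patches, reconstruct, score masked ones.

    patchify: maps (B, C, H, W) images to (B, L, D) flattened patch pixels.
    encoder/decoder: placeholder modules standing in for the ViT pair.
    """
    patches = patchify(images)                   # (B, L, D) target pixels
    B, L, _ = patches.shape
    n_keep = int(L * (1 - mask_ratio))

    # Per-image random shuffle of patch indices; keep the first 25% visible.
    ids_shuffle = torch.rand(B, L, device=images.device).argsort(dim=1)
    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))

    latent = encoder(visible, ids_keep)          # encode visible patches only
    pred = decoder(latent, ids_shuffle)          # predict pixels of all patches

    # Pixel MSE computed only on the masked patches, as in MAE.
    mask = torch.ones(B, L, device=images.device)
    mask.scatter_(1, ids_keep, 0.0)              # 1 = masked, 0 = visible
    return (((pred - patches) ** 2).mean(-1) * mask).sum() / mask.sum()
```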
### Adapting a pre-trained model by refocusing its attention
Since foundation models are here to stay, finding clever ways to adapt them to various downstream tasks is a critical research avenue.
Key idea: Given a pretrained ViT backbone, they tune only the added linear layers, which act as feedback paths after the first forward pass. This way, the model can redirect its attention to task-relevant features and achieve better performance.
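Here is a rough sketch of what such a two-pass feedback scheme could look like; `FeedbackAdapter` is a hypothetical name, I assume a backbone that maps token sequences to token sequences, and the paper's actual architecture is more elaborate.

```python
import torch
import torch.nn as nn

class FeedbackAdapter(nn.Module):
    """Hypothetical sketch: a frozen ViT plus a tunable linear feedback path.

    Pass 1 runs the backbone as usual; pass 2 projects the output tokens
    through a linear feedback layer, adds them back to the input tokens,
    and lets attention refocus on task-relevant features. Only the
    feedback layer and the head receive gradients.
    """

    def __init__(self, backbone: nn.Module, dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False            # keep the backbone frozen
        self.feedback = nn.Linear(dim, dim)    # tunable feedback path
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        out1 = self.backbone(tokens)             # 1st forward pass
        tokens2 = tokens + self.feedback(out1)   # feed output back to input
        out2 = self.backbone(tokens2)            # 2nd, refocused pass
        return self.head(out2.mean(dim=1))       # pool tokens, classify
```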
### Image and video segmentation using discrete diffusion generative models
Google DeepMind presented an intriguing work called “A Generalist Framework for Panoptic Segmentation of Images and Videos”.
Key idea: A diffusion model is proposed to model panoptic segmentation masks, with a simple architecture and a generic loss function. For segmentation specifically, we want the class and the instance ID, which are discrete targets, whereas standard diffusion models operate on continuous data. For this reason, the well-known Bit Diffusion technique was used.
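Bit Diffusion handles the discreteness by encoding integer targets as “analog bits”: binary expansions mapped to real values in {-1, 1} that a continuous diffusion model can denoise, and that are thresholded back to integers after sampling. A minimal encode/decode sketch (my own illustration):

```python
import torch

def ints_to_analog_bits(x: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Encode integer labels (e.g. class/instance IDs) as analog bits in {-1, 1}."""
    shifts = torch.arange(num_bits, device=x.device)
    bits = (x.unsqueeze(-1) >> shifts) & 1       # binary expansion, LSB first
    return bits.float() * 2.0 - 1.0              # {0, 1} -> {-1, +1}

def analog_bits_to_ints(bits: torch.Tensor) -> torch.Tensor:
    """Decode (possibly denoised) real-valued bits back to integers."""
    hard = (bits > 0).long()                     # threshold at zero
    shifts = torch.arange(bits.shape[-1], device=bits.device)
    return (hard << shifts).sum(dim=-1)

# A panoptic map of category IDs becomes a continuous target that a
# standard diffusion model can noise and denoise:
ids = torch.randint(0, 133, (2, 64, 64))         # COCO panoptic: 133 classes
target = ints_to_analog_bits(ids, num_bits=8)    # (2, 64, 64, 8) in {-1, 1}
assert torch.equal(analog_bits_to_ints(target), ids)
```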
### Diffusion models for stochastic segmentation
In closely related work, researchers from the University of Bern showed that categorical diffusion models can be used for stochastic image segmentation in “Stochastic Segmentation with Conditional Categorical Diffusion Models”.
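For intuition, here is the generic categorical (multinomial) forward process that such models build on: at each step a pixel's label is kept with probability 1 − βₜ or resampled uniformly over the K classes (this is the standard formulation; the paper's conditional variant adds more on top). Sampling the learned reverse process several times then yields diverse, plausible segmentations for ambiguous inputs.

```python
import torch
import torch.nn.functional as F

def categorical_forward_step(x_onehot: torch.Tensor, beta_t: float) -> torch.Tensor:
    """One forward (noising) step of categorical diffusion.

    q(x_t | x_{t-1}) = Cat((1 - beta_t) * x_{t-1} + beta_t / K): keep the
    current label with probability 1 - beta_t, otherwise resample it
    uniformly over the K classes. x_onehot: (..., K) one-hot label maps.
    """
    K = x_onehot.shape[-1]
    probs = (1.0 - beta_t) * x_onehot + beta_t / K
    idx = torch.distributions.Categorical(probs=probs).sample()
    return F.one_hot(idx, K).float()

# Repeated steps drive a segmentation map toward uniform noise; the reverse
# (denoising) model is trained to undo them, one step at a time.
seg = F.one_hot(torch.randint(0, 4, (1, 32, 32)), 4).float()  # toy K=4 map
noisier = categorical_forward_step(seg, beta_t=0.1)
```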
These are just a few of the many exciting papers presented at the ICCV 2023 conference. The field of computer vision is rapidly evolving, and it’s amazing to see the innovative work being done by researchers around the world. If you’re interested in learning more about these topics, be sure to check out the papers mentioned above and stay tuned for more groundbreaking research in the future.
Overall, my experience at ICCV 2023 was incredibly enlightening, and I can’t wait to see how the field of computer vision continues to grow and evolve in the years to come.
If you’re interested in deep learning and production, consider checking out the book “Deep Learning in Production” to learn more about building, training, deploying, scaling, and maintaining deep learning models.
Thank you for reading!