
OneFormer: One Transformer to Rule Universal Image Segmentation

Framework: PyTorch

Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi

Equal Contribution

[Project Page] [arXiv] [pdf] [BibTeX]

This repo contains the code for our paper OneFormer: One Transformer to Rule Universal Image Segmentation.

Features

  • OneFormer is the first multi-task universal image segmentation framework based on transformers.
  • OneFormer needs to be trained only once, with a single universal architecture, a single model, and a single dataset, to outperform existing frameworks across semantic, instance, and panoptic segmentation tasks.
  • OneFormer uses a task-conditioned joint training strategy: it uniformly samples the ground truth domain (semantic, instance, or panoptic) and derives all labels from panoptic annotations to train its multi-task model.
  • OneFormer uses a task token to condition the model on the task in focus, making our architecture task-guided for training, and task-dynamic for inference, all with a single model.
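
The joint training strategy described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the segment dictionaries, the `derive_targets` and `sample_training_example` helpers, and the exact wording of the task prompt are hypothetical names chosen for this example. It assumes each panoptic segment carries a category id, an `isthing` flag, and a set of pixel coordinates.

```python
import random

TASKS = ("semantic", "instance", "panoptic")


def derive_targets(panoptic_segments, task):
    """Derive task-specific ground truth from panoptic segments.

    Each segment is a dict with hypothetical keys:
    "category_id" (int), "isthing" (bool), "pixels" (set of (row, col)).
    """
    if task == "panoptic":
        # Panoptic: keep every segment unchanged.
        return [dict(seg) for seg in panoptic_segments]
    if task == "instance":
        # Instance: keep only the countable "thing" segments.
        return [dict(seg) for seg in panoptic_segments if seg["isthing"]]
    if task == "semantic":
        # Semantic: merge all segments of a category into one region.
        merged = {}
        for seg in panoptic_segments:
            entry = merged.setdefault(
                seg["category_id"],
                {
                    "category_id": seg["category_id"],
                    "isthing": seg["isthing"],
                    "pixels": set(),
                },
            )
            entry["pixels"] |= seg["pixels"]
        return list(merged.values())
    raise ValueError(f"unknown task: {task}")


def sample_training_example(panoptic_segments, rng=random):
    """Pick a task uniformly at random and build its prompt and targets."""
    task = rng.choice(TASKS)
    # Assumed prompt format for conditioning the model on the task.
    task_text = f"the task is {task}"
    return task_text, derive_targets(panoptic_segments, task)
```

With two "person" segments and one "sky" segment as input, the panoptic task keeps three targets, the instance task keeps the two persons, and the semantic task merges the persons into one region next to the sky region. At inference time the same prompt mechanism would let a single model switch tasks without retraining.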

Contents

  1. News
  2. Installation Instructions
  3. Dataset Preparation
  4. Execution Instructions
  5. Results
  6. Citation

News

November 10, 2022

  • Project Page, ArXiv Preprint and GitHub Repo are public!
  • OneFormer sets a new SOTA on Cityscapes val with single-scale inference: 68.5 PQ on panoptic segmentation and 46.7 AP on instance segmentation!
  • OneFormer sets a new SOTA on ADE20K val: 50.2 PQ on panoptic segmentation and 37.6 AP on instance segmentation!
  • OneFormer sets a new SOTA on COCO val: 58.0 PQ on panoptic segmentation!

Installation Instructions

  • We use Python 3.8, PyTorch 1.10.1 (CUDA 11.3 build).
  • We use Detectron2-v0.6.
  • For complete installation instructions, please see INSTALL.md.
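
As a quick sanity check before following INSTALL.md, the versions listed above can be compared against the local environment. The snippet below is only a sketch: the distribution names `torch` and `detectron2`, and the helper names `version_matches` and `check_environment`, are assumptions made for this example, and INSTALL.md remains the authoritative reference.

```python
import sys
from importlib import metadata

# Versions reported in this README; the distribution names are assumptions.
EXPECTED = {"torch": "1.10.1", "detectron2": "0.6"}


def version_matches(installed, expected):
    """True if `installed` starts with the dot-separated parts of `expected`.

    Local build suffixes such as "+cu113" are ignored.
    """
    installed_parts = installed.split("+")[0].split(".")
    expected_parts = expected.split(".")
    return installed_parts[: len(expected_parts)] == expected_parts


def check_environment(expected=EXPECTED):
    """Return a list of mismatch messages (empty if everything matches)."""
    problems = []
    if sys.version_info[:2] != (3, 8):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} found, "
            "README uses 3.8"
        )
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not version_matches(found, wanted):
            problems.append(f"{name} {found} found, README uses {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the README"]:
        print(line)
```

A mismatch does not necessarily mean the code will fail; it only flags a difference from the setup the authors report.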

Dataset Preparation

  • We experiment on three major benchmark datasets: ADE20K, Cityscapes, and COCO 2017.
  • Please see Preparing Datasets for OneFormer for complete instructions for preparing the datasets.

Execution Instructions

Training

  • Unless noted otherwise, we train our models on 8 A6000 GPUs (48 GB each).
  • We use 8 A100 GPUs (80 GB each) to train Swin-L OneFormer and DiNAT-L OneFormer on COCO, all models with a ConvNeXt-XL backbone, and the 896×896 models on ADE20K.
  • Please see Getting Started with OneFormer for training commands.

Evaluation

  • Please see Getting Started with OneFormer for evaluation commands.

Demo

  • We provide a quick-to-run demo on Colab.
  • Please see OneFormer Demo for command line instructions on running the demo.

Results

  • † denotes the backbones were pretrained on ImageNet-22k.
  • Pre-trained models can be downloaded following the instructions given under tools.

ADE20K

| Method | Backbone | Crop Size | PQ | AP | mIoU (s.s) | mIoU (ms+flip) | #params | config | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|
| OneFormer | Swin-L | 640×640 | 48.6 | 35.9 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | Swin-L | 896×896 | 50.2 | 37.6 | 57.4 | 58.3 | 219M | config | model |
| OneFormer | ConvNeXt-L | 640×640 | 48.7 | 36.2 | 56.6 | 57.4 | 220M | config | model |
| OneFormer | DiNAT-L | 640×640 | 49.1 | 36.0 | 57.8 | 58.4 | 223M | config | model |
| OneFormer | DiNAT-L | 896×896 | 50.0 | 36.8 | 58.1 | 58.6 | 223M | config | model |
| OneFormer | ConvNeXt-XL | 640×640 | 48.9 | 36.3 | 57.4 | 58.8 | 372M | config | model |

Cityscapes

| Method | Backbone | PQ | AP | mIoU (s.s) | mIoU (ms+flip) | #params | config | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| OneFormer | Swin-L | 67.2 | 45.6 | 83.0 | 84.4 | 219M | config | model |
| OneFormer | ConvNeXt-L | 68.5 | 46.5 | 83.0 | 84.0 | 220M | config | model |
| OneFormer | DiNAT-L | 67.6 | 45.6 | 83.1 | 84.0 | 223M | config | model |
| OneFormer | ConvNeXt-XL | 68.4 | 46.7 | 83.6 | 84.6 | 372M | config | model |

COCO

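| Method | Backbone | PQ | PQ^Th | PQ^St | AP | mIoU | #params | config | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|
| OneFormer | Swin-L | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | config | model |
| OneFormer | DiNAT-L | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | config | model |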
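
For readers unfamiliar with the panoptic quality (PQ) numbers reported in the tables, the sketch below shows the standard PQ definition: for each class, the summed IoU of matched segment pairs is divided by |TP| + 0.5·|FP| + 0.5·|FN|, and the per-class values are averaged. The thing and stuff variants of PQ average over thing classes and stuff classes only. The function names and the input layout are hypothetical, and this is not the evaluation code used to produce the tables.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ for a single class.

    matched_ious: IoU values of matched (prediction, ground truth) pairs;
    a pair is conventionally counted as a match when its IoU exceeds 0.5.
    Returns None when the class has no segments at all.
    """
    true_positives = len(matched_ious)
    denominator = (
        true_positives + 0.5 * num_false_positives + 0.5 * num_false_negatives
    )
    if denominator == 0:
        return None
    return sum(matched_ious) / denominator


def mean_pq(per_class_stats, class_ids=None):
    """Average per-class PQ over `class_ids` (all classes if None).

    per_class_stats maps class id -> (matched_ious, num_fp, num_fn).
    Passing only thing (or only stuff) class ids gives the thing (or stuff)
    variant of PQ.
    """
    if class_ids is None:
        class_ids = list(per_class_stats)
    scores = []
    for class_id in class_ids:
        score = panoptic_quality(*per_class_stats[class_id])
        if score is not None:  # skip classes absent from both sides
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a class with two matches of IoU 0.8 and 0.6, one false positive, and one false negative has PQ = 1.4 / 3 ≈ 0.467. The tables report these values multiplied by 100.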

Citation

If you find OneFormer useful in your research, please consider starring us on GitHub and citing 📚 us!

@article{jain2022oneformer,
  title={OneFormer: One Transformer to Rule Universal Image Segmentation},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  journal={arXiv},
  year={2022}
}

Acknowledgement

We thank the authors of Mask2Former, GroupViT, and Neighborhood Attention Transformer for releasing their helpful codebases.