
OpenGVLab

General Vision Team of Shanghai AI Laboratory


Welcome to OpenGVLab! 👋

We are a research group from Shanghai AI Lab focused on vision-centric AI research. The "GV" in OpenGVLab stands for General Vision: a general understanding of vision, so that little effort is needed to adapt to new vision-based tasks.

We develop model architectures and release pre-trained foundation models to the community to motivate further research in this area. We have made promising progress in general vision AI, with 109 SOTA results🚀. In 2022, our open-sourced foundation model achieved 65.5 mAP on the COCO object detection benchmark and 91.1% Top-1 accuracy on Kinetics-400, landmark results for AI vision👀 tasks in image🖼️ and video📹 understanding.

Building on these solid vision foundations, we have expanded to multi-modality models and generative AI (in partnership with Vchitect). We aim to empower individuals and businesses by offering a higher starting point for developing vision-based AI products and lessening the burden of building AI models from scratch.

Branches: Alpha (exploring the latest advances in vision+language research) and uni-medical (focused on medical AI)

Follow us: Twitter · 🤗 Hugging Face · Medium · WeChat · Zhihu

Pinned

  1. InternVL InternVL Public

    [CVPR 2024 Oral] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks —— An Open-Source Alternative to ViT-22B

    Jupyter Notebook 632 20

  2. InternVideo InternVideo Public

    Video Foundation Models & Data for Multimodal Understanding

    Python 880 57

  3. VideoMamba VideoMamba Public

    VideoMamba: State Space Model for Efficient Video Understanding

    Python 528 37

  4. LLaMA-Adapter LLaMA-Adapter Public

    [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

    Python 5.5k 357

  5. SAM-Med2D SAM-Med2D Public

    Official implementation of SAM-Med2D

    Jupyter Notebook 724 64

  6. OmniQuant OmniQuant Public

    [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

    Python 541 42

Repositories

Showing 10 of 56 repositories
  • InternVL Public

    [CVPR 2024 Oral] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks —— An Open-Source Alternative to ViT-22B

    Jupyter Notebook 632 MIT 20 21 0 Updated Apr 11, 2024
  • VideoMamba Public

    VideoMamba: State Space Model for Efficient Video Understanding

    Python 528 Apache-2.0 37 5 1 Updated Apr 11, 2024
  • DiffAgent Public

    [CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

    7 MIT 0 0 0 Updated Apr 10, 2024
  • PonderV2 Public

    PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

    Python 289 MIT 5 6 0 Updated Apr 9, 2024
  • Instruct2Act Public

    Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

    Python 252 16 2 0 Updated Apr 9, 2024
  • InternVideo Public

    Video Foundation Models & Data for Multimodal Understanding

    Python 880 Apache-2.0 57 30 3 Updated Apr 7, 2024
  • Ask-Anything Public

    [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

    Python 2,623 MIT 212 62 3 Updated Apr 5, 2024
  • MM-Interleaved Public

    MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

    Python 150 Apache-2.0 8 4 0 Updated Apr 3, 2024
  • UniFormerV2 Public

    [ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

    Python 265 Apache-2.0 16 9 1 Updated Apr 2, 2024
  • Hulk Public

    An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"

    Python 37 MIT 1 0 0 Updated Apr 2, 2024