[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
The official repository of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
Vision-Language Models for Vision Tasks: A Survey
The implementation of "Prismer: A Vision-Language Model with An Ensemble of Experts".
Effective prompting for Large Multimodal Models like GPT-4 Vision or LLaVA. 🔥
Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
Collection of papers and resources on Multimodal Reasoning, including Vision-Language Models, Multimodal Chain-of-Thought, Visual Inference, and others.
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023
A curated list of awesome knowledge-driven autonomous driving (continually updated)
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
Recognize Any Regions
Code of the paper: On Evaluating Adversarial Robustness of Large Vision-Language Models