[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
🦦 Otter, a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showing improved instruction-following and in-context learning abilities.
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Docker image for LLaVA: Large Language and Vision Assistant