Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Collection of papers and resources on Reasoning in Large Language Models (LLMs), including Chain-of-Thought, Instruction Tuning, and Multimodality.
The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?
A family of lightweight multimodal models.
🖼️Latest papers on Visually (Imagination-)Augmented NLP
🤖A curated list of NLP-related paper lists on GitHub