🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
✨✨Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Simple command-line tool for text-to-image generation using OpenAI's CLIP and SIREN (implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
Algorithms and Publications on 3D Object Tracking
Effortless plug-and-play optimizer that cuts model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.
Collaborative Diffusion (CVPR 2023)
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Build, Deploy, and Scale Reliable Swarms of Autonomous Agents. Join our Community: https://discord.gg/DbjBMJTSWD
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
An official PyTorch implementation of the CRIS paper
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
An all-new language model that processes ultra-long sequences of 100,000+ tokens ultra-fast.
(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"