multi-modal

A paper list about large language models and multimodal models (Diffusion, VLM). From foundations to applications. It is only used to record papers for my personal needs.

generative-model multi-modal paper-list diffusion-models foundation-models large-language-models stable-diffusion

Updated Jun 3, 2024

modelscope / modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

python nlp science machine-learning deep-learning cv speech multi-modal

Updated Jun 3, 2024
Python

modelscope / agentscope

Start building LLM-empowered multi-agent applications in an easier way.

agent chatbot multi-agent multi-modal distributed-agents gpt-4 large-language-models llm llm-agent llama3 gpt-4o

Updated Jun 3, 2024
Python

SciSharp / LLamaSharp

A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) efficiently on your local device.

chatbot llama gpt multi-modal llm llava semantic-kernel llamacpp llama-cpp llama2 llama3

Updated Jun 2, 2024
C#

Lizhecheng02 / MultiModal

Basic implementation code for multimodal models and some applications or fine-tuning tasks based on them.

transformer multi-modal finetune rag

Updated Jun 1, 2024
Jupyter Notebook

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance.

image-classification gpt multi-modal semantic-segmentation video-classification mme image-text-retrieval llm vision-language-model gpt-4v vit-6b vit-22b gpt-4o

Updated Jun 1, 2024
Python

THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

pretrained-models language-model multi-modal cogvlm

Updated Jun 1, 2024
Python

valhalla / valhalla

Open Source Routing Engine for OpenStreetMap

directions openstreetmap routing astar traveling-salesman dijkstra routing-engine isochrones multi-modal tiled

Updated May 31, 2024
C++

Yuan-ManX / ai-multimodal-timeline

Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Audio, Image, Video, Music and 3D content. 🔥

ai multi-modal deeplearning-ai multimodal multimodal-deep-learning llm

Updated May 31, 2024

modelscope / data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Updated May 31, 2024
Python

open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks

computer-vision evaluation pytorch gemini openai vqa vit gpt multi-modal clip claude openai-api gpt4 large-language-models llm chatgpt llava qwen gpt-4v

Updated May 31, 2024
Python

wangxiao5791509 / MultiModal_BigModels_Survey

[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models

audio review radar natural-language transformers point-cloud survey depth multi-modal thermal-infrared self-attention pre-training event-camera pengchenglab big-models anhui-university rgb-text-audio

Updated May 31, 2024

yihedeng9 / STIC

Enhancing Large Vision Language Models with Self-Training on Image Comprehension.

multi-modal multi-modal-learning vision-language-model llava llm-finetuning mistral-7b

Updated May 31, 2024
Python

Kav-K / GPTDiscord

A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge base, YouTube summarizer, and more!