"Vimo: Chat with Your Videos"
This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.
This repo contains a curated list of channels and sources for learning about LLMs.
ExGra-Med: Medical Multi-Modal LLM with Extended Context Alignment
[NeurIPS 2024] Code, dataset, and samples for the VATT paper “Tell What You Hear From What You See - Video to Audio Generation Through Text”
FastLongSpeech is a framework that extends Large Speech-Language Models to efficient long-speech processing without requiring dedicated long-speech training data.