Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Updated Aug 11, 2025 · Python
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Example code demonstrating how to fine-tune multimodal large language models with LLaMA-Factory.
Use PaliGemma to auto-label data for use in training fine-tuned vision models.
Minimalist implementation of PaliGemma 2 & PaliGemma VLM from scratch
PyTorch implementation of PaliGemma 2
Notes for the Vision Language Model implementation by Umar Jamil
AI-powered tool that converts text in images into your desired language, using the Gemma vision model together with a multilingual model.
PyTorch implementation of Google's PaliGemma VLM with a SigLIP image encoder, KV caching, rotary embeddings, and grouped-query attention. Modular, research-friendly, and easy to extend for experimentation.
Image Captioning with PaliGemma 2 Vision Language Model.
Leverage PaliGemma 2's DOCCI fine-tuned variant capabilities using LitServe.
Leverage PaliGemma 2 mix model variant capabilities using LitServe.
MAESTRO is an AI-powered research application designed to streamline complex research tasks.