
[model] Support Kimi_VL thinking/instruct #7719


Merged: 5 commits merged into hiyouga:main on Apr 14, 2025

Conversation

@Kuangdd01 (Collaborator) commented Apr 14, 2025

What does this PR do?

Add support for Kimi-VL (thinking/instruct).
Fixes #7680.

Have tested:

  1. Single-GPU LoRA fine-tuning, ~50 GB VRAM (✔️)
  2. Full fine-tuning with ZeRO-2 @chenllliang (✔️)
  3. Multi-GPU inference in the web UI, without thinking mode (✔️) (a standalone inference sketch follows this list)
  4. Multi-GPU LoRA fine-tuning with ZeRO-2 (✔️)
  5. Multi-GPU LoRA fine-tuning without ZeRO-2 fails (❌); not recommended
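
For item 3, a minimal standalone inference sketch (outside the web UI) is shown below. It follows the generic transformers remote-code path rather than anything added in this PR; the image path, prompt, and generation settings are illustrative placeholders.

# Minimal standalone inference sketch (assumption: standard transformers remote-code loading
# for Kimi-VL; file name, prompt, and generation settings are placeholders).
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "moonshotai/Kimi-VL-A3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

image = Image.open("demo.png")  # placeholder image
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
# decode only the newly generated tokens
answer = processor.batch_decode(output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(answer)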

Experiment configs

LoRA

### model
model_name_or_path: moonshotai/Kimi-VL-A3B-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj  # following the DeepSeek fine-tuning reference
freeze_vision_tower: true
freeze_multi_modal_projector: true
freeze_language_model: false
flash_attn: auto

### dataset
dataset: identity, mllm_demo
template: kimi_vl
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 1
dataloader_num_workers: 4

### output
output_dir: saves/kimi-vl/lora/
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 10.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

# for ddp
# ddp_find_unused_parameters: true
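
After a LoRA run finishes, the adapter saved under saves/kimi-vl/lora/ can be merged back into the base weights. A minimal sketch via the generic peft route is below (LLaMA-Factory's own export command is the usual way; the merged output directory here is a placeholder):

# Hypothetical LoRA-merge sketch using peft (the generic route; LLaMA-Factory's export tooling
# can also do this). The merged output path is a placeholder.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Kimi-VL-A3B-Instruct", torch_dtype="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "saves/kimi-vl/lora/")  # adapter dir from the config above
merged = model.merge_and_unload()                               # fold LoRA deltas into the base weights
merged.save_pretrained("saves/kimi-vl/lora_merged/")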

Full fine-tuning

### model
model_name_or_path: moonshotai/Kimi-VL-A3B-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: true
freeze_multi_modal_projector: true
freeze_language_model: false
deepspeed: examples/deepspeed/ds_z2_config.json  # 8x H20 GPUs

### dataset
dataset: identity, mllm_demo, alpaca_en_demo
template: kimi_vl
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 1
dataloader_num_workers: 4

### output
output_dir: saves/kimi-vl/full
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 10.0
lr_scheduler_type: cosine
warmup_ratio: 0
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

Support

Thanks to the Kimi-VL team for their help. @chenllliang @zhouzaida

TODO

  • update modeling_kimi_vl.py for training @zhouzaida


@hiyouga added the pending (This problem is yet to be addressed) label on Apr 14, 2025
@chenllliang

Script for full SFT with ZeRO-2:

### model
model_name_or_path: moonshotai/Kimi-VL-A3B-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: true
freeze_multi_modal_projector: true
freeze_language_model: false
deepspeed: examples/deepspeed/ds_z2_config.json  # 8x H20 GPUs

### dataset
dataset: identity, mllm_demo, alpaca_en_demo
template: kimi_vl
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 1
dataloader_num_workers: 4

### output
output_dir: saves/kimi-vl/full
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none  # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 10.0
lr_scheduler_type: cosine
warmup_ratio: 0
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

@hiyouga (Owner) left a comment

LGTM

@hiyouga merged commit df8752e into hiyouga:main on Apr 14, 2025
12 checks passed
@hiyouga added the solved (This problem has been already solved) label and removed the pending (This problem is yet to be addressed) label on Apr 14, 2025
@Himanshunitrr

@chenllliang will the script for full SFT with ZeRO-2 work for the Thinking model too?

@Gaoyg commented Apr 29, 2025

I have a question: why did you change the topk_method to greedy during training?
[screenshot]

@Kuangdd01 (Collaborator, Author)

We cannot use the original top_k method in MoEGate because it only supports the inference stage.
ref: https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py#L437-L461
So we followed DeepSeek-VL2 and added a greedy method to the modeling code and configuration.
ref: https://github.com/deepseek-ai/DeepSeek-VL2/blob/ef9f91e2b6426536b83294c11742c27be66361b1/deepseek_vl2/models/modeling_deepseek.py#L441-L466
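
For context, the DeepSeek-V3 gate linked above guards its top-k selection with a not-in-training assertion, while the greedy variant is a plain top-k over the router scores and works in both training and inference. A rough sketch of the greedy path, heavily simplified from the DeepSeek-style MoEGate (not the exact Kimi-VL code), might look like this:

# Rough sketch of a greedy top-k gate (simplified from the DeepSeek-style MoEGate linked above;
# not the exact Kimi-VL implementation).
import torch

def greedy_topk_gate(hidden_states: torch.Tensor, gate_weight: torch.Tensor, top_k: int = 8):
    # hidden_states: [num_tokens, hidden_dim]; gate_weight: [num_experts, hidden_dim]
    scores = torch.softmax(hidden_states @ gate_weight.t(), dim=-1)     # router probability per expert
    topk_weight, topk_idx = torch.topk(scores, k=top_k, dim=-1)         # plain greedy top-k selection
    topk_weight = topk_weight / topk_weight.sum(dim=-1, keepdim=True)   # renormalize over chosen experts
    return topk_idx, topk_weight  # usable in both training and inference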

@Kuangdd01 (Collaborator, Author) commented Apr 29, 2025

@Himanshunitrr Sure, but the training data for the Thinking version should be organized in the thinking format.
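
For illustration only, a thinking-format sample laid out like the mllm_demo dataset might look like the sketch below; the <think>...</think> markers and the image file are placeholders, so check the Thinking model's chat template for the exact tags it expects.

# Hypothetical thinking-format training sample in the mllm_demo (sharegpt-style) layout.
# The <think>...</think> tags and the image path are placeholders, not the model's actual markers.
import json

sample = {
    "messages": [
        {"role": "user", "content": "<image>What is unusual about this picture?"},
        {
            "role": "assistant",
            "content": (
                "<think>The ironing board is mounted on the back of a moving taxi, "
                "which is not a normal place to iron clothes.</think>"
                "The unusual part is that the man is ironing on the back of a moving taxi."
            ),
        },
    ],
    "images": ["mllm_demo_data/example.jpg"],
}
print(json.dumps(sample, ensure_ascii=False, indent=2))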

@Gaoyg commented Apr 29, 2025

Got it, thanks.

Salmon-f42 pushed a commit to IshiKura-a/LLaMA-Factory that referenced this pull request Apr 29, 2025
yoonseok312 pushed a commit to pensieve-ai/LLaMA-Factory-vlm that referenced this pull request Apr 29, 2025
* add kimi_vl

* patch config

* check version

* Update mm_plugin.py

* Update mm_plugin.py

---------

Co-authored-by: hoshi-hiyouga <[email protected]>
stephen-nju pushed a commit to stephen-nju/Llmtrain that referenced this pull request May 10, 2025
liu-qingyuan pushed a commit to liu-qingyuan/LLaMA-Factory-Megafake that referenced this pull request Jun 6, 2025
Labels: solved (This problem has been already solved)

Successfully merging this pull request may close these issues:

Are there plans to support the Kimi-VL released on April 10? (#7680)

5 participants