Release v0.4.1 · thu-pacman/chitu

Added support for Expert Parallelism (EP). Enable it by setting infer.ep_size. Currently infer.ep_size must equal infer.tp_size, which means the attention layers are parallelized with TP at the same degree.
Added support for prefill-decode (PD) disaggregated inference. This requires additional dependencies; for now, build the image manually on top of the Chitu base image (Dockerfile), following the Mooncake configuration guide.
Added support for hardware FP4 computation on NVIDIA Blackwell GPUs. This requires additional dependencies; build the image from blackwell.Dockerfile to enable it.
Added support for several new models. See chitu/docs/en/SUPPORTED_MODELS.md on the public-main branch of thu-pacman/chitu for details.
Fixed multiple bugs.
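Expert parallelism shards the experts of each mixture-of-experts (MoE) layer across GPUs, so each rank stores and runs only a subset of them. The Python sketch below illustrates the EP constraint stated in these notes (ep_size equal to tp_size) together with one simple expert-placement scheme. It is a hypothetical illustration, not Chitu's code: InferConfig, validate_parallel_config, and experts_for_rank are made-up names, and the contiguous placement of experts on ranks is an assumption.

```python
# Hypothetical sketch, NOT Chitu's implementation. InferConfig,
# validate_parallel_config and experts_for_rank are illustrative names.
from dataclasses import dataclass


@dataclass
class InferConfig:
    tp_size: int  # tensor-parallel degree (attention is split this many ways)
    ep_size: int  # expert-parallel degree (MoE experts are split this many ways)


def validate_parallel_config(cfg: InferConfig) -> None:
    # Per the v0.4.1 notes: infer.ep_size currently must equal infer.tp_size.
    if cfg.ep_size != cfg.tp_size:
        raise ValueError(
            f"ep_size ({cfg.ep_size}) must equal tp_size ({cfg.tp_size})"
        )


def experts_for_rank(num_experts: int, ep_size: int, rank: int) -> list[int]:
    # Assumed contiguous placement: each rank owns num_experts // ep_size experts.
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    if not 0 <= rank < ep_size:
        raise ValueError("rank out of range")
    per_rank = num_experts // ep_size
    return list(range(rank * per_rank, (rank + 1) * per_rank))


if __name__ == "__main__":
    cfg = InferConfig(tp_size=8, ep_size=8)
    validate_parallel_config(cfg)
    # 256 experts over 8 ranks: rank 1 owns experts 32..63
    print(experts_for_rank(256, cfg.ep_size, rank=1)[:4])  # [32, 33, 34, 35]
```

Contiguous placement keeps the expert-to-rank mapping trivial to compute; real systems may instead balance placement by expert load.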
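FP4 on Blackwell generally refers to the 4-bit E2M1 floating-point format (1 sign, 2 exponent, 1 mantissa bit), which can represent only eight non-negative magnitudes, so tensors are stored with a per-block scale. The plain-Python sketch below illustrates that number format under simplifying assumptions (an unquantized float scale, a block given as a Python list). It is not Chitu's kernel code or the hardware path, and quantize_block / dequantize_block are hypothetical names.

```python
# Illustrative sketch of FP4 (E2M1) block quantization in plain Python.
# NOT how Chitu or Blackwell hardware implements FP4; real block formats
# (e.g. NVFP4, MXFP4) also store the per-block scale in a low-bit format.

# Non-negative magnitudes representable by E2M1.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Scale the block so its largest magnitude maps to 6.0 (the E2M1 max),
    then round each element to the nearest signed E2M1 value."""
    amax = max((abs(x) for x in block), default=0.0)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    # Reconstruct approximate values: each element is scale * E2M1 code.
    return [scale * c for c in codes]


if __name__ == "__main__":
    scale, codes = quantize_block([12.0, -2.2, 0.1])
    print(scale, codes, dequantize_block(scale, codes))
```

The example shows the precision cost of FP4: -2.2 is reconstructed as -2.0 and 0.1 collapses to 0.0, which is why per-block scaling is essential.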

Official Docker images: