- Added support for Expert Parallelism (EP). Enable it by setting `infer.ep_size`, which currently must equal `infer.tp_size`; the attention part is then parallelized with TP at the same degree of parallelism.
- Added support for PD-disaggregated (prefill/decode) inference. This requires additional dependencies; for now, build the image manually from the `Dockerfile`, following the Mooncake configuration guideline.
- Added support for hardware fp4 computation on NVIDIA Blackwell GPUs. This requires additional dependencies, which are available when building from `blackwell.Dockerfile`.
- Added support for some new models. See `docs/en/SUPPORTED_MODELS.md` in thu-pacman/chitu (branch `public-main`) for details.
- Fixed multiple bugs.
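
The EP constraint above (`infer.ep_size` must currently equal `infer.tp_size`) can be checked before launching. The following is a minimal sketch, not part of Chitu: the helper name `parallel_overrides` is hypothetical, and the `key=value` override form is an assumption about how the settings are passed on the command line.

```python
from typing import List, Optional


def parallel_overrides(tp_size: int, ep_size: Optional[int] = None) -> List[str]:
    """Build config overrides for TP and, optionally, EP.

    Hypothetical helper for illustration only. The `key=value` format
    is an assumption; only the key names come from the release notes.
    """
    if tp_size < 1:
        raise ValueError("tp_size must be a positive integer")

    overrides = [f"infer.tp_size={tp_size}"]

    if ep_size is not None:
        # Release-note constraint: EP degree must currently match TP degree.
        if ep_size != tp_size:
            raise ValueError(
                f"infer.ep_size ({ep_size}) must equal infer.tp_size ({tp_size})"
            )
        overrides.append(f"infer.ep_size={ep_size}")

    return overrides


if __name__ == "__main__":
    # Example: 8-way TP for attention with 8-way EP for the experts.
    print(" ".join(parallel_overrides(tp_size=8, ep_size=8)))
```

Leaving `ep_size` unset in this sketch emits only the TP override, so the helper makes no claim about what Chitu does by default when EP is not configured.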
Official Docker images:
- NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.4.1
- Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:v0.4.1
- Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:v0.4.1