v0.4.1

@pachinko released this on 14 Aug at 12:47 · 23 commits to public-main since this release
  • Added support for Expert Parallelism (EP). Enable it by setting infer.ep_size, which currently must equal infer.tp_size; the attention part is then parallelized with TP at the same degree of parallelism.
  • Added support for PD-disaggregated inference. This requires additional dependencies; for now, build it manually on top of the Chitu base image (see the Dockerfile), following the mooncake configuration guideline.
  • Added support for hardware fp4 computation on NVIDIA Blackwell GPUs. This requires additional dependencies, which are available when building the image from blackwell.Dockerfile (the recommended route).
  • Added support for several new models. See chitu/docs/en/SUPPORTED_MODELS.md (English) or chitu/docs/zh/SUPPORTED_MODELS.md (Chinese) at public-main · thu-pacman/chitu for details.
  • Fixed multiple bugs.
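
The EP constraint in the first item (infer.ep_size must currently equal infer.tp_size) is easy to get wrong when editing launch settings by hand. The following is a minimal sketch of a pre-launch check, not part of Chitu itself: the function name check_parallel_config and the returned dictionary layout are hypothetical, and only the two setting names infer.tp_size and infer.ep_size come from the release notes.

```python
def check_parallel_config(tp_size: int, ep_size: int) -> dict:
    """Validate the v0.4.1 constraint ep_size == tp_size.

    Hypothetical helper for illustration; Chitu may perform its own
    validation in a different way.
    """
    if tp_size < 1 or ep_size < 1:
        raise ValueError("tp_size and ep_size must be positive integers")
    if ep_size != tp_size:
        raise ValueError(
            f"infer.ep_size ({ep_size}) must currently equal "
            f"infer.tp_size ({tp_size})"
        )
    # Return the settings keyed by the names used in the release notes.
    return {"infer.tp_size": tp_size, "infer.ep_size": ep_size}


if __name__ == "__main__":
    print(check_parallel_config(tp_size=8, ep_size=8))
```

Running the check before launching catches a mismatched pair (for example tp_size=8 with ep_size=4) with a readable error instead of a failure deeper in the stack.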

Official Docker images:

  • NVIDIA: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-nvidia:v0.4.1
  • Muxi: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-muxi:v0.4.1
  • Ascend: qingcheng-ai-cn-beijing.cr.volces.com/public/chitu-ascend:v0.4.1
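
The three image references above share one registry prefix and differ only in the platform suffix. As a small convenience sketch (the helper image_ref and the lowercase platform keys are assumptions made for illustration; the registry path, repository names, and tag are copied from the list above), a deployment script could select the reference like this:

```python
REGISTRY = "qingcheng-ai-cn-beijing.cr.volces.com/public"

# Platform key (hypothetical, lowercase) -> repository name from the release notes.
IMAGES = {
    "nvidia": "chitu-nvidia",
    "muxi": "chitu-muxi",
    "ascend": "chitu-ascend",
}


def image_ref(platform: str, tag: str = "v0.4.1") -> str:
    """Build the full image reference for one of the listed platforms."""
    key = platform.lower()
    if key not in IMAGES:
        raise KeyError(f"no official image listed for platform {platform!r}")
    return f"{REGISTRY}/{IMAGES[key]}:{tag}"


if __name__ == "__main__":
    print(image_ref("NVIDIA"))
```

The resulting string can be passed to whatever container tooling is in use; platforms not listed in this release raise a KeyError rather than producing a guessed name.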