Switch between trtllm-gen vs fa2/fa3 backends inside Attention wrappers #1493

@nvpohanh

Currently, vLLM uses the trtllm-gen attention kernels by calling the corresponding functions directly, bypassing the Attention wrappers. Ideally, the wrappers should support every feature the trtllm-gen attention kernels need, so that vLLM only has to call the wrappers and the choice between trtllm-gen and fa2/fa3 is made inside them.

See: https://github.com/vllm-project/vllm/pull/21716/files#r2270261171
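To make the request concrete, here is a minimal sketch of what backend selection inside a wrapper could look like. Everything in it is hypothetical and for illustration only: the `AttentionWrapper` class, the `trtllm_gen_available` flag, the stub kernels, and the "prefer trtllm-gen, otherwise fa2" policy are assumptions, not the library's actual API or selection logic.

```python
from typing import Callable, Dict, Sequence, Tuple

Kernel = Callable[[Sequence[float], Sequence[float], Sequence[float]], Tuple[str, int]]


def _make_stub(name: str) -> Kernel:
    # Placeholder for a real kernel launch (hypothetical): it only reports
    # which backend ran and how many query entries it received.
    def kernel(q, k, v):
        return (name, len(q))
    return kernel


# Backend names taken from the issue title; the kernels themselves are stubs.
KERNELS: Dict[str, Kernel] = {
    name: _make_stub(name) for name in ("fa2", "fa3", "trtllm-gen")
}


class AttentionWrapper:
    """Hypothetical wrapper that picks a backend once, at construction time."""

    def __init__(self, backend: str = "auto", trtllm_gen_available: bool = False):
        if backend != "auto" and backend not in KERNELS:
            raise ValueError(f"unknown backend: {backend!r}")
        if backend == "trtllm-gen" and not trtllm_gen_available:
            raise ValueError("trtllm-gen requested but not available")
        self.backend = self._resolve(backend, trtllm_gen_available)

    @staticmethod
    def _resolve(backend: str, trtllm_gen_available: bool) -> str:
        if backend != "auto":
            return backend
        # Assumed policy: prefer trtllm-gen when available, else fall back to fa2.
        return "trtllm-gen" if trtllm_gen_available else "fa2"

    def run(self, q, k, v):
        # The caller never touches a kernel function directly.
        return KERNELS[self.backend](q, k, v)
```

With a shape like this, a caller such as vLLM would construct the wrapper once and call `run` without knowing which kernel family is behind it, which is the decoupling this issue asks for.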
