Profiler error #1452

Description

@Adoni

I'm running the profiler in an A100 Docker container with python mla.py --batch-size 64 --seq-len 1024 --num-heads 128 --profiler-buffer-size 1048576, but it raises an error:

(py312) root@e1229c801072:/workspace/flashinfer/profiler# python mla.py --batch-size 64 --seq-len 1024 --num-heads 128 --profiler-buffer-size 1048576

W0811 02:46:17.160000 34 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0811 02:46:17.160000 34 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W0811 02:46:17.579000 34 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0811 02:46:17.579000 34 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
Traceback (most recent call last):
  File "/workspace/flashinfer/profiler/mla.py", line 107, in <module>
    profile_deepseek_mla_decode(
  File "/workspace/flashinfer/profiler/mla.py", line 71, in profile_deepseek_mla_decode
    o = wrapper.run(
        ^^^^^^^^^^^^
  File "/workspace/flashinfer/flashinfer/mla.py", line 419, in run
    self._cached_module.run.default(
  File "/opt/conda/envs/py312/lib/python3.12/site-packages/torch/_ops.py", line 829, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to run MLA, error: no kernel image is available for execution on the device

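The warning above suggests pinning TORCH_CUDA_ARCH_LIST to specific architectures. A minimal sketch of what I understand that to mean for the A100 (compute capability 8.0); I have not confirmed whether it changes the outcome:

import os

# Restrict JIT compilation to the A100's architecture (sm_80), as the
# cpp_extension warning suggests. This must be set before the extension is
# (re)built; illustrative only, not a confirmed fix for the error above.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0"

import torch  # imported after setting the variable so the build picks it up
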
I checked the CUDA version, nvcc, and the PyTorch CUDA version as follows:

(py312) root@e1229c801072:/workspace/flashinfer/profiler# nvidia-smi
Mon Aug 11 02:47:03 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+

(py312) root@e1229c801072:/workspace/flashinfer/profiler# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0
(py312) root@e1229c801072:/workspace/flashinfer/profiler# python
Python 3.12.11 | packaged by Anaconda, Inc. | (main, Jun  5 2025, 13:09:17) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch

>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
12.8
>>> print(torch.cuda.get_device_name(0))
NVIDIA A100-PCIE-40GB
>>> print(torch.cuda.device_count())
1

The CUDA-related versions are all 12.8.

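For reference, the standard torch.cuda API can also report the device's compute capability and the architectures the installed PyTorch build was compiled for, which I assume are the relevant checks for a "no kernel image is available" error:

import torch

# Compute capability reported by device 0; an A100 should report (8, 0), i.e. sm_80.
major, minor = torch.cuda.get_device_capability(0)
print(f"device compute capability: sm_{major}{minor}")

# Architectures the installed PyTorch binary ships kernels for.
print(torch.cuda.get_arch_list())
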
What should I do now?
