Profiler error #1452

Description

@Adoni

I'm running the profiler in an A100 Docker container with python mla.py --batch-size 64 --seq-len 1024 --num-heads 128 --profiler-buffer-size 1048576, but it raises an error:

(py312) root@e1229c801072:/workspace/flashinfer/profiler# python mla.py --batch-size 64 --seq-len 1024 --num-heads 128 --profiler-buffer-size 1048576

W0811 02:46:17.160000 34 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0811 02:46:17.160000 34 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W0811 02:46:17.579000 34 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0811 02:46:17.579000 34 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
Traceback (most recent call last):
  File "/workspace/flashinfer/profiler/mla.py", line 107, in <module>
    profile_deepseek_mla_decode(
  File "/workspace/flashinfer/profiler/mla.py", line 71, in profile_deepseek_mla_decode
    o = wrapper.run(
        ^^^^^^^^^^^^
  File "/workspace/flashinfer/flashinfer/mla.py", line 419, in run
    self._cached_module.run.default(
  File "/opt/conda/envs/py312/lib/python3.12/site-packages/torch/_ops.py", line 829, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to run MLA, error: no kernel image is available for execution on the device

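The warning above suggests pinning TORCH_CUDA_ARCH_LIST to specific architectures. A minimal sketch of what I understand that to mean for the A100 (compute capability 8.0); I have not confirmed whether it changes the outcome:

import os

# Restrict JIT compilation to the A100's architecture (sm_80), as the
# cpp_extension warning suggests. This must be set before the extension is
# (re)built; illustrative only, not a confirmed fix for the error above.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0"

import torch  # imported after setting the variable so the build picks it up
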
I checked the CUDA version, nvcc, and the PyTorch CUDA version as follows:

(py312) root@e1229c801072:/workspace/flashinfer/profiler# nvidia-smi
Mon Aug 11 02:47:03 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+

(py312) root@e1229c801072:/workspace/flashinfer/profiler# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0
(py312) root@e1229c801072:/workspace/flashinfer/profiler# python
Python 3.12.11 | packaged by Anaconda, Inc. | (main, Jun  5 2025, 13:09:17) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch

>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
12.8
>>> print(torch.cuda.get_device_name(0))
NVIDIA A100-PCIE-40GB
>>> print(torch.cuda.device_count())
1

The CUDA-related versions are all 12.8.

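For reference, the standard torch.cuda API can also report the device's compute capability and the architectures the installed PyTorch build was compiled for, which I assume are the relevant checks for a "no kernel image is available" error:

import torch

# Compute capability reported by device 0; an A100 should report (8, 0), i.e. sm_80.
major, minor = torch.cuda.get_device_capability(0)
print(f"device compute capability: sm_{major}{minor}")

# Architectures the installed PyTorch binary ships kernels for.
print(torch.cuda.get_arch_list())
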
What should I do now?
