Description
I'm running the profiler in a Docker container on an A100 with the command below, but it raises an error:
(py312) root@e1229c801072:/workspace/flashinfer/profiler# python mla.py --batch-size 64 --seq-len 1024 --num-heads 128 --profiler-buffer-size 1048576
W0811 02:46:17.160000 34 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0811 02:46:17.160000 34 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W0811 02:46:17.579000 34 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0811 02:46:17.579000 34 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
Traceback (most recent call last):
  File "/workspace/flashinfer/profiler/mla.py", line 107, in <module>
    profile_deepseek_mla_decode(
  File "/workspace/flashinfer/profiler/mla.py", line 71, in profile_deepseek_mla_decode
    o = wrapper.run(
        ^^^^^^^^^^^^
  File "/workspace/flashinfer/flashinfer/mla.py", line 419, in run
    self._cached_module.run.default(
  File "/opt/conda/envs/py312/lib/python3.12/site-packages/torch/_ops.py", line 829, in __call__
    return self._op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to run MLA, error: no kernel image is available for execution on the device
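This message is the CUDA runtime's text for cudaErrorNoKernelImageForDevice: the loaded module contains neither machine code (SASS) for this GPU's architecture nor PTX that the driver can JIT-compile for it. The sketch below applies CUDA's compatibility rules to a list of architecture tokens, for example the -gencode targets a module was built with, or the cubins listed by cuobjdump --list-elf on the compiled .so. Which targets FlashInfer's MLA module was actually built for is not shown in this report, and parse_arch/covers are hypothetical helpers, not FlashInfer APIs.

```python
import re

# Matches arch tokens such as "sm_80", "sm_90a", "compute_80", "compute_100".
# Only the plain and "a" (arch-specific) suffixes are handled in this sketch.
_ARCH_RE = re.compile(r"(sm|compute)_(\d{2,3})(a?)")


def parse_arch(token):
    """Split e.g. "sm_90a" into ("sm", (9, 0), True)."""
    m = _ARCH_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"unrecognised arch token: {token!r}")
    kind, digits, suffix = m.groups()
    # The last digit is the minor version; the remaining digits are the major.
    return kind, (int(digits[:-1]), int(digits[-1])), suffix == "a"


def covers(arch_tokens, capability):
    """Return True if any token can run on a device with (major, minor) capability."""
    for token in arch_tokens:
        kind, (major, minor), arch_specific = parse_arch(token)
        if arch_specific:
            # "a" targets (cubin or PTX) only run on exactly that architecture.
            if capability == (major, minor):
                return True
        elif kind == "sm":
            # Cubin: same major version, equal or higher minor version.
            if capability[0] == major and capability[1] >= minor:
                return True
        elif capability >= (major, minor):
            # PTX: the driver can JIT it for any equal or newer architecture.
            return True
    return False


if __name__ == "__main__":
    a100 = (8, 0)  # compute capability of an A100
    print(covers(["sm_90a"], a100))
    print(covers(["sm_80"], a100))
```

Under these rules an A100 (compute capability 8.0) is not covered by a module that contains only sm_90a code, but is covered by sm_80 cubins or by compute_80 (or older) PTX.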
I checked the CUDA driver version (nvidia-smi), the nvcc version, and PyTorch's CUDA version:
(py312) root@e1229c801072:/workspace/flashinfer/profiler# nvidia-smi
Mon Aug 11 02:47:03 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
(py312) root@e1229c801072:/workspace/flashinfer/profiler# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0
(py312) root@e1229c801072:/workspace/flashinfer/profiler# python
Python 3.12.11 | packaged by Anaconda, Inc. | (main, Jun 5 2025, 13:09:17) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
12.8
>>> print(torch.cuda.get_device_name(0))
NVIDIA A100-PCIE-40GB
>>> print(torch.cuda.device_count())
1
All CUDA-related versions (driver, nvcc, and PyTorch) report 12.8.
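Matching CUDA versions do not by themselves rule out this error, because it depends on whether a kernel was built for the GPU's compute capability, which the checks above do not print. A minimal sketch that adds it, assuming PyTorch is importable (describe_device is a hypothetical helper); note that torch.cuda.get_arch_list() reports the architectures PyTorch's own kernels were compiled for, not those of FlashInfer's JIT-built modules:

```python
def describe_device(name, capability, cuda_version):
    """Format device info, e.g. capability (8, 0) -> "8.0 (sm_80)"."""
    major, minor = capability
    return (f"{name}: compute capability {major}.{minor} "
            f"(sm_{major}{minor}), CUDA {cuda_version}")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # PyTorch not installed: nothing to report

    if torch is not None and torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(describe_device(
                torch.cuda.get_device_name(i),
                torch.cuda.get_device_capability(i),
                torch.version.cuda,
            ))
        # Architectures PyTorch itself was compiled for (not FlashInfer's modules).
        print(torch.cuda.get_arch_list())
```

An A100 has compute capability 8.0, so on this machine the first line should show sm_80.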
What should I do now?