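【Cambricon hardware model】MLU370-S4
【Operating system】Ubuntu 20.04
【Driver version】v4.20.18
【Environment】
Image: downloaded from the official website (pytorch-v1.17.0-torch1.13.1-ubuntu20.04-py310.tar.gz)
torch-mlu 1.17.0+torch1.13 (bundled with the image), torch 1.13.1 (bundled with the image)
mmcv 2.1.0+v1.0.0 (official Cambricon build: mmcv-2.1.0+v1.0.0-cp310-cp310-linux_x86_64.whl)
mmengine 0.10.7 (also the official package, installed together with mmcv)
mmdet 3.3.0 (installed with pip install -v -e .)
Python 3.10.8 (bundled with the image)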
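To confirm that the interpreter actually loads the packages listed above (for example, the Cambricon mmcv wheel rather than a stock mmcv, and the editable mmdet checkout), a quick check like the following may help. It is a sketch: the distribution names passed in and the `torch.mlu.is_available()` call are assumptions about this torch_mlu build and should be checked against its documentation.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def collect_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mlu_available() -> bool:
    """Best-effort check that torch_mlu imports and reports a device (assumed API)."""
    try:
        import torch
        import torch_mlu  # noqa: F401  # assumed to register the torch.mlu namespace
    except Exception:
        return False
    is_available = getattr(getattr(torch, "mlu", None), "is_available", None)
    return bool(is_available and is_available())


if __name__ == "__main__":
    # Distribution names are assumptions; torch_mlu may be registered under another name.
    for pkg, ver in collect_versions(["torch", "torch_mlu", "mmcv", "mmengine", "mmdet"]).items():
        print(f"{pkg:10s} {ver or 'not installed'}")
    print("MLU available:", mlu_available())
```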
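【Error message】
MLU inference is slow, and at higher resolutions it times out outright.
1. When the input size is raised to 928×768, an error is triggered and the run aborts.
The message is: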
2025-09-05 15:03:49.597136: [cnrtError] [2411] [Card: 0] Error occurred during calling 'cnQueueSync' in CNDrv interface.
2025-09-05 15:03:49.597196: [cnrtError] [2411] [Card: 0] Return value is 100121, CN_INVOKE_ERROR_EXECUTED_TIMEOUT.
2025-09-05 15:03:49.597211: [cnrtError] [2411] [Card: 0] cnrtQueueSync: MLU queue sync failed.
[ERROR][/torch/catch/torch_mlu/csrc/ work/core/queue.h:181][synchronize][process:2411][thread:140360340940608]:
0%| | 0/5 [00:37<?, ?it/s]
Traceback (most recent call last):
File "/home/share/Semi-CODINO_mlu/output_txt_1230.py", line 67, in <module>
result = inferencer(inputs=img_path, batch_size=batch_size)
File "/home/share/Semi-CODINO_mlu/mmdet/apis/det_inferencer.py", line 403, in __call__
preds = self.forward(data, **forward_kwargs)
File "/torch/venv3/pytorch/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/torch/venv3/pytorch/lib/python3.10/site-packages/mmengine/infer/infer.py", line 296, in forward
return self.model.test_step(inputs)
File "/torch/venv3/pytorch/lib/python3.10/site-packages/mmengine/model/ _model/ _model.py", line 145, in test_step
return self._run_forward(data, mode='predict') # type: ignore
File "/torch/venv3/pytorch/lib/python3.10/site-packages/mmengine/model/ _model/ _model.py", line 361, in _run_forward
results = self(**data, mode=mode)
File "/torch/venv3/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/share/Semi-CODINO_mlu/mmdet/models/detectors/ .py", line 95, in forward
return self.predict(inputs, data_samples)
File "/home/share/Semi-CODINO_mlu/projects/CO-DETR/codetr/codetr.py", line 279, in predict
results_list = self.predict_query_head(
File "/home/share/Semi-CODINO_mlu/projects/CO-DETR/codetr/codetr.py", line 290, in predict_query_head
return self.query_head.predict(
File "/home/share/Semi-CODINO_mlu/projects/CO-DETR/codetr/co_dino_head.py", line 194, in predict
outs = self.forward(feats, batch_img_ s)
File "/home/share/Semi-CODINO_mlu/projects/CO-DETR/codetr/co_dino_head.py", line 135, in forward
self.transformer(
File "/torch/venv3/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/share/Semi-CODINO_mlu/projects/CO-DETR/codetr/transformer.py", line 1160, in forward
memory = self.encoder(
File "/torch/venv3/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/torch/venv3/pytorch/lib/python3.10/site-packages/mmcv/cnn/bricks/transformer.py", line 941, in forward
query = (
File "/torch/venv3/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/torch/venv3/pytorch/lib/python3.10/site-packages/mmcv/cnn/bricks/transformer.py", line 830, in forward
query = self.attentions[attn_index](
File "/torch/venv3/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/torch/venv3/pytorch/lib/python3.10/site-packages/mmengine/utils/misc.py", line 395, in new_func
output = old_func(*args, **kwargs)
File "/torch/venv3/pytorch/lib/python3.10/site-packages/mmcv/ops/multi_scale_deform_attn.py", line 335, in forward
assert (spatial_shapes[:, 0] * spatial_shapes[:, 1]).sum() == num_value
RuntimeError: CNRT error: failed to call the driver-api function.
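The failing line in multi_scale_deform_attn.py checks that `num_value` equals the sum of H_l × W_l over all feature levels. Because `.sum() == num_value` has to become a Python bool, the result on the MLU must be copied back to the host, which forces a queue sync; that is likely why the `cnQueueSync` timeout surfaces on this assertion even if the slow work was queued earlier. The sketch below estimates how the number of encoder tokens grows with input size. It assumes four levels with strides 8/16/32/64 and ceil rounding (hypothetical: the real CO-DINO config may use a different number of levels or pad inputs to a size divisor). In a Deformable-DETR-style encoder every value token is also a query, so encoder work scales at least with this count.

```python
import math
from typing import List, Sequence, Tuple


def level_shapes(height: int, width: int, strides: Sequence[int]) -> List[Tuple[int, int]]:
    """Approximate (H_l, W_l) of each feature level, assuming ceil(size / stride)."""
    return [(math.ceil(height / s), math.ceil(width / s)) for s in strides]


def num_value_tokens(height: int, width: int, strides: Sequence[int]) -> int:
    """Same quantity as the mmcv assertion: sum over levels of H_l * W_l."""
    return sum(h * w for h, w in level_shapes(height, width, strides))


# Hypothetical stride list; read the real one from the model config.
STRIDES = (8, 16, 32, 64)

if __name__ == "__main__":
    small = num_value_tokens(648, 512, STRIDES)
    large = num_value_tokens(928, 768, STRIDES)
    print(small, large, round(large / small, 2))  # prints: 6920 14796 2.14
```

Under these assumptions, going from 648×512 to 928×768 roughly doubles the token count.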
2. When the input resolution is lowered to 648×512, inference on the MLU370-S4 takes about 80 s, while the same image takes only 60 s on the CPU.
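The 80 s vs. 60 s comparison may include one-time costs (model construction, weight loading, first-run kernel loading), and device execution is asynchronous, so a wall-clock reading taken without synchronizing can be misleading. A fairer comparison separates warm-up from steady-state runs and synchronizes before reading the clock. This is a generic sketch; the `sync` callable for the MLU is assumed to be `torch.mlu.synchronize`, which should be verified for this torch_mlu version.

```python
import time
from typing import Callable, List


def benchmark(run: Callable[[], object],
              sync: Callable[[], None] = lambda: None,
              warmup: int = 1,
              iters: int = 3) -> List[float]:
    """Time `run` after `warmup` untimed calls, syncing the device around each timed call.

    `sync` should block until all queued device work has finished; on CPU the
    default no-op is enough.
    """
    for _ in range(warmup):
        run()
    sync()  # make sure warm-up work is finished before timing starts
    timings: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        sync()  # wait for queued kernels, otherwise only launch time is measured
        timings.append(time.perf_counter() - start)
    return timings
```

Possible usage, reusing the names from the traceback script: `benchmark(lambda: inferencer(inputs=img_path, batch_size=1), sync=torch.mlu.synchronize)`; for the CPU run, leave `sync` at its default.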
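【What has been checked so far】
In the slow case, i.e. with 648×512 input images,
checking with the cnmon command showed that the MLU's temperature and power draw stayed unchanged, so I suspect the MLU is not actually accelerating anything.
The output of cnmon info is: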
================CNMON v4.20.18================
Card 0
Product Name : MLU370-S4
SN :
UUID :
Firmware : v1.0.2
Driver : v4.20.18
Utilization :
Image Codec 0-1 : 0 % 0 %
Image Codec 2-3 : 0 % 0 %
Image Codec 4-5 : 0 % 0 %
Image Codec 6-7 : 0 % 0 %
Video Decoder 0-1 : 0 % 0 %
Video Decoder 2-3 : 0 % 0 %
Video Decoder 4-5 : 0 % 0 %
Video Decoder 6-7 : 0 % 0 %
Video Decoder 8-9 : 0 % 0 %
Video Encoder 10-11 : 0 % 0 %
MLU Average : 66 %
MLU 0-3 : 100 % 100 % 100 % 100 %
MLU 4-7 : 100 % 100 % 100 % 100 %
MLU 8-11 : 100 % 100 % 100 % 100 %
MLU 12-15 : 100 % 100 % 100 % 100 %
MLU 16-19 : 0 % 0 % 0 % 0 %
MLU 20-23 : 0 % 0 % 0 % 0 %
idle : 8
busy : 16
DEVICE CPU Chip : 25 %
DEVICE CPU Core 0-1 : 0 % 100 %
DEVICE CPU Core 2-3 : 0 % 0 %
CPU Sampling Interval : 100 ms
Codec Turbo : N/A
Fan Speed : 0 %
Temperature :
Board : 39 C
Chip : 43 C
Memory : 31 C
Frequency :
IPU : 1000 MHz
Physical Memory Usage :
Total : 23365 MiB
Used : 3808 MiB
Free : 19557 MiB
Channel Memory Usage :
Channel 0 : 3808 MiB
DDR Data Widths : 384 Bit
DDR BandWidth : 307200 MB/s
DDR Transfer Rate : 6400 Mbps
Virtual Memory Usage :
Total : 1048576 MiB
Used : 13610 MiB
Free : 1034966 MiB
ARM OS Memory Usage :
Total : 545776 KB
Used : 173168 KB
Free : 372608 KB
Fast Alloc Memory :
Total : 2916352 KB
Used : 2492416 KB
Free : 423936 KB
Power :
Usage : 25 W
Cap : 75 W
Thermal Design Power : 75 W
Max Power Cap : 75.00 W
Min Power Cap : 40.00 W
Initialized : On
DDR ECC Err Count : N/A
CRC Err Count :
Die2Die CRC Error : 0
Die2Die CRC Error Overflow : 0
Cache :
Total : N/A
Hit : N/A
PCI :
Vendor ID : 0xcabc
Device ID : 0x370
Sub-Vendor ID : 0xcabc
Sub-System ID : 0x53
Domain ID : 0000
Bus num : 86
Device : 00
Function : 0
Physical Slot : Unknown
Max Speed : 16 GT/s
Max Width : x16
Current Speed : 8 GT/s
Current Width : x16
Switch Speed : Unknown
NUMA node id : 1
Bandwidth : N/A
PCIe throughput : N/A
Local memory : N/A
Chassis : N/A
Retired Pages :
Page Retirement : Off
Single Bit ECC : 0
Double Bit ECC : 0
is Pending : No
is Failed : No
Row-Remapping :
Correctable Rows : 0
Uncorrectable Rows : 0
Pending Rows : 0
Failed Rows : 0
Processes :
Process : 0
PID : 4042210
cmdline : python
MLU Memory Usage : 3370 MiB
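The dump above reports 16 of 24 cores at 100 % and 8 at 0 %. Averaging gives 66.7 %, which matches the "MLU Average : 66 %" line and the busy/idle counts (16/8), so at the moment of this snapshot the card was executing work, even though power (25 W of a 75 W cap) and temperature were low. A single snapshot cannot show a trend; the sketch below parses the per-core lines so that several `cnmon info` samples can be compared. The regular expression assumes the "MLU a-b : x % y % ..." layout shown here.

```python
import re
import subprocess
from typing import List, Tuple

# Matches per-core lines such as "MLU 0-3 : 100 % 100 % 100 % 100 %";
# "MLU Average" and "MLU Memory Usage" do not match because they lack the a-b range.
CORE_LINE = re.compile(r"^\s*MLU \d+-\d+\s*:(.*)$")


def core_utilizations(cnmon_text: str) -> List[int]:
    """Collect per-core utilization percentages from cnmon info output."""
    values: List[int] = []
    for line in cnmon_text.splitlines():
        match = CORE_LINE.match(line)
        if match:
            values.extend(int(v) for v in re.findall(r"(\d+)\s*%", match.group(1)))
    return values


def summarize(values: List[int]) -> Tuple[int, int, float]:
    """Return (busy cores, total cores, mean utilization in percent)."""
    busy = sum(1 for v in values if v > 0)
    return busy, len(values), sum(values) / len(values) if values else 0.0


def sample_cnmon() -> str:
    """Run `cnmon info` once and return its text output."""
    return subprocess.run(["cnmon", "info"], capture_output=True, text=True, check=True).stdout
```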