P2PNet model porting fails with RuntimeError: CNRT error: Failure on Queue. - Cambricon Software Development Platform

Status: Closed · Posted by 雕刻时光 on 2022-11-03 17:30:29 · Replies: 12 · Category: Usage help


[Cambricon hardware product model] (required): MLU270

[Operating system] (required): Ubuntu

[Driver version] (required): v4.8.0

[Error message] (required):
2022-11-03 17:05:28.671076: [cnrtError] [31174] [Card : 0] MLU unfinished. cnrtStream fail.
2022-11-03 17:05:28.671099: [cnrtError] [31174] [Card : 0] unknown error
CNRT ERROR in file "/pytorch/catch/torch_mlu/csrc/jit/fuser/fused_kernel.cpp" on line 271.
Traceback (most recent call last):
File "mlu_forward.py", line 227, in <module>
    mlu_forward(opt.img, use_mlu=True, mlu220=save_offline_model)
File "mlu_forward.py", line 212, in mlu_forward
    out = fuison_model(example_tensor)
File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
RuntimeError: CNRT error: Failure on Queue.
The above operation failed in interpreter, with the following stack trace:

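[What has been checked so far] (optional):
1. The model runs on the CPU. With PyTorch 1.7.0 on the PC, torch.jit.trace().save() works, and the model can be quantized and the quantized model saved.

2. The network has its post-processing built into the model. Earlier I got the error "RuntimeError: Can not call cpu_data on an empty tensor". Adding .to(ct.mlu_device()) to the post-processing tensors made things worse: memory kept growing until the process hung. After commenting out the post-processing and keeping only the forward-pass variables, the error above appears.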
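For reference, the post-processing that was commented out in item 2 could be run on the host after the forward pass instead of inside the traced network. The following is only a minimal sketch, not the repository's actual code. It assumes the network returns, for each candidate point, a pair of logits (background, foreground) and an (x, y) coordinate, and that both outputs have already been copied to the CPU as plain Python lists, for example with tensor.cpu().tolist(). The names foreground_score and postprocess_on_cpu are hypothetical, and so is the 0.5 threshold.

```python
import math


def foreground_score(logit_bg, logit_fg):
    """Two-class softmax; returns the probability of the foreground class."""
    m = max(logit_bg, logit_fg)  # subtract the max for numerical stability
    e_bg = math.exp(logit_bg - m)
    e_fg = math.exp(logit_fg - m)
    return e_fg / (e_bg + e_fg)


def postprocess_on_cpu(pred_logits, pred_points, threshold=0.5):
    """Keep the points whose foreground score is above the threshold.

    pred_logits: list of [background_logit, foreground_logit], one per candidate
    pred_points: list of [x, y], one per candidate (same length as pred_logits)

    Assumption: both arguments are host-side Python lists, so nothing here
    touches the MLU device or the traced/fused model.
    """
    if len(pred_logits) != len(pred_points):
        raise ValueError("pred_logits and pred_points must have the same length")

    kept = []
    for (logit_bg, logit_fg), point in zip(pred_logits, pred_points):
        if foreground_score(logit_bg, logit_fg) > threshold:
            kept.append(list(point))
    # The predicted count is simply the number of points that were kept.
    return kept, len(kept)


if __name__ == "__main__":
    # Made-up values standing in for the network outputs of one image.
    logits = [[2.0, -1.0], [-0.5, 1.5], [0.0, 0.0]]
    points = [[10.0, 12.0], [40.5, 33.0], [70.0, 8.0]]
    kept_points, count = postprocess_on_cpu(logits, points)
    print(kept_points, count)
```

With this split, the traced model only has to return the raw logit and coordinate tensors, and the thresholding never runs on the device. Whether this avoids the CNRT error above is not confirmed; it only separates the two stages so each can be checked on its own.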

[Related logs and documents] (optional):

[Link to the failing code] (optional):
The code comes from: https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
