【Cambricon hardware model】: MLU270
【Operating system】: Ubuntu
【Driver version】: v4.20.6
【AI framework】: PyTorch
【Steps to reproduce】:
I ran into a problem while generating an MLU220 offline model on an MLU270, as follows:
When corenum is set to 1 the offline model is generated correctly, but setting it to 4 produces an error. The error message is below.
【Error message】
2022-08-03 17:06:13.454164: [cnrtError] [1625578] [Card : 3] Error occurred in cnrtSyncQueue during calling driver interface.
2022-08-03 17:06:13.454187: [cnrtError] [1625578] [Card : 3] Return value is 441, MLU_ERROR_WRITE_NRAM_OVERFLOW, means that "mlu write nram overflow"
2022-08-03 17:06:13.454199: [cnrtError] [1625578] [Card : 3] mlu unfinished! for more information, please use core dump analysis tools
[ERROR][/pytorch/catch/torch_mlu/csrc/aten/device/queue.h][line:82][syncQueue][thread:139989039994688][process:1625578]: CNRT error: Errors occurred from driver functions that are returned via CNRT. See the detailed error messages on the screen.
Traceback (most recent call last):
  File "../mlu_foward.py", line 499, in <module>
    main(opt)
  File "../mlu_foward.py", line 484, in main
    run(**vars(opt))
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "../mlu_foward.py", line 209, in run
    model = torch.jit.trace(model, trace_input.to(ct.mlu_device()), check_trace=False)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/jit/__init__.py", line 858, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/jit/__init__.py", line 997, in trace_module
    module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 525, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/workdir/YOLOv5MLU/models/yolo.py", line 148, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/workdir/YOLOv5MLU/models/yolo.py", line 171, in _forward_once
    x = m(x)  # run
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 539, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 525, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/workdir/YOLOv5MLU/models/yolo.py", line 71, in forward
    maxboxnum)
ValueError: To do for CPU
【Error script】
ct.set_core_number(core_number)
ct.set_core_version(mcore)
ct.set_input_format(input_format)
if jit:
    ct.save_as_cambricon(model_name)
    # trace network
    if gray:
        trace_input = torch.randn(batch_size, 1, 540, 960).float()
    else:
        trace_input = torch.randn(batch_size, 3, 540, 960).float()
    model = torch.jit.trace(model, trace_input.to(ct.mlu_device()), check_trace=False)
    if save_offline_model:
        ct.save_as_cambricon(model_name)
        model(trace_input.to(ct.mlu_device()))
        ct.save_as_cambricon("")
        exit(0)
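For reference, below is a minimal, self-contained sketch of the same export flow. It only uses the ct.* calls and torch.jit.trace that appear in the script above; the import path (torch_mlu.core.mlu_model as ct), the placeholder Conv2d standing in for the YOLOv5MLU network, and the output name are assumptions on my side, and the set_input_format call is omitted.

import torch
import torch.nn as nn
import torch_mlu.core.mlu_model as ct  # assumption: Catch module providing the ct.* calls used above

core_number = 4               # 1 exports cleanly in my runs; 4 triggers MLU_ERROR_WRITE_NRAM_OVERFLOW
mcore = "MLU220"              # target core version: generate an MLU220 offline model on an MLU270
batch_size = 1
model_name = "yolov5_mlu220"  # hypothetical name for the generated .cambricon file

ct.set_core_number(core_number)   # number of MLU cores the offline model is fused across
ct.set_core_version(mcore)        # core architecture the offline model targets
ct.save_as_cambricon(model_name)  # enable offline-model dumping before tracing

# Placeholder module; the real script loads the quantized YOLOv5MLU network instead.
model = nn.Conv2d(3, 16, 3).eval().to(ct.mlu_device())
trace_input = torch.randn(batch_size, 3, 540, 960).float()

model = torch.jit.trace(model, trace_input.to(ct.mlu_device()), check_trace=False)
model(trace_input.to(ct.mlu_device()))  # forward pass writes out the offline model
ct.save_as_cambricon("")                # stop dumping and finalize the .cambricon output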