hustgao

Replies by this user

Failed to port BERT model: RuntimeError: Cannot set version_counter for inference tensor
My reply to #12 (hustgao): "Also, I recently tried version 1.9 again and ran into a fallback problem. How can I solve it?"
[ERROR][/torch/catch/torch_mlu/csrc/aten/operators/cnnl/_s_where.cpp:21][cnnl__s_where][process:2374][thread:140484020524864]:x or y dtype of cnnl where op not implemented for 'Long'
[WARNING][/torch/catch/torch_mlu/csrc/aten/operators/mlu_type_default.cpp:2647][_s_where][process:2374][thread:140484020524864]: MLU _s_where failed, fallback to run on CPU automatically!
(This ERROR/WARNING pair appears 8 times in the log.)
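The repeated pairs above come down to one operator and one dtype. On longer runs, a small script can tally them before deciding what to patch. This is a hypothetical helper, not part of torch_mlu; the names `summarize_fallbacks` and `unsupported_dtypes` are illustrative, and it assumes only the two message formats visible in this log, `MLU <op> failed, fallback to run on CPU automatically!` and `not implemented for '<dtype>'`. It works whether the messages are on separate lines or run together as in the post.

```python
import re
from collections import Counter

# Matches the torch_mlu fallback warning seen in the log, e.g.
# "MLU _s_where failed, fallback to run on CPU automatically!"
FALLBACK_RE = re.compile(r"MLU (\w+) failed, fallback to run on CPU automatically!")

# Matches the CNNL dtype error seen in the log, e.g.
# "x or y dtype of cnnl where op not implemented for 'Long'"
DTYPE_RE = re.compile(r"not implemented for '(\w+)'")


def summarize_fallbacks(log_text: str) -> Counter:
    """Count how many times each operator fell back from the MLU to the CPU."""
    return Counter(FALLBACK_RE.findall(log_text))


def unsupported_dtypes(log_text: str) -> Counter:
    """Count the dtypes named in "not implemented for '<dtype>'" errors."""
    return Counter(DTYPE_RE.findall(log_text))
```

For example, calling `summarize_fallbacks(open("run.log").read())` on a saved log (the file name is illustrative) returns a count per operator, which for the log above would be `_s_where` only.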

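Failed to port BERT model: RuntimeError: Cannot set version_counter for inference tensor
My reply to #11 (hustgao): "734694311@qq.com, you can send the image here. Thanks."
Also, I recently tried version 1.9 again and ran into a fallback problem. How can I solve it?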
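The CNNL error in the log above says the MLU `where` kernel does not accept `'Long'` (int64) operands. One possible workaround, not confirmed anywhere in this thread, is to keep int64 away from the call: cast both operands to int32, call `torch.where`, and cast the result back. This sketch assumes every value fits in int32 and that the MLU `where` kernel accepts int32, which the log does not state; `where_without_int64` is a hypothetical name.

```python
import torch


def where_without_int64(cond: torch.Tensor, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Hypothetical wrapper around torch.where that never passes int64 operands.

    Assumes all values fit in int32 and that the MLU `where` kernel accepts
    int32; neither is confirmed by the log in this thread.
    """
    if x.dtype == torch.int64 and y.dtype == torch.int64:
        # Narrow to int32 for the call, then widen the result back to int64
        # so callers still see the dtype they passed in.
        out = torch.where(cond, x.to(torch.int32), y.to(torch.int32))
        return out.to(torch.int64)
    # Other dtypes are passed through unchanged.
    return torch.where(cond, x, y)
```

If the int32 path is supported on the device, this would keep the op on the MLU instead of falling back on every call; whether the two extra casts cost less than the fallback has not been measured here.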

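Failed to port BERT model: RuntimeError: Cannot set version_counter for inference tensor
My reply to #10 (xiedong2022): "Hello, the new version for the 370 has not been released yet. If convenient, please send your contact information to our mailbox (ecosystem@cambricon.com), and we will provide you with a new version when it is available."
734694311@qq.com, you can send the image here. Thanks.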

Failed to port BERT model: RuntimeError: Cannot set version_counter for inference tensor
My reply to #7 (xiedong2022): "Please use the latest version. Yours is a bit old."
Where can I pull the new image from?

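Failed to port BERT model: RuntimeError: Cannot set version_counter for inference tensor
My reply to #6 (MyAI): "It looks like the device is wrong. Check each place in the code that handles the device, one by one."
The thing is, the same code runs as soon as I switch to the PyTorch 1.6 image, so it shouldn't be a device problem.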

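Failed to port BERT model: RuntimeError: Cannot set version_counter for inference tensor
My reply to #3 (xiedong2022): "The specific cause cannot be determined from the current log. Please confirm the following: which network you are testing, BERT or ResNet50; the version of PyTorch for MLU; what changes were made to the original code; the steps used to put the test network on the MLU; and, in this environment, whether the CPU results are normal."
So I suspect there is a problem with your PyTorch image. Could you give me an image of a different version so I can try again?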

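Failed to port BERT model: RuntimeError: Cannot set version_counter for inference tensor
My reply to #3 (xiedong2022): "The specific cause cannot be determined from the current log. Please confirm the following: which network you are testing, BERT or ResNet50; the version of PyTorch for MLU; what changes were made to the original code; the steps used to put the test network on the MLU; and, in this environment, whether the CPU results are normal."
Both BERT and ResNet50 hit this problem, and both run normally on the CPU. After I downgraded torch from 1.9 to 1.6, they ran.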
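The error in the thread title, `Cannot set version_counter for inference tensor`, is a PyTorch check on inference tensors. Those are tensors created under `torch.inference_mode()`, which only exists from PyTorch 1.9 on, so this would be consistent with the 1.9 image failing while 1.6 runs. The thread does not establish that inference tensors actually reach the torch_mlu fallback path here. As a diagnostic sketch under that assumption, the hypothetical helper `run_without_inference_tensors` clones any inference tensor into a normal tensor and runs the call under `torch.no_grad()`, which disables autograd without creating inference tensors.

```python
import torch


def run_without_inference_tensors(fn, *inputs):
    """Call fn with any inference tensors replaced by normal-tensor clones.

    Hypothetical diagnostic helper. It must be called outside
    torch.inference_mode(): only there does clone() return a normal tensor.
    The call itself runs under no_grad(), which does not create inference tensors.
    """
    cleaned = [t.clone() if t.is_inference() else t for t in inputs]
    with torch.no_grad():
        return fn(*cleaned)


if __name__ == "__main__":
    with torch.inference_mode():
        x = torch.randn(1, 3, 8, 8)  # allocated in inference mode: an inference tensor
    conv = torch.nn.Conv2d(3, 4, kernel_size=3)
    y = run_without_inference_tensors(conv, x)
    print(x.is_inference(), y.is_inference(), tuple(y.shape))
```

If the error disappears when the model's inputs go through a wrapper like this, that would point at inference tensors; if not, the cause is more likely inside the torch_mlu operator itself.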

Failed to port BERT model: RuntimeError: Cannot set version_counter for inference tensor
My reply to #1 (xiedong2022): "Hello: what environment are you using (370/270), which steps did you follow to port the network, and which script produced the error? Please provide detailed steps and information so we can locate the problem. Thanks!"
Hardware: MLU370
Software: yellow.hub.cambricon.com/pytorch/pytorch:v1.4.0-torch1.9-ubuntu18.04
Reference document: https://www.cambricon.com/docs/sdk_1.7.0/cambricon_pytorch_1.6.0/porting_1.9/pytorch_3_porting/pytorch_porting.html#torch-mlu
I ran our in-house ResNet50 demo. The full error is as follows:
Cannot set version_counter for inference tensor
[WARNING][/torch/catch/torch_mlu/csrc/aten/operators/mlu_type_default.cpp:935][convolution_overrideable][process:33242][thread:139957320275776]: MLU convolution_overrideable failed, fallback to run on CPU automatically!
Traceback (most recent call last):
  File "inference.py", line 535, in <module>
    main()
  File "inference.py", line 454, in main
    args, model, data_loader, device, args.show, args.show_dir, **show_kwargs
  File "inference.py", line 157, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/classifiers/base.py", line 85, in forward
    return self.forward_test(img, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/classifiers/base.py", line 67, in forward_test
    return self.simple_test(imgs[0], **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/classifiers/image.py", line 152, in simple_test
    x = self.extract_feat(img)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/classifiers/image.py", line 111, in extract_feat
    x = self.backbone(img)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/backbones/resnet_cifar.py", line 72, in forward
    x = self.conv1(x)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: MLU convolution_overrideable does not have fallback CPU implementation!
Could you explain why conv2d ends up running the convolution_overrideable operator, and why that operator raises an exception? Also, does Catch implement a fallback to CPU for this operator?