hustgao
Porting a BERT model fails: RuntimeError: Cannot set version_counter for inference tensor
My reply: #12 hustgao wrote: Also, I recently tried the 1.9 version again and ran into a fallback problem. How can this be resolved?
The log is below (the same ERROR/WARNING pair repeats 8 times in the output):
[ERROR][/torch/catch/torch_mlu/csrc/aten/operators/cnnl/_s_where.cpp:21][cnnl__s_where][process:2374][thread:140484020524864]:x or y dtype of cnnl where op not implemented for 'Long'
[WARNING][/torch/catch/torch_mlu/csrc/aten/operators/mlu_type_default.cpp:2647][_s_where][process:2374][thread:140484020524864]: MLU _s_where failed, fallback to run on CPU automatically!
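Since the same pair of messages repeats for every call, it can help to condense the log before posting it. The following is a minimal sketch, assuming the log lines keep exactly the format shown above; the helper name `summarize_fallbacks` and the two regular expressions are illustrative and not part of CATCH or torch_mlu.

```python
import re
from collections import Counter

# Matches the WARNING text seen in the log, capturing the operator name.
FALLBACK_RE = re.compile(r"MLU (\w+) failed, fallback to run on CPU automatically!")
# Matches the ERROR entry, capturing the CNNL kernel tag and the rejected dtype.
DTYPE_RE = re.compile(
    r"\[(cnnl_\w+)\]\[process:\d+\]\[thread:\d+\]:.*?not implemented for '(\w+)'"
)


def summarize_fallbacks(log_text):
    """Count CPU fallbacks per operator and unsupported dtypes per CNNL kernel."""
    fallbacks = Counter(FALLBACK_RE.findall(log_text))
    dtypes = Counter(DTYPE_RE.findall(log_text))
    return fallbacks, dtypes


# The pair from the log above, repeated 8 times as in the original output.
sample = (
    "[ERROR][/torch/catch/torch_mlu/csrc/aten/operators/cnnl/_s_where.cpp:21]"
    "[cnnl__s_where][process:2374][thread:140484020524864]:"
    "x or y dtype of cnnl where op not implemented for 'Long'"
    "[WARNING][/torch/catch/torch_mlu/csrc/aten/operators/mlu_type_default.cpp:2647]"
    "[_s_where][process:2374][thread:140484020524864]: "
    "MLU _s_where failed, fallback to run on CPU automatically!"
) * 8

fallbacks, dtypes = summarize_fallbacks(sample)
for op, count in fallbacks.items():
    print(f"{op}: fell back to CPU {count} time(s)")
for (kernel, dtype), count in dtypes.items():
    print(f"{kernel}: dtype {dtype} rejected {count} time(s)")
```

The summary only reports what the log already says: `_s_where` fell back to the CPU because the CNNL where kernel rejected a 'Long' input. It does not say which call in the model produces the Long tensor.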
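My reply: #11 hustgao wrote: 734694311@qq.com, you can send the image to this address, thanks.
Also, I recently tried the 1.9 version again and ran into a fallback problem. How can this be resolved?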
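My reply: #10 xiedong2022 wrote: Hello, the new version for the 370 has not been released yet. If convenient, please send your contact details to our mailbox (ecosystem@cambricon.com), and we will provide you with a new version when it is available.
734694311@qq.com, you can send the image to this address, thanks.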
My reply: #7 xiedong2022 wrote: Try the latest version. Your version is a bit old.
Where can I pull the new image?
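My reply: #6 MyAI wrote: It looks like the device is wrong. Check the places in the code that handle the device and go through them one by one.
The point is that the very same code runs once I switch to the 1.6 PyTorch image, so it should not be a device problem.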
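My reply: #3 xiedong2022 wrote: The specific cause cannot be determined from the current log. Please confirm the following: (1) which network is being tested, BERT or ResNet-50; (2) the version of PyTorch for MLU; (3) what changes were made to the original code; (4) the steps used to place the test network on the MLU; (5) whether the CPU result is normal when run in this same environment.
So I suspect there is a problem with your PyTorch image. Could you give me another version of the image so that I can try again?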
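My reply: #3 xiedong2022 wrote: The specific cause cannot be determined from the current log. Please confirm the following: (1) which network is being tested, BERT or ResNet-50; (2) the version of PyTorch for MLU; (3) what changes were made to the original code; (4) the steps used to place the test network on the MLU; (5) whether the CPU result is normal when run in this same environment.
Both BERT and ResNet-50 show this problem, and both run normally on the CPU. After I downgraded from torch 1.9 to 1.6, the code ran.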
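For the version information requested in the quoted post, a small helper can collect what each container reports so that the 1.9 and 1.6 images can be compared side by side. This is a minimal sketch: the module names in the default tuple are assumptions based on the packages that appear in this thread, and `torch_mlu` exposing a `__version__` attribute is also an assumption.

```python
import importlib


def module_version(module_name):
    """Return a module's __version__, or a short marker if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "not importable"
    # Not every module defines __version__, so fall back to a marker.
    return getattr(module, "__version__", "unknown")


def version_report(module_names=("torch", "torch_mlu", "mmcv", "mmcls")):
    """Map each module name (assumed names, adjust as needed) to its version."""
    return {name: module_version(name) for name in module_names}
```

Running `print(version_report())` inside each image and posting both results alongside the image tag would cover item (2) of the checklist.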
My reply: #1 xiedong2022 wrote: Hello, which environment are you using (370/270), and which steps did you follow to port the network? Which script produced the error? Please provide detailed steps and information so that we can locate the problem. Thanks!
Hardware: MLU370
Software: yellow.hub.cambricon.com/pytorch/pytorch:v1.4.0-torch1.9-ubuntu18.04
Reference document: https://www.cambricon.com/docs/sdk_1.7.0/cambricon_pytorch_1.6.0/porting_1.9/pytorch_3_porting/pytorch_porting.html#torch-mlu
I ran a demo of our in-house ResNet-50. The complete error is as follows:
Cannot set version_counter for inference tensor
[WARNING][/torch/catch/torch_mlu/csrc/aten/operators/mlu_type_default.cpp:935][convolution_overrideable][process:33242][thread:139957320275776]: MLU convolution_overrideable failed, fallback to run on CPU automatically!
Traceback (most recent call last):
  File "inference.py", line 535, in <module>
    main()
  File "inference.py", line 454, in main
    args, model, data_loader, device, args.show, args.show_dir, **show_kwargs
  File "inference.py", line 157, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/classifiers/base.py", line 85, in forward
    return self.forward_test(img, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/classifiers/base.py", line 67, in forward_test
    return self.simple_test(imgs[0], **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/classifiers/image.py", line 152, in simple_test
    x = self.extract_feat(img)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/classifiers/image.py", line 111, in extract_feat
    x = self.backbone(img)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/mmcls/models/backbones/resnet_cifar.py", line 72, in forward
    x = self.conv1(x)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/torch/venv3/pytorch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: MLU convolution_overrideable does not have fallback CPU implementation!
Could you explain why conv2d ends up running the convolution_overrideable operator, why that operator throws an exception, and whether CATCH provides a CPU fallback for it?
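The two messages in this log suggest a dispatch pattern: try the device kernel, warn and retry on the CPU if it fails, and raise if no CPU implementation is registered. The sketch below is a conceptual illustration of that pattern only, inferred from the log text. It is not CATCH's actual implementation, and every name in it (`dispatch`, `DEVICE_KERNELS`, `CPU_FALLBACKS`, the toy kernels) is hypothetical. As far as upstream PyTorch goes, convolution_overrideable is the operator that the generic convolution routes to for backends that are not built in, which would be consistent with conv2d landing there on an out-of-tree backend.

```python
import warnings


class OpNotSupported(Exception):
    """Raised by a (toy) device kernel that cannot handle its inputs."""


# Hypothetical registries mapping operator names to callables.
DEVICE_KERNELS = {}
CPU_FALLBACKS = {}


def dispatch(op_name, *args):
    """Run the device kernel; on failure, warn and try a CPU fallback."""
    try:
        return DEVICE_KERNELS[op_name](*args)
    except OpNotSupported as err:
        warnings.warn(
            f"MLU {op_name} failed, fallback to run on CPU automatically! ({err})"
        )
        fallback = CPU_FALLBACKS.get(op_name)
        if fallback is None:
            raise RuntimeError(
                f"MLU {op_name} does not have fallback CPU implementation!"
            ) from err
        return fallback(*args)


def cpu_where(cond, x, y):
    # Element-wise select on plain Python lists.
    return [a if c else b for c, a, b in zip(cond, x, y)]


def device_where(cond, x, y):
    # Toy stand-in for a kernel that rejects integer ("Long") inputs.
    if any(isinstance(v, int) for v in x + y):
        raise OpNotSupported("dtype not implemented for 'Long'")
    return cpu_where(cond, x, y)


def device_conv(x):
    # Toy stand-in for a kernel that always fails, as in the traceback above.
    raise OpNotSupported("Cannot set version_counter for inference tensor")


DEVICE_KERNELS["_s_where"] = device_where
CPU_FALLBACKS["_s_where"] = cpu_where
DEVICE_KERNELS["convolution_overrideable"] = device_conv  # no CPU fallback registered
```

In this toy model, `dispatch("_s_where", [True, False], [1, 2], [3, 4])` warns and then succeeds on the CPU path, while `dispatch("convolution_overrideable", [0.0])` warns and then raises the RuntimeError, mirroring the order of messages in the log. Why the real device kernel fails in the first place is a separate question that this sketch does not answer.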