If that is to be explained as quantization loss, then the loss is far too extreme. I have converted seven or eight models, and only this LSTM model shows this problem. Do LSTM models suffer larger quantization loss? Also, as I mentioned, with mean=[0,0,0] and std=1, setting use_firstconv to 0 versus 1 gives very different results, which hardly seems reasonable either.
Hello, regarding the quantization precision loss: you can dump every layer's output from Python and compare the CPU and MLU results layer by layer:

import caffe

caffe.set_mode_mlu()
caffe.set_core_number(1)
caffe.set_rt_core("MLU220")
# net: your caffe.Net instance; use net.blobs.iteritems() on Python 2
for layer_name, blob in net.blobs.items():
    print(layer_name + '\t' + str(blob.data))
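Building on that, the two dumps can be reduced to per-layer error metrics instead of eyeballing raw arrays. A minimal sketch, assuming `cpu_blobs` and `mlu_blobs` are hypothetical dicts mapping layer name to a NumPy array collected from a loop like the one above:

```python
import numpy as np

def compare_blobs(cpu_blobs, mlu_blobs):
    """Return {layer: (max_abs_err, cosine_similarity)} for layers in both dumps."""
    report = {}
    for name in cpu_blobs:
        if name not in mlu_blobs:
            continue
        a = cpu_blobs[name].astype(np.float64).ravel()
        b = mlu_blobs[name].astype(np.float64).ravel()
        max_err = float(np.abs(a - b).max())
        # small epsilon guards against zero-norm blobs
        cos = float(a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
        report[name] = (max_err, cos)
    return report
```

A cosine similarity that stays near 1.0 until a specific layer, then drops sharply, usually points at the layer where the quantized path diverges.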
Hello, CNRT's inputs and outputs default to NHWC, so you need to apply the corresponding transpose.
Below is a comparison of the first 20 output values on MLU and CPU. Since you quantized to int8, some loss of precision is expected.
mlu:
6.890625000000000000e+00 -1.113281250000000000e+00 -5.964843750000000000e+00 -3.063964843750000000e-01 4.343750000000000000e+00 -2.101562500000000000e+00 9.674072265625000000e-02 -1.325195312500000000e+00 -1.450195312500000000e+00 2.435546875000000000e+00 -2.755859375000000000e+00 7.214843750000000000e+00 3.029296875000000000e+00 9.555664062500000000e-01 -5.512695312500000000e-01 2.328125000000000000e+00 9.604492187500000000e-01 6.394531250000000000e+00 -1.645507812500000000e-01 -4.324218750000000000e+00
cpu:
9.038786888122558594e+00 -2.188739538192749023e+00 -7.382113933563232422e+00 6.280529499053955078e-02 5.158084392547607422e+00 -2.028887271881103516e+00 -4.253607988357543945e-01 1.081809997558593750e-01 -1.120917081832885742e+00 2.655514478683471680e+00 -3.744379043579101562e+00 9.149456977844238281e+00 3.738803625106811523e+00 -1.252558469772338867e+00 -3.952614068984985352e-01 1.592808723449707031e+00 7.857535481452941895e-01 8.117773056030273438e+00 6.915171146392822266e-01 -5.211871147155761719e+00
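As a sketch of that layout conversion (the shapes here are made up for illustration; substitute your model's actual N/H/W/C), a flat CNRT output buffer in NHWC order can be brought back to Caffe's NCHW layout with a single transpose:

```python
import numpy as np

# Raw CNRT output arrives flat in NHWC order (assumed shape for illustration).
n, h, w, c = 1, 4, 4, 3
flat = np.arange(n * h * w * c, dtype=np.float32)

nhwc = flat.reshape(n, h, w, c)
nchw = nhwc.transpose(0, 3, 1, 2)  # NHWC -> NCHW, matching Caffe's layout

assert nchw.shape == (n, c, h, w)
# Element-wise check: nchw[n, ch, y, x] must equal nhwc[n, y, x, ch]
assert nchw[0, 2, 1, 3] == nhwc[0, 1, 3, 2]
```

Comparing CPU and MLU outputs element by element without this transpose produces mismatches that look like (but are not) precision loss.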
Hello, you could try setting std to 1 and see whether the result then matches expectations.
Already tried that. The CPU and MLU code both use the same settings: mean=[0,0,0], std=1.
CPU results:
[[0]] 10.884485
[[0]] 21.488785
[[0]] 33.21021
[[0]] 33.22379
[[0]] 29.092976
[[0]] 12.942553
[[0]] 28.53116
[[0]] 30.393711
[[0]] 15.588205
[[0]] 21.283323
[[0]] 30.011887
[[0]] 22.923153
[[0]] 17.473858
[[0]] 28.751379
[[0]] 29.601063
[[0]] 25.026934
[[0]] 14.760608
[[0]] 36.701244
[[0]] 34.12405
[[0]] 20.486364
[[0]] 12.991854
[[0]] 37.426823
[[0]] 25.84907
[[0]] 14.0447035
[[0]] 34.449768
[[0]] 35.63881
[[0]] 30.176765
MLU220 results (use_firstconv = 0):
0 6.960938
42 6.253906
0 9.195312
0 11.601562
14 6.359375
6 8.796875
6 11.460938
6 12.500000
6 10.992188
42 11.101562
42 10.562500
42 9.757812
42 9.882812
42 9.695312
42 9.523438
42 9.484375
6 10.445312
6 11.585938
6 11.789062
6 10.617188
6 10.382812
6 8.859375
0 8.421875
0 11.593750
0 13.820312
0 14.218750
0 13.960938
MLU220 results (use_firstconv = 1):
0 11.460938
0 15.757812
0 28.218750
0 12.945312
0 15.015625
0 9.125000
0 12.820312
0 10.773438
0 13.773438
0 11.515625
0 12.015625
0 11.617188
0 12.125000
0 14.234375
0 13.335938
0 14.804688
0 12.664062
0 11.710938
0 10.726562
0 13.757812
0 8.257812
0 11.718750
0 10.804688
0 14.070312
0 14.109375
0 14.484375
0 12.187500
SDK version on the board: CNRT: 4.7.12 03ea1d9
According to your configuration file, what you ultimately need to compute is: (img - [123.68, 116.78, 103.94]) / 0.0039215686
Er... did I misunderstand the documentation? This is the official documentation's (寒武纪Caffe用户手册-v5.3.1.pdf) description of std; since I can't upload images here, I'll paste it directly:
std: a scale-down factor, default value 1. For example, 0.017 in MobileNet actually means the input is multiplied by 0.017, so std is set to 0.017. For networks whose variance is 128, it means the input is multiplied by 1/128, so std is set to 0.0078125. If the quantized network uses a firstconv layer, the std parameter must be set on the first convolution layer of the generated quantized network. std supports per-channel values.
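The manual's two examples can be sanity-checked numerically; this is just an illustration of the multiplier semantics described above, not part of any conversion tool:

```python
import numpy as np

x = np.array([128.0, 64.0], dtype=np.float32)

# MobileNet-style: std = 0.017 means the input is multiplied by 0.017
assert np.allclose(x * np.float32(0.017), [2.176, 1.088])

# Variance-128-style: std = 0.0078125 is exactly 1/128,
# so multiplying by it is the same as dividing by 128
assert np.float32(0.0078125) == np.float32(1.0 / 128.0)
assert np.allclose(x * 0.0078125, [1.0, 0.5])
```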
Yes. On our side, to keep the inputs strictly identical, we disabled use_firstconv, and the MLU results we obtained match the CPU results.
It's like this: I ported the PyTorch-style preprocessing, so the wording may differ a little. Let's derive it:
(img/255 - [0.485, 0.456, 0.406]) / [1, 1, 1]
= img/255 - [0.485, 0.456, 0.406]
= (img - [123.68, 116.78, 103.94]) / 255
= (img - [123.68, 116.78, 103.94]) * 0.0039215686
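One caveat worth checking numerically: 123.68/255, 116.78/255 and 103.94/255 come out to roughly (0.4850, 0.4580, 0.4076), which is close to but not exactly (0.485, 0.456, 0.406), so the two pipelines agree only to within about 0.002 per channel. A quick sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(3, 8, 8)).astype(np.float64)  # CHW layout

mean_t = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)      # torchvision-style
mean_c = np.array([123.68, 116.78, 103.94]).reshape(3, 1, 1)   # Caffe-style

lhs = img / 255.0 - mean_t           # (img/255 - mean) / [1, 1, 1]
rhs = (img - mean_c) * 0.0039215686  # (img - mean) * (1/255)

# 116.78/255 ~= 0.45796 vs 0.456, so agreement is only approximate.
assert np.abs(lhs - rhs).max() < 5e-3
assert np.abs(lhs - rhs).max() > 1e-4
```

The residual is a constant per-channel offset, far smaller than the discrepancies reported earlier in this thread, so it does not explain the MLU/CPU gap, but it is worth knowing the two mean conventions are not identical.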
Hello, in your CPU test, the normalization formula you used, input = (input / std) - mean, is not correct; you need to use input = (input - mean) / std. In addition, the std you set for quantization is 1/255, but the std you used in the computation is 1.
So, with no normalization preprocessing on the CPU at all:

import cv2
import numpy as np

img_file = '1.jpg'
img = cv2.imread(img_file)
img = cv2.resize(img, (192, 64))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img.astype(np.float32)
img = img.transpose([2, 0, 1])  # HWC -> CHW
net.blobs['data'].data[0, :, :, :] = img

and with use_firstconv = 0 set during conversion, would that guarantee consistency?
Please look again: in the CPU code, the input is first scaled (1/255) and then the mean is subtracted and divided by std; the computed result is the same. And where exactly do you mean I used std = 1 in the computation?
Your configuration file:
mean = 123.68, 116.78, 103.94
std = 0.0039215686
Your inference code:
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([1, 1, 1], dtype=np.float32)
img = (img / 255.0 - mean) / std
To be correct, your std should be 0.0039215686.
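For completeness, applying the conversion config's mean/std on the CPU side under the manual's multiplier reading of std (std scales the input) might look like the following sketch; the zero input array here is only a placeholder:

```python
import numpy as np

mean = np.array([123.68, 116.78, 103.94], dtype=np.float32).reshape(3, 1, 1)
std = np.float32(0.0039215686)  # 1/255, the value from the conversion config

img = np.zeros((3, 4, 4), dtype=np.float32)  # placeholder CHW input
out = (img - mean) * std  # manual semantics: std multiplies the input

# A zero pixel maps to -mean/255 per channel, e.g. -123.68/255 for channel 0.
assert abs(float(out[0, 0, 0]) + 123.68 / 255.0) < 1e-4
```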