
attention_ocr TensorFlow version porting bug [Solved]  雕刻时光  2023-06-14 16:39:59

Docker image version: yellow.hub.cambricon.com/tensorflow/tensorflow:1.0.2-devel-mlu-ubuntu16.04-py3.7

Deployment platform: MLU220

Problem description: I can obtain the quantized model, and the pb model before quantization gives correct inference results, but an error is raised during the pb_to_cambricon step:


2023-06-14 16:10:09.369133: F tensorflow/tools/mlu_model_generation/mlu_model_generation.cc:405][30888] Check failed: status.ok() Invalid argument: 2 root error(s) found.
  (0) Invalid argument: Expected multiples[0] >= 0, but got -1744830168
     [[{{node Tile_44}}]]
  (1) Invalid argument: Expected multiples[0] >= 0, but got -1744830168
     [[{{node Tile_44}}]]
     [[Tile_5/multiples/_159]]
0 successful operations.
0 derived errors ignored.
Aborted (core dumped)


My preliminary guess is that the pb model's input is defined as

tf.placeholder(tf.float32, shape=(None, max_image_height, max_image_width, channels), name='img_data')

Since batch_size is left unspecified, num_images in the tf.tile op ends up with an abnormal value, which makes the model conversion fail:

tf.tile([0], [num_images])
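
For reference, a minimal sketch (not the actual attention_ocr code) contrasting the static and dynamic ways of reading the batch size. With shape=(None, ...), only the dynamic tf.shape route gives a usable num_images, so a converter that tries to materialize the Tile multiples from the static shape could end up with a garbage value like the one in the error above:

import numpy as np
import tensorflow as tf  # TF 1.x graph API, as in the Cambricon image

images = tf.placeholder(tf.float32, shape=(None, 32, 172, 3), name='img_data')

# Static shape: the batch dim is unknown (None) at graph-build time.
print(images.get_shape().as_list())        # [None, 32, 172, 3]

# Dynamic shape: resolved at run time, so Tile always gets a valid multiples.
num_images = tf.shape(images)[0]
tiled = tf.tile(tf.constant([0]), tf.stack([num_images]))

with tf.Session() as sess:
    feed = {images: np.zeros((2, 32, 172, 3), np.float32)}
    print(sess.run(tiled, feed))           # [0 0]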

The input as visualized in Netron is shown below:

[Figure 1.png: Netron view of the model input node]

You can see that the type is Placeholder and shape = [None, 32, 172, 3].
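
Besides Netron, the placeholder shape can also be checked directly from the frozen graph; a minimal sketch, where 'frozen.pb' is just a stand-in for the actual model path:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op == 'Placeholder':
        dims = node.attr['shape'].shape.dim
        # size == -1 marks an unknown (None) dimension, e.g. the batch axis
        print(node.name, [d.size for d in dims])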

When I re-converted the pb model with an explicit batch_size value (that was the only change I made):

[Figure 2.png: Netron view of the input node after fixing batch_size]

The input type changed, and one data flow is cut off (the one feeding the tf.tile op). Although no error was reported, the converted model is very small, apparently because the broken node kept some network weights from being ported. Printing the input/output information after inference shows it is clearly abnormal (see the sketch after the shape dumps below for a guess about the cut edge):

 input data shape
total size = 10
 n(batchsize) = 1 h = 10 w = 1 c = 1

 input data shape
total size = 512
 n(batchsize) = 1 h = 512 w = 1 c = 1

 input data shape
total size = 512
 n(batchsize) = 1 h = 512 w = 1 c = 1

 input data shape
total size = 22016
 n(batchsize) = 1 h = 43 w = 1 c = 512

 input data shape
total size = 22016
 n(batchsize) = 1 h = 43 w = 1 c = 512

 output data shape
total size = 98
 n(batchsize) = 1 h = 98 w = 1 c = 1


The normal input/output information should be:

 input data shape
total size = 983040
 n(batchsize) = 1 h = 384 w = 640 c = 4

 output data shape
total size = 10
 n(batchsize) = 1 h = 10 w = 1 c = 1
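
One guess about the broken edge (I have not verified this against the attention_ocr source): once the batch dimension is fixed, code that reads it from the static shape gets a plain Python int, so the Tile multiples is baked in as a constant and the edge from the input tensor is no longer needed. A minimal sketch of that effect, assuming the graph reads num_images from the static shape:

import tensorflow as tf

images = tf.placeholder(tf.float32, shape=(1, 32, 172, 3), name='img_data')
num_images = images.get_shape().as_list()[0]   # -> 1, an ordinary int
tiled = tf.tile([0], [num_images])             # multiples = [1], a constant node

That might explain the missing edge in Netron, but it does not by itself explain why the converted model is so small.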

How should this situation be handled?
