Experiment 1 testing: questions about os.environ['MLU_VISIBLE_DEVICES'] and multi-core
nicholaswilde | 2020-06-16 05:12:36 | Replies: 5 | Experiment Support

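In the final part of Experiment 1, the TensorFlow operator integration test on the MLU, I ran the test in a loop 100 times and recorded the time of each run. The results look very strange: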
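For reference, a timing loop of this kind can be sketched as below. This is only an illustration, not the code used in the experiment: `time_runs` and `dummy_workload` are hypothetical names, the workload stands in for the actual `sess.run(...)` call, and milliseconds are an assumed unit (the post does not state one).

```python
import time

def time_runs(fn, n_runs=100):
    """Call fn() n_runs times; return each run's wall-clock time in whole milliseconds."""
    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        timings_ms.append(int(elapsed * 1000))
    return timings_ms

# Stand-in workload (assumption): replace with the real inference call.
def dummy_workload():
    sum(range(10_000))

timings = time_runs(dummy_workload, n_runs=100)
print(timings)
```

Timing only the call itself, and nothing else inside the loop, keeps unrelated per-iteration work out of the measurement.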


os.environ['MLU_VISIBLE_DEVICES'] = "0", single-core test, time of each run: [121, 47, 46, 37, 37, 46, 53, 42, 43, 50, 45, 42, 43, 44, 47, 49, 48, 50, 50, 50, 49, 66, 53, 50, 50, 54, 55, 52, 54, 57, 60, 68, 56, 63, 69, 59, 66, 59, 66, 63, 67, 65, 64, 62, 63, 65, 70, 74, 66, 67, 82, 78, 83, 81, 78, 81, 83, 75, 76, 76, 80, 85, 91, 83, 85, 82, 94, 87, 88, 92, 92, 95, 91, 92, 98, 93, 93, 140, 100, 98, 101, 96, 103, 99, 99, 106, 105, 100, 102, 104, 103, 102, 106, 108, 105, 109, 106, 105, 110, 105]


os.environ['MLU_VISIBLE_DEVICES'] = "0", multi-core test with config.mlu_options.core_num = 16: every attempt fails with the error: [cnrtError] [95362] [Card : 0] MLU unfinished. cnrtStream fail.


os.environ['MLU_VISIBLE_DEVICES'] = "1" (or 2, 3, 4, ...), single-core test, time of each run: [79, 8, 8, 9, 9, 10, 10, 11, 12, 12, 14, 14, 15, 15, 16, 17, 17, 18, 19, 19, 20, 21, 21, 22, 22, 23, 23, 24, 25, 25, 26, 27, 34, 33, 29, 30, 30, 30, 31, 32, 32, 33, 33, 34, 35, 36, 36, 37, 37, 38, 39, 39, 40, 41, 41, 42, 43, 43, 44, 44, 45, 46, 47, 47, 47, 48, 48, 49, 49, 50, 51, 53, 53, 53, 54, 54, 55, 56, 56, 57, 57, 58, 59, 60, 60, 60, 61, 62, 62, 62, 63, 64, 65, 66, 66, 69, 69, 68, 69, 69]


os.environ['MLU_VISIBLE_DEVICES'] = "1" (or 2, 3, 4, ...), multi-core test with config.mlu_options.core_num = 16: runs successfully, time of each run: [73, 8, 8, 8, 9, 9, 10, 11, 11, 12, 13, 13, 14, 15, 16, 16, 17, 17, 18, 18, 19, 20, 21, 21, 22, 22, 23, 24, 24, 25, 25, 26, 27, 28, 30, 31, 30, 30, 33, 32, 32, 33, 34, 34, 35, 35, 36, 36, 37, 38, 38, 39, 39, 40, 41, 42, 42, 43, 44, 70, 70, 63, 87, 87, 82, 57, 48, 48, 49, 50, 50, 51, 51, 52, 53, 53, 54, 55, 55, 78, 101, 62, 58, 59, 60, 61, 61, 61, 64, 64, 72, 112, 66, 66, 67, 68, 68, 69, 69, 70]
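The upward drift in these lists can be quantified with a simple least-squares slope. The sketch below is only an illustration: it uses the first 20 values of the device "1" single-core list above, drops the first (warm-up) run, and `least_squares_slope` is a hypothetical helper name, not part of any library.

```python
def least_squares_slope(ys):
    """Slope of the best-fit line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# First 20 recorded values, device "1", single core (copied from the list above).
first_20 = [79, 8, 8, 9, 9, 10, 10, 11, 12, 12,
            14, 14, 15, 15, 16, 17, 17, 18, 19, 19]

steady = first_20[1:]  # drop the first (warm-up) run
slope = least_squares_slope(steady)
print(f"average growth per run: {slope:.2f} time units")
```

A slope clearly above zero means each run takes longer than the one before it, so something is accumulating across iterations rather than fluctuating randomly.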


These results leave me with four questions:


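  1. In every test, apart from the first run (which takes longer because of graph optimization), why do the remaining 99 runs show steadily increasing run times, with a very large gap between the fastest and slowest runs?
  2. Why are the single-core results with os.environ['MLU_VISIBLE_DEVICES'] = "1" (or 2, 3, 4, ...) so much better than those with os.environ['MLU_VISIBLE_DEVICES'] = "0"?
  3. Why does the multi-core test always fail when os.environ['MLU_VISIBLE_DEVICES'] = "0"?
  4. Why are the single-core and multi-core results with os.environ['MLU_VISIBLE_DEVICES'] = "1" (or 2, 3, 4, ...) almost identical? I set config = tf.ConfigProto(); config.mlu_options.core_num = 16 and then opened the session with "with tf.Session(config = config) as sess:". Is this the wrong way to do it, so that the multi-core performance is never actually used?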
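How close the single-core and multi-core timings on device "1" are can be checked run by run. The sketch below is only an illustration: it compares the first 20 values of each list from the post, skips the warm-up run, and `mean_abs_diff` is a hypothetical helper name.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally long timing lists."""
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# First 20 recorded values on device "1" (copied from the post).
single_core = [79, 8, 8, 9, 9, 10, 10, 11, 12, 12,
               14, 14, 15, 15, 16, 17, 17, 18, 19, 19]
multi_core = [73, 8, 8, 8, 9, 9, 10, 11, 11, 12,
              13, 13, 14, 15, 16, 16, 17, 17, 18, 18]

# Skip the first (warm-up) run in both lists before comparing.
diff = mean_abs_diff(single_core[1:], multi_core[1:])
print(f"mean absolute difference per run: {diff:.2f} time units")
```

A per-run difference that is small compared with the timings themselves is consistent with the core_num setting having little effect on this workload, although it does not by itself show whether the setting was applied.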
