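I've recently been working on adapting to Huawei's AI accelerators.
I'm using the Ascend 310 and the Atlas 300I inference card (model: 3010).
I spun up an ascend310 + ubuntu18.04 instance on Huawei Cloud.
Converting an ONNX model to an OM model ran into a pile of problems, mostly unsupported operators.
When I asked a Huawei Ascend engineer about it, the advice was to upgrade CANN.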
https://gitee.com/ascend/modelzoo/issues/I7S5KS
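I checked the freshly created ascend310 + ubuntu18.04 instance: the CANN on it is the commercial edition, but a very old release, 20.1.rc1.
The latest commercial edition at this point is already 23.0.RC2.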
https://www.hiascend.com/zh/hardware/firmware-drivers/commerc...
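So I have to upgrade CANN.
I didn't want to upgrade the commercial edition, though, because downloading it requires a special account and that is too much hassle, so I decided to go with the community edition instead.
Community edition download page: https://www.hiascend.com/software/cann/community
Since 20.1.rc1 is preinstalled on the machine and I didn't want to overwrite it, I had to install the latest community edition, 7.0.RC1, somewhere else.
So how do you specify the install path for 7.0.RC1?
Use the --install-path option.
For example: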
./Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run --install --install-path=/opt/Ascend-cann-toolkit_7.0.RC1
Output:
(samples) root@ascend310:~/code/samples# ./Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run --install --install-path=/opt/Ascend-cann-toolkit_7.0.RC1
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing ASCEND_RUN_PACKAGE 100%
[Toolkit] [20230811-20:31:14] [INFO] LogFile:/var/log/ascend_seclog/ascend_toolkit_install.log
[Toolkit] [20230811-20:31:14] [INFO] install start
[Toolkit] [20230811-20:31:14] [INFO] The installation path is /opt/Ascend-cann-toolkit_7.0.RC1.
[Toolkit] [20230811-20:31:14] [ERROR] install failed:check driver compatibility failed.You can add --force to force install or upgrade the driver
[Toolkit] [20230811-20:31:14] [ERROR] check the environment failed
(samples) root@ascend310:~/code/samples# ll | grep log
(samples) root@ascend310:~/code/samples# ./Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run --install --install-path=/opt/Ascend-cann-toolkit_7.0.RC1 --force
Verifying archive integrity... 100% SHA256 checksums are OK. All good.
Uncompressing ASCEND_RUN_PACKAGE 100%
[Toolkit] [20230811-20:34:12] [INFO] LogFile:/var/log/ascend_seclog/ascend_toolkit_install.log
[Toolkit] [20230811-20:34:12] [INFO] install start
[Toolkit] [20230811-20:34:12] [INFO] The installation path is /opt/Ascend-cann-toolkit_7.0.RC1.
[Toolkit] [20230811-20:34:12] [INFO] install package CANN-runtime-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:34:21] [INFO] CANN-runtime-7.0.RC1.alpha001-linux_x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:34:21] [INFO] install package CANN-compiler-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:34:58] [INFO] CANN-compiler-7.0.RC1.alpha001-linux_x86_64.run --full --pylocal --quiet --nox11 install success
[Toolkit] [20230811-20:34:58] [INFO] install package CANN-opp-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:35:38] [INFO] CANN-opp-7.0.RC1.alpha001-linux_x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:35:38] [INFO] install package CANN-toolkit-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:36:26] [INFO] CANN-toolkit-7.0.RC1.alpha001-linux_x86_64.run --full --pylocal --quiet --nox11 install success
[Toolkit] [20230811-20:36:26] [INFO] install package CANN-aoe-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:36:29] [INFO] CANN-aoe-7.0.RC1.alpha001-linux_x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:29] [INFO] install package Ascend-mindstudio-toolkit_7.0.RC1.alpha001_linux-x86_64.run start
[Toolkit] [20230811-20:36:35] [INFO] Ascend-mindstudio-toolkit_7.0.RC1.alpha001_linux-x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:35] [INFO] install package Ascend-test-ops_7.0.RC1.alpha001_linux.run start
[Toolkit] [20230811-20:36:35] [INFO] Ascend-test-ops_7.0.RC1.alpha001_linux.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:35] [INFO] install package Ascend-pyACL_7.0.RC1.alpha001_linux-x86_64.run start
[Toolkit] [20230811-20:36:35] [INFO] Ascend-pyACL_7.0.RC1.alpha001_linux-x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:35] [INFO] install package CANN-ncs-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:36:37] [INFO] CANN-ncs-7.0.RC1.alpha001-linux_x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:38] [INFO] The /etc/Ascend/ascend_cann_install.info is written successfully.
===========
= Summary =
===========
Driver: Not installed.
Toolkit: Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64 install success, installed in /opt/Ascend-cann-toolkit_7.0.RC1.
Please make sure that the environment variables have been configured.
- To take effect for all users, you can add "source /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/set_env.sh" to /etc/profile.
- To take effect for current user, you can exec command below: source /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/set_env.sh or add "source /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/set_env.sh" to ~/.bashrc.
***WARNING***To ensure Toolkit's normal function, please check the driver installation manually.
(samples) root@ascend310:~/code/samples#
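The two attempts in the log above can be folded into one helper. This is only a rough sketch: the function name install_cann_side_by_side is made up, and it relies only on the --install, --install-path and --force options and the error text that appear in the log. It refuses to touch an existing directory, and it retries with --force only when the driver compatibility check is what failed. Forcing the install skips that check, so the driver still has to be verified by hand, as the installer's own warning says.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CANN): install a toolkit .run package into
# its own prefix so the preinstalled CANN is left untouched.
install_cann_side_by_side() {
    local installer="$1"   # path to the .run package
    local prefix="$2"      # directory passed to --install-path
    local output status=0

    # Never install on top of something that is already there.
    if [ -e "$prefix" ]; then
        echo "refusing to install: $prefix already exists" >&2
        return 1
    fi

    # First attempt without --force; keep the output so it can be inspected.
    output="$("$installer" --install --install-path="$prefix" 2>&1)" || status=$?
    printf '%s\n' "$output"

    case "$output" in
        *"check driver compatibility failed"*)
            # Same failure as in the log above: retry once with --force.
            echo "driver check failed, retrying with --force" >&2
            "$installer" --install --install-path="$prefix" --force
            ;;
        *)
            return "$status"
            ;;
    esac
}
```

With the file name from this post, the call would be: install_cann_side_by_side ./Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run /opt/Ascend-cann-toolkit_7.0.RC1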
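To use the new CANN, remember to set the environment variables. How?
The install path already contains a script for that: /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/7.0.RC1.alpha001/x86_64-linux/script/set_env.sh
Note that my install path is /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/7.0.RC1.alpha001; if yours is different, the prefix will differ too.
Its contents: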
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
export ASCEND_TOOLKIT_HOME=/opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/7.0.RC1.alpha001
export LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${ASCEND_TOOLKIT_HOME}/lib64/plugin/opskernel:${ASCEND_TOOLKIT_HOME}/lib64/plugin/nnengine:$LD_LIBRARY_PATH
export PYTHONPATH=${ASCEND_TOOLKIT_HOME}/python/site-packages:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe:$PYTHONPATH
export PATH=${ASCEND_TOOLKIT_HOME}/bin:${ASCEND_TOOLKIT_HOME}/compiler/ccec_compiler/bin:$PATH
export ASCEND_AICPU_PATH=${ASCEND_TOOLKIT_HOME}
export ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
export TOOLCHAIN_HOME=${ASCEND_TOOLKIT_HOME}/toolkit
export ASCEND_HOME_PATH=${ASCEND_TOOLKIT_HOME}
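Copy the environment variables above, paste them into your terminal and hit Enter; that terminal can now use the new CANN.
Note that you have to repeat this in every new terminal, unless you add the source line printed in the installer summary to ~/.bashrc.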
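To avoid pasting the exports into every terminal without making the new CANN the default for every shell, a small function in ~/.bashrc is one option. A minimal sketch: the name use_cann is made up, and the default prefix and the ascend-toolkit/set_env.sh location are taken from the installer summary above, so adjust them if your install path differs.

```shell
# Hypothetical helper for ~/.bashrc: load one CANN install into the current
# shell on demand, so the preinstalled 20.1.rc1 stays the default elsewhere.
use_cann() {
    # Default prefix is the --install-path used in this post.
    local prefix="${1:-/opt/Ascend-cann-toolkit_7.0.RC1}"
    local env_script="$prefix/ascend-toolkit/set_env.sh"

    if [ ! -f "$env_script" ]; then
        echo "use_cann: $env_script not found" >&2
        return 1
    fi

    # Source the script in the current shell so its exports persist.
    # shellcheck disable=SC1090
    . "$env_script"
    echo "CANN environment loaded from ${ASCEND_TOOLKIT_HOME:-$prefix}"
}
```

After opening a new terminal, running use_cann (or use_cann followed by another prefix) switches that shell to the chosen install.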
Running inference on the Huawei Ascend ascend310 then failed with this error:
Exception: acl.rt.set_device failed
Full error:
(samples) root@ascend310:~/code/ascend_example# bash ./scripts/sample_run.sh
/root/code/ascend_example/scripts
[INFO] The out directory is already there
./scripts/sample_run.sh: line 9: cd: /root/code/ascend_example/scripts/../src: No such file or directory
[INFO] The sample starts to run
[INFO] init resource stage:
Traceback (most recent call last):
File "sampleYOLOV7NMSONNX.py", line 147, in <module>
net.init_resource()
File "sampleYOLOV7NMSONNX.py", line 37, in init_resource
self.resource.init()
File "/root/code/ascend_example/acllite_resource.py", line 83, in init
utils.check_ret("acl.rt.set_device", ret)
File "/root/code/ascend_example/acllite_utils.py", line 18, in check_ret
.format(message, ret_int))
Exception: acl.rt.set_device failed ret_int=507033
[INFO] acl resource release all resource
[INFO] Reset acl device 0
./scripts/sample_run.sh: line 11: 26763 Segmentation fault (core dumped) python3.7 sampleYOLOV7NMSONNX.py
[INFO] The program runs failed
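The tutorial I was following: https://gitee.com/ascend/samples/tree/master/inference/modelI...
It walks you step by step through converting ONNX to OM and then running inference on the OM model from Python + ACL.
So how do I fix the error above?
I found this Q&A: https://gitee.com/ascend/samples/issues/I4N5SF
It says that when CANN is upgraded by a large jump like this, the driver has to be upgraded along with it.
Fine, let's upgrade the driver!
Driver download page: https://www.hiascend.com/zh/hardware/firmware-drivers/communi...
Which driver version should I download? I don't know. My CANN is the latest, so my assumption is that the latest driver is the safe choice.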
root@ascend310:~/code/image2vector# npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 22.0.4 Version: 22.0.4 |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) |
| Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) |
+===============================+=================+======================================================+
| 13 310 | OK | 12.8 44 0 / 969 |
| 0 0 | 0000:00:0D.0 | 0 625 / 7759 |
+===============================+=================+======================================================+
| 14 310 | OK | 12.8 44 0 / 969 |
| 0 1 | 0000:00:0E.0 | 0 624 / 7759 |
+===============================+=================+======================================================+
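When comparing against the versions listed on the driver download page, it helps to pull the version number out of that header line. A minimal sketch: the function name npu_smi_version is made up, it assumes the header keeps the "npu-smi ... Version: x.y.z" layout shown above, and treating that number as the installed driver version is my assumption.

```shell
# Hypothetical helper: extract the version from `npu-smi info` output read on
# stdin. Assumes a header line like "| npu-smi 22.0.4 Version: 22.0.4 |".
npu_smi_version() {
    awk '/npu-smi/ && /Version:/ {
        # Print the field that follows the literal "Version:" token.
        for (i = 1; i <= NF; i++) {
            if ($i == "Version:") { print $(i + 1); exit }
        }
    }'
}

# Usage on a machine with the driver installed:
#   npu-smi info | npu_smi_version
```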
Huawei's xx is just a pile of xx.