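Recently I've been working on adapting models to Huawei's AI accelerators.

The hardware is the Ascend 310 on an Atlas 300I inference card (model 3010).

I created an Ascend 310 + Ubuntu 18.04 instance on Huawei Cloud.

When converting an ONNX model to an OM model, I ran into one problem after another: operator after operator was unsupported.

After I asked Huawei's Ascend engineers, they suggested upgrading CANN.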

https://gitee.com/ascend/modelzoo/issues/I7S5KS

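I checked the CANN version on the freshly created Ascend 310 + Ubuntu 18.04 instance: it is the commercial edition, but an ancient one, 20.1.rc1.

At the time of writing, the latest commercial release was already 23.0.RC2.

https://www.hiascend.com/zh/hardware/firmware-drivers/commerc...

So CANN had to be upgraded.

But I didn't want the commercial edition, because downloading it requires a special account, which is too much hassle, so I settled on the community edition instead.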


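Community edition downloads: https://www.hiascend.com/software/cann/community

However, 20.1.rc1 is already preinstalled on this machine and I didn't want to overwrite it, so the latest community release, 7.0.RC1, had to be installed somewhere else.

So how do you choose where 7.0.RC1 gets installed?

Use the --install-path parameter.

For example: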

./Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run --install --install-path=/opt/Ascend-cann-toolkit_7.0.RC1
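If you install several releases side by side, a tiny helper can build this command so each release lands in its own directory. This is only a sketch: `build_install_cmd` is a hypothetical helper, and deriving the directory name from the package file name is my own convention, not something the installer requires (above I simply used /opt/Ascend-cann-toolkit_7.0.RC1). The only installer flags it uses are --install, --install-path and --force, all of which appear in this post.

```python
from pathlib import Path


def build_install_cmd(run_file, install_root="/opt", force=False):
    """Build argv for installing a CANN toolkit .run package side by side.

    Hypothetical naming convention: the directory is the package file name
    without its "_linux-<arch>.run" suffix, e.g.
    Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run
    -> /opt/Ascend-cann-toolkit_7.0.RC1.alpha001
    """
    stem = Path(run_file).name.split("_linux")[0]
    install_path = Path(install_root) / stem
    cmd = [str(run_file), "--install", "--install-path={}".format(install_path)]
    if force:
        # --force makes the installer proceed even if the driver
        # compatibility check fails; it does not touch the driver itself.
        cmd.append("--force")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_install_cmd("./Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run")))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`; keeping it as a list avoids any shell quoting issues with the path.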

Output (the first attempt fails the driver compatibility check, so I reran it with --force):

(samples) root@ascend310:~/code/samples# ./Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run --install --install-path=/opt/Ascend-cann-toolkit_7.0.RC1
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing ASCEND_RUN_PACKAGE  100%  
[Toolkit] [20230811-20:31:14] [INFO] LogFile:/var/log/ascend_seclog/ascend_toolkit_install.log
[Toolkit] [20230811-20:31:14] [INFO] install start
[Toolkit] [20230811-20:31:14] [INFO] The installation path is /opt/Ascend-cann-toolkit_7.0.RC1.
[Toolkit] [20230811-20:31:14] [ERROR] install failed:check driver compatibility failed.You can add --force to force install or upgrade the driver
[Toolkit] [20230811-20:31:14] [ERROR] check the environment failed
(samples) root@ascend310:~/code/samples# ll | grep log
(samples) root@ascend310:~/code/samples# ./Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64.run --install --install-path=/opt/Ascend-cann-toolkit_7.0.RC1 --force
Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing ASCEND_RUN_PACKAGE  100%  
[Toolkit] [20230811-20:34:12] [INFO] LogFile:/var/log/ascend_seclog/ascend_toolkit_install.log
[Toolkit] [20230811-20:34:12] [INFO] install start
[Toolkit] [20230811-20:34:12] [INFO] The installation path is /opt/Ascend-cann-toolkit_7.0.RC1.
[Toolkit] [20230811-20:34:12] [INFO] install package CANN-runtime-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:34:21] [INFO] CANN-runtime-7.0.RC1.alpha001-linux_x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:34:21] [INFO] install package CANN-compiler-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:34:58] [INFO] CANN-compiler-7.0.RC1.alpha001-linux_x86_64.run --full --pylocal --quiet --nox11 install success
[Toolkit] [20230811-20:34:58] [INFO] install package CANN-opp-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:35:38] [INFO] CANN-opp-7.0.RC1.alpha001-linux_x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:35:38] [INFO] install package CANN-toolkit-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:36:26] [INFO] CANN-toolkit-7.0.RC1.alpha001-linux_x86_64.run --full --pylocal --quiet --nox11 install success
[Toolkit] [20230811-20:36:26] [INFO] install package CANN-aoe-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:36:29] [INFO] CANN-aoe-7.0.RC1.alpha001-linux_x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:29] [INFO] install package Ascend-mindstudio-toolkit_7.0.RC1.alpha001_linux-x86_64.run start
[Toolkit] [20230811-20:36:35] [INFO] Ascend-mindstudio-toolkit_7.0.RC1.alpha001_linux-x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:35] [INFO] install package Ascend-test-ops_7.0.RC1.alpha001_linux.run start
[Toolkit] [20230811-20:36:35] [INFO] Ascend-test-ops_7.0.RC1.alpha001_linux.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:35] [INFO] install package Ascend-pyACL_7.0.RC1.alpha001_linux-x86_64.run start
[Toolkit] [20230811-20:36:35] [INFO] Ascend-pyACL_7.0.RC1.alpha001_linux-x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:35] [INFO] install package CANN-ncs-7.0.RC1.alpha001-linux_x86_64.run start
[Toolkit] [20230811-20:36:37] [INFO] CANN-ncs-7.0.RC1.alpha001-linux_x86_64.run --full --quiet --nox11 install success
[Toolkit] [20230811-20:36:38] [INFO] The /etc/Ascend/ascend_cann_install.info is written successfully.


===========
= Summary =
===========

Driver:   Not installed.
Toolkit:  Ascend-cann-toolkit_7.0.RC1.alpha001_linux-x86_64 install success, installed in /opt/Ascend-cann-toolkit_7.0.RC1.

Please make sure that the environment variables have been configured.
-  To take effect for all users, you can add "source /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/set_env.sh" to /etc/profile.
-  To take effect for current user, you can exec command below: source /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/set_env.sh or add "source /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/set_env.sh" to ~/.bashrc.

***WARNING***To ensure Toolkit's normal function, please check the driver installation manually.

(samples) root@ascend310:~/code/samples# 
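Because the install only went through with --force, it is worth checking which sub-packages actually reported success and which errors were logged along the way. A minimal sketch, assuming the lines follow the `[Toolkit] [timestamp] [LEVEL] message` format of the console output above; I have not verified that /var/log/ascend_seclog/ascend_toolkit_install.log uses exactly the same format, and `summarize_install_log` is a hypothetical helper.

```python
import re

# e.g. "[Toolkit] [20230811-20:34:21] [INFO] CANN-runtime-...run --full --quiet --nox11 install success"
SUCCESS_RE = re.compile(r"\[INFO\] (\S+\.run) .*install success$")
# e.g. "[Toolkit] [20230811-20:31:14] [ERROR] install failed:check driver compatibility failed..."
ERROR_RE = re.compile(r"\[ERROR\] (.+)$")


def summarize_install_log(text):
    """Return (installed_packages, error_messages) found in installer output."""
    installed, errors = [], []
    for line in text.splitlines():
        line = line.rstrip()
        match = SUCCESS_RE.search(line)
        if match:
            installed.append(match.group(1))
            continue
        match = ERROR_RE.search(line)
        if match:
            errors.append(match.group(1))
    return installed, errors


if __name__ == "__main__":
    import sys

    # Usage: python3 summarize_log.py /var/log/ascend_seclog/ascend_toolkit_install.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            installed, errors = summarize_install_log(f.read())
        print("{} package(s) installed, {} error line(s)".format(len(installed), len(errors)))
        for message in errors:
            print("ERROR:", message)
```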

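To use the new CANN, remember to set its environment variables.

The script that sets them ships in the installation directory: /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/7.0.RC1.alpha001/x86_64-linux/script/set_env.sh

Note: my toolkit lives under /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/7.0.RC1.alpha001; if you installed somewhere else, your prefix will be different.

Its contents: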

export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:$LD_LIBRARY_PATH
export ASCEND_TOOLKIT_HOME=/opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/7.0.RC1.alpha001
export LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${ASCEND_TOOLKIT_HOME}/lib64/plugin/opskernel:${ASCEND_TOOLKIT_HOME}/lib64/plugin/nnengine:$LD_LIBRARY_PATH
export PYTHONPATH=${ASCEND_TOOLKIT_HOME}/python/site-packages:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe:$PYTHONPATH
export PATH=${ASCEND_TOOLKIT_HOME}/bin:${ASCEND_TOOLKIT_HOME}/compiler/ccec_compiler/bin:$PATH
export ASCEND_AICPU_PATH=${ASCEND_TOOLKIT_HOME}
export ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
export TOOLCHAIN_HOME=${ASCEND_TOOLKIT_HOME}/toolkit
export ASCEND_HOME_PATH=${ASCEND_TOOLKIT_HOME}
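If you'd rather not touch your interactive shell at all, the same variables can be applied to a single child process instead. The sketch below is a Python transcription of the exports listed above (same variables, same prepend order); `cann_env` is a hypothetical helper, and the driver path /usr/local/Ascend/driver is copied from the script, so adjust it if your driver lives elsewhere.

```python
import os

DRIVER_HOME = "/usr/local/Ascend/driver"  # taken from set_env.sh above


def _prepend(entries, old):
    """Prepend entries to a ':'-separated variable, like NEW:$OLD in the shell.

    Unlike the shell version, no trailing ':' is left when OLD is empty.
    """
    return ":".join(entries + ([old] if old else []))


def cann_env(toolkit_home, base=None):
    """Return a copy of base (default: os.environ) with the set_env.sh variables applied."""
    env = dict(os.environ if base is None else base)
    h = toolkit_home
    env["LD_LIBRARY_PATH"] = _prepend(
        [DRIVER_HOME + "/lib64", DRIVER_HOME + "/lib64/common", DRIVER_HOME + "/lib64/driver"],
        env.get("LD_LIBRARY_PATH", ""))
    env["ASCEND_TOOLKIT_HOME"] = h
    env["LD_LIBRARY_PATH"] = _prepend(
        [h + "/lib64", h + "/lib64/plugin/opskernel", h + "/lib64/plugin/nnengine"],
        env["LD_LIBRARY_PATH"])
    env["PYTHONPATH"] = _prepend(
        [h + "/python/site-packages", h + "/opp/built-in/op_impl/ai_core/tbe"],
        env.get("PYTHONPATH", ""))
    env["PATH"] = _prepend([h + "/bin", h + "/compiler/ccec_compiler/bin"], env.get("PATH", ""))
    env["ASCEND_AICPU_PATH"] = h
    env["ASCEND_OPP_PATH"] = h + "/opp"
    env["TOOLCHAIN_HOME"] = h + "/toolkit"
    env["ASCEND_HOME_PATH"] = h
    return env
```

For example, `subprocess.run(["atc", "--help"], env=cann_env("/opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/7.0.RC1.alpha001"), check=True)` should pick up the ATC converter from the new toolkit (assuming it sits in one of the bin directories added to PATH), while the parent shell keeps using 20.1.rc1. Passing the environment to a child process is deliberate: the dynamic loader reads LD_LIBRARY_PATH when a process starts, so changing os.environ inside an already running interpreter would not make it load the new libraries.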

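Copy and paste the exports above into your terminal and press Enter; from then on, that terminal uses the new CANN.

Note that you have to repeat this in every new terminal you open.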
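To avoid repeating this by hand, the installer summary suggests adding a `source .../set_env.sh` line to ~/.bashrc. Be aware that this makes 7.0.RC1 the default in every new shell, which is exactly what I wanted to avoid for the preinstalled 20.1.rc1, so it is a trade-off. If you do want it, here is a minimal sketch that appends the line only once; `ensure_line` is a hypothetical helper and the source path is the one from the installer summary.

```python
from pathlib import Path

# Path recommended by the installer summary above.
SOURCE_LINE = "source /opt/Ascend-cann-toolkit_7.0.RC1/ascend-toolkit/set_env.sh"


def ensure_line(path, line):
    """Append line to the file at path unless it is already present; return True if written."""
    p = Path(path).expanduser()
    text = p.read_text() if p.exists() else ""
    if line in text.splitlines():
        return False  # already there, keep the file unchanged
    with p.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # don't glue our line onto an unterminated last line
        f.write(line + "\n")
    return True
```

Calling `ensure_line("~/.bashrc", SOURCE_LINE)` twice writes the line only once, so it is safe to put in a provisioning script.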


Running inference on the Ascend 310, I then hit this error:

Exception: acl.rt.set_device failed

Full error output:

(samples) root@ascend310:~/code/ascend_example# bash ./scripts/sample_run.sh 
/root/code/ascend_example/scripts
[INFO] The out directory is already there
./scripts/sample_run.sh: line 9: cd: /root/code/ascend_example/scripts/../src: No such file or directory
[INFO] The sample starts to run


[INFO]  init resource stage:
Traceback (most recent call last):
  File "sampleYOLOV7NMSONNX.py", line 147, in <module>
    net.init_resource()
  File "sampleYOLOV7NMSONNX.py", line 37, in init_resource
    self.resource.init()
  File "/root/code/ascend_example/acllite_resource.py", line 83, in init
    utils.check_ret("acl.rt.set_device", ret)
  File "/root/code/ascend_example/acllite_utils.py", line 18, in check_ret
    .format(message, ret_int))
Exception: acl.rt.set_device failed ret_int=507033
[INFO]  acl resource release all resource
[INFO]  Reset acl device 0
./scripts/sample_run.sh: line 11: 26763 Segmentation fault      (core dumped) python3.7 sampleYOLOV7NMSONNX.py
[INFO] The program runs failed
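Before digging into the sample code, it helps to reproduce the failing call in isolation. The sketch below uses only the standard pyACL init/teardown calls (acl.init, acl.rt.set_device, acl.rt.reset_device, acl.finalize) and returns each call's return code; `probe_device` is a hypothetical helper, and taking the acl module as a parameter is just so the logic can be exercised without an NPU.

```python
def probe_device(acl, device_id=0):
    """Run acl.init and acl.rt.set_device, returning {call: ret}; ret 0 means success.

    `acl` is the pyACL module (import acl) or any object with the same attributes.
    """
    results = {"acl.init": acl.init()}
    if results["acl.init"] != 0:
        return results
    try:
        ret = acl.rt.set_device(device_id)
        results["acl.rt.set_device"] = ret
        if ret == 0:
            # Only release the device if we actually acquired it.
            results["acl.rt.reset_device"] = acl.rt.reset_device(device_id)
    finally:
        results["acl.finalize"] = acl.finalize()
    return results


if __name__ == "__main__":
    try:
        import acl  # pyACL; importable only after set_env.sh has been sourced
    except ImportError:
        print("pyACL not found: source set_env.sh first")
    else:
        for call, ret in probe_device(acl, 0).items():
            print("{}: ret={}".format(call, ret))
```

If acl.rt.set_device already fails here with 507033, neither the model nor the sample code can be the cause, and the problem lies below them in the runtime or driver.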
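The tutorial I followed is this one: https://gitee.com/ascend/samples/tree/master/inference/modelI...
It walks you step by step through converting ONNX to OM and then running the OM model with Python + ACL.

How do you fix the problem above?

I found this Q&A: https://gitee.com/ascend/samples/issues/I4N5SF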

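It explains that when you upgrade CANN by a big jump, the driver has to be upgraded along with it.

Fine, let's upgrade the driver too!

Driver downloads: https://www.hiascend.com/zh/hardware/firmware-drivers/communi...

Which driver version should you download? I don't know; all I know is that my CANN is the latest, so the latest driver surely can't be wrong.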


root@ascend310:~/code/image2vector# npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 22.0.4                                   Version: 22.0.4                                       |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
| Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
+===============================+=================+======================================================+
| 13      310                   | OK              | 12.8         44                0    / 969            |
| 0       0                     | 0000:00:0D.0    | 0            625  / 7759                             |
+===============================+=================+======================================================+
| 14      310                   | OK              | 12.8         44                0    / 969            |
| 0       1                     | 0000:00:0E.0    | 0            624  / 7759                             |
+===============================+=================+======================================================+

Huawei's xx really is a pile of xx.


universe_king