Abstract: Pangu's training is based on the Ascend AI processor and, with the help of the CANN heterogeneous computing architecture, the hardware's computing power can be fully unleashed, greatly shortening training time!

In April 2021, the Huawei Cloud Pangu large model became a hit in the AI field.

If you ask it the classic Chinese pun question: "Mingming clearly likes him in vain, but he just won't say it, so who is it that likes him in vain?" (in Chinese, 明明 is both the name Mingming and the word for "clearly", and 白白 means "in vain").

Your companion may hesitate for three seconds, but Pangu can answer easily: Mingming!

Recognizing this quickly that the same Chinese word can carry different meanings is just one of its tricks.

With leading language understanding and generation capabilities, this model was instantly labeled "the closest to human-level Chinese understanding" and "the world's largest Chinese NLP pre-training model".

These labels are not empty talk. In the AI field, big intelligence calls for a big model, and the hundreds of billions of parameters and the terabyte-scale model behind Pangu are definitely its magic weapons for success!

A large model also means large amounts of data. Have you ever wondered how such a big model gets trained?

Pangu's training is based on the Ascend AI processor and, with the help of the CANN heterogeneous computing architecture, the hardware's computing power can be fully unleashed, greatly shortening training time!

What is CANN?

Ascend CANN (Compute Architecture for Neural Networks) is a heterogeneous computing architecture launched by Huawei for AI scenarios, with the goals of improving developers' efficiency and unleashing the ultimate computing power of Ascend AI processors. It supports the industry's mainstream AI frameworks, shields users from the hardware differences across the Ascend chip series, and uses a rich software stack to meet users' demands for AI applications in all scenarios.

CANN has now been released up to version 3.0. It provides a unified programming architecture, supports both inference and training across device, edge, and cloud scenarios, and delivers three kinds of enablement.

Enables all scenarios: supports 14+ mainstream operating systems and the industry's mainstream AI frameworks, allowing one-time development and flexible deployment across various hardware forms and operating environments in all scenarios.

Enables minimal development: provides a unified programming interface, AscendCL (Ascend Computing Language), that shields developers from the differences between underlying processors. Developers only need to master one set of APIs to cover Ascend's full range of chips and both inference and training scenarios (a minimal sketch follows below).

Enables extreme performance: achieved through software-hardware co-optimization, affinity-friendly graph compilation technology, and more than 1200 high-performance operators.

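To make "one set of APIs" concrete, here is a minimal sketch of the AscendCL application lifecycle written against its Python binding (pyACL). The model path and device ID are placeholders, the input/output dataset handling a real inference needs is omitted, and the exact call names should be checked against the pyACL reference shipped with your CANN version.

```python
# Minimal sketch of the AscendCL lifecycle via the Python binding (pyACL).
# Assumptions: the `acl` module is installed, "resnet50.om" is a placeholder
# offline model, and device 0 is an Ascend AI processor.
import acl

DEVICE_ID = 0
MODEL_PATH = "resnet50.om"  # placeholder .om model compiled for the target chip

# 1. Initialize AscendCL and claim a device; the calls are the same on any Ascend chip.
ret = acl.init()
ret = acl.rt.set_device(DEVICE_ID)
context, ret = acl.rt.create_context(DEVICE_ID)

# 2. Load the offline model.
model_id, ret = acl.mdl.load_from_file(MODEL_PATH)

# 3. Inference would go here: build input/output datasets with
#    acl.mdl.create_dataset() and call acl.mdl.execute(model_id, inputs, outputs).
#    Dataset and buffer construction is omitted to keep the sketch short.

# 4. Release resources in reverse order.
ret = acl.mdl.unload(model_id)
ret = acl.rt.destroy_context(context)
ret = acl.rt.reset_device(DEVICE_ID)
ret = acl.finalize()
```

The point of the enablement is that this application-side code stays the same regardless of which Ascend chip sits underneath; only the offline model is compiled for the corresponding SoC.
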
CANN's open capabilities:

CANN provides developers with a full-process development experience covering operator development, model development, and application development, spanning applications in all scenarios.
• Operator development
• DSL development interface: provides a set of memory-based development interfaces, with instruction mapping and scheduling on the processor handled automatically. Developers only need to focus on the operator's mathematical logic, without understanding hardware details, to write high-performance operators. According to statistics, this covers more than 60% of operator development needs (see the sketch after this list).
• TIK development interface: provides a relatively complete programming language in which the processor's internal buffers are visible to the developer. Developers decide for themselves how much data to move in and out, so as to fully exploit the chip and squeeze more performance out of their operators.
• Model development
• Supports multiple model development frameworks: MindSpore, TensorFlow, PyTorch, ONNX, etc.
• Supports isolating upper-layer framework differences through the standardized Ascend IR (Intermediate Representation) interface, for direct graph construction and model development.
• Application development
• Provides a set of standard AscendCL programming interfaces to improve users' application programming efficiency.

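As a rough illustration of the DSL-style operator development described in the list above, here is what an element-wise Add operator can look like with the TBE DSL Python interface. The module paths (te, te.lang.cce, topi) and the auto-schedule and build calls follow the pattern of Huawei's public TBE samples; treat them as assumptions rather than the exact interface of every CANN release.

```python
# Sketch of a DSL-style custom operator (element-wise add) for Ascend.
# Only the mathematical logic (vadd) is written by the developer; tiling,
# instruction mapping and scheduling are generated automatically.
from te import tvm          # TBE's TVM-based IR layer (assumed module path)
import te.lang.cce          # DSL compute primitives such as vadd
from topi import generic    # auto_schedule entry point


def add_custom(x, y, z, kernel_name="add_custom"):
    """x, y, z are dicts describing tensors, as in TBE operator prototypes."""
    shape = x.get("shape")
    dtype = x.get("dtype").lower()

    # Declare the input tensors.
    data_x = tvm.placeholder(shape, name="data_x", dtype=dtype)
    data_y = tvm.placeholder(shape, name="data_y", dtype=dtype)

    # The only "real" line: element-wise add expressed with a DSL primitive.
    res = te.lang.cce.vadd(data_x, data_y)

    # Scheduling and instruction mapping are derived automatically.
    with tvm.target.cce():
        schedule = generic.auto_schedule(res)

    config = {"name": kernel_name, "tensor_list": [data_x, data_y, res]}
    te.lang.cce.cce_build_code(schedule, config)
```

A TIK operator for the same computation would instead manage the on-chip buffers and data movement explicitly, which is exactly the trade-off between the two interfaces described above.
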
CANN's hard-core technologies:

• High-performance operator library: provides 1200+ operators covering the TensorFlow, PyTorch, MindSpore, and ONNX frameworks, so developers can build models directly on top of the built-in operators.
• Automatic fusion technology: supports multi-dimensional automatic fusion based on operators, subgraphs, and SCOPE, as well as dynamic DSL fusion, which effectively reduces the number of compute nodes, shortens computation time, and further accelerates Ascend AI processors.

• Heterogeneous deployment and scheduling framework: makes full use of the Ascend chip's heterogeneous execution units, assigning each computing task to the most suitable compute engine and efficiently coordinating asynchronous pipelines to improve the overall efficiency of computing tasks.
• Efficient memory life-cycle management: balances full memory reuse against data-exchange efficiency, striking a balance between resources and performance.
• Preset library of mainstream industry models: Huawei's Ascend Model Zoo provides 100+ mainstream model implementations with corresponding tuning-parameter examples, giving developers off-the-shelf reference implementations. For more information, see: https://www.hiascend.com/software/modelzoo
• High-performance graph-sinking execution framework: sinks the whole computation onto the chip, reducing interaction between the host CPU and the chip and delivering high-performance training and inference.
• High-performance dynamic-graph scheduling: provides a single-operator execution framework based on asynchronous pipelines with flexible H2D/D2H interaction, solving the problem of running dynamic-graph mode efficiently under frameworks such as PyTorch (see the sketch after this list).
• Industry-leading intelligent tuning: supports multiple intelligent tuning algorithms based on reinforcement learning, genetic algorithms, CostModel, and more, offers operator-level and graph-level tuning options, and gives users an automatic, extreme performance-tuning experience.

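To give a feel for the dynamic-graph path mentioned in the scheduling item above, the sketch below runs an eager-mode PyTorch computation on an Ascend device through the Ascend PyTorch adapter plugin. The package name torch_npu and the "npu" device string are assumptions based on the publicly released adapter, not something defined in this article.

```python
# Minimal sketch of eager (dynamic-graph) PyTorch execution on an Ascend device.
# Assumes the Ascend PyTorch adapter is installed as `torch_npu`, which
# registers the "npu" device type with PyTorch.
import torch
import torch_npu  # noqa: F401  (importing it registers the npu backend)

device = torch.device("npu:0")

# In eager mode every line dispatches a single operator to the chip as it runs,
# which is the per-operator, asynchronous-pipeline path described above.
x = torch.randn(4, 8, device=device)
w = torch.randn(8, 2, device=device)
y = torch.relu(x @ w)

# Moving the result back to the host is a D2H transfer.
print(y.cpu().shape)
```
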
CANN 5.0 will give you even more room for imagination. For more information, visit the Ascend community.

Click to follow and be the first to learn about Huawei Cloud's latest technologies~

