GCC 15 Continues to Improve AArch64 - SegmentFault 思否

GCC 15 for Arm includes various improvements and features:

- Code generation innovation: GCC 15 continues to improve vectorization of loops with control flow, mixes SVE and Adv. SIMD instructions, and improves fundamentals such as addressing modes and constant creation. It also delivers an estimated 3-5% improvement in SPEC CPU 2017 intrate on Neoverse CPUs.
- Vectorization improvements: Only the loop-aware SLP vectorizer is now enabled, which makes the vectorizer easier to extend. SLP support for early-break loops fixes liveness issues and makes new features easier to implement. Peeling for alignment works for early-break loops whether or not the buffer size is known at compile time, and peeling is also available for fixed-size SVE. Loops containing __builtin_prefetch calls can now be vectorized; the prefetch calls are dropped during vectorization. Two-way dot product support is added for SVE2.1.
- Libmvec support for SVE: Math routines can be auto-vectorized using glibc's libmvec implementation, and this support extends to user-defined functions.
- Conditional store optimizations: The vectorizer can now optimize conditional stores through better predicate usage.
- Saturating arithmetic support: GCC detects saturating arithmetic and emits saturating instructions in both scalar and vector code.
- Improve vectorization of popcount: Popcount operations get better support through various optimizations.
- MVE tail predication: On 32-bit Arm, vector loops using MVE (the M-profile Vector Extension) now support tail predication, which avoids a scalar tail loop.
- FP8: Adds support for the FP8 floating-point formats, including the new FPMR system register and the associated intrinsics. GCC 15 also includes a new pass to optimize FPMR usage.
- RCPC3 (FEAT_LRCPC3) in libatomic: libatomic now supports RCPC3 atomic instructions.
- New cores and updated Neoverse tunings: Adds support for new CPUs and updates Neoverse cost models.
- Default L1 data cache line size change: The default cache line size is now 64 bytes for all Neoverse cores and Armv9-A.
- Pipelined FMA code generation: Updates the Neoverse cost models to support pipelined fused multiply-accumulate (FMA) operations.
- CMP+CSEL fusion: Adds support for fusing CMP and CSEL instruction pairs on Neoverse cores.
- New Architecture and features support: Supports various Arm architectures and features.
- SVE support for C/C++ operators: Standard C/C++ operators can now be used directly with SVE ACLE types.
- SVE/OpenMP interoperability (new and improved): Better support for using SVE together with OpenMP offloading.
- SVE2.1 & SME2.1: Supports SVE2.1 and SME2.1 extensions.
- IVopts and addressing modes: Induction variable optimization now chooses addressing modes based on target support and cost.
- ILP32 deprecation: Support for the ILP32 ABI is deprecated in GCC 15.
- -mcpu=native on unknown heterogeneous systems: On heterogeneous systems with unrecognized cores, -mcpu=native detects the current platform and generates code optimized for it.
- libstdc++ improvements: Improves the performance of code generated from the libstdc++ standard library, for example in std::find and in the hashtable implementation behind std::unordered_map.
- Suppress default Cortex-A53 erratum workarounds: The Cortex-A53 erratum workarounds are no longer applied when they are not needed.
- Improvement to SIMD immediate generation: Extends immediate generation for Adv. SIMD by using SVE instructions.
- Disconnect immediate mode from container type: Any suitable instruction can now be used to create a vector constant, regardless of the constant's container type.
- Improve constant generation with SVE: Uses SVE's INDEX instruction to generate regular sequences of numbers.
- Update zero initialization: Creates zeros using the same base mode and SVE instructions.
- Permute optimizations: Performs various optimizations on permutes, such as optimizing zero registers and avoiding unnecessary permutes.
- Disable early scheduling for lower optimization levels: Disables the early scheduler on AArch64 for lower optimization levels.
- SVE Intrinsics optimizations: Performs optimizations on SVE intrinsics.
- Improve CTZ optimizations: Count trailing zeros (CTZ) is optimized for SVE using RBIT.
- Improve vector rotates: Implements vector rotate operations using suitable instructions.
- Improve CRC detection: Automatically detects CRC computations and emits optimized instructions for them.
- Use SVE FSCALE, ASRD and more with Adv. SIMD: When SVE is available, uses SVE instructions to cover gaps in Adv. SIMD code generation.
- Code locality optimizations: With LTO, lays out code so that callers and their callees are placed close together.
- Improved malloc in glibc: Splits __libc_malloc into two parts to improve performance.
- Guarded Control Stack (GCS): Adds support for the GCS extension on systems with the required hardware and Linux kernel version. It is opt-in and must be explicitly enabled at runtime.
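The early-break items in the vectorization list target loops that can leave before reaching their trip count, such as the linear search that std::find performs. A minimal C sketch of that loop shape follows; the name `find_first` is illustrative, and whether GCC 15 actually vectorizes it depends on the options used (for example `-O3` with an SVE- or Adv. SIMD-enabled `-march`), so check the generated assembly rather than assuming.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (hypothetical name): a linear search whose loop
 * can exit early. This is the loop shape early-break vectorization
 * targets. Returns the index of the first match, or n if none. */
size_t find_first(const int32_t *a, size_t n, int32_t key)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] == key)
            return i; /* early break out of the loop */
    }
    return n;
}
```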
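The SVE2.1 two-way dot product corresponds to source loops that multiply 16-bit integers and accumulate the products into 32 bits, so that each 32-bit accumulator lane sums two products. The sketch below shows that pattern in plain C; `dot_i16` is an illustrative name, and whether GCC 15 selects the two-way dot product instruction depends on compiling for a target that has SVE2.1.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (hypothetical name): widening 16-bit x 16-bit
 * products accumulated into a 32-bit sum. Assumes the total fits in
 * int32_t; longer or larger inputs would need a wider accumulator. */
int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```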
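A conditional store is a loop that writes an element only when some condition holds. Without predication, a vectorizer has to branch or blend; with predicated stores, as SVE provides, the condition can become a store mask. The sketch below is a hypothetical example of such a loop (`copy_if` is an illustrative name), not code taken from GCC.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (hypothetical name): dst[i] is written only where
 * mask[i] is non-zero; other elements of dst are left untouched.
 * restrict tells the compiler the arrays do not overlap. */
void copy_if(int32_t *restrict dst, const int32_t *restrict src,
             const uint8_t *restrict mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (mask[i])
            dst[i] = src[i];
    }
}
```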
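Saturating arithmetic clamps results to the type's limits instead of wrapping around. Written in plain C, an unsigned 8-bit saturating add can look like the sketch below; this is the kind of idiom that can be mapped to AArch64 saturating instructions such as UQADD. The helper names are illustrative, and which instruction GCC 15 emits for this exact form is an assumption to verify.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (hypothetical names). If a + b wraps around,
 * the truncated sum is smaller than a, so clamp to UINT8_MAX. */
static inline uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return sum < a ? UINT8_MAX : sum;
}

/* Element-wise saturating add over arrays: the vector form of the idiom. */
void sat_add_arrays(uint8_t *restrict out, const uint8_t *restrict x,
                    const uint8_t *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = sat_add_u8(x[i], y[i]);
}
```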
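The popcount item concerns loops that count set bits. A minimal sketch using GCC's `__builtin_popcount` builtin is shown below; AArch64 has vector bit-count instructions (such as the Adv. SIMD CNT instruction) that such a loop can map to, but the exact code GCC 15 generates is not specified here, and `total_popcount` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (hypothetical name): total number of set bits
 * across an array of 32-bit words. */
uint32_t total_popcount(const uint32_t *a, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint32_t)__builtin_popcount(a[i]);
    return total;
}
```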
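The CTZ item relies on a bit-reversal identity: reversing the bits of x (which is what RBIT does) turns its trailing zeros into leading zeros, so ctz(x) equals clz(rbit(x)). The portable C model below only illustrates that identity, together with a loop that would benefit from it; it is not GCC's implementation, and all names are hypothetical. Because `__builtin_ctz` and `__builtin_clz` are undefined for zero, zero is handled explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable model of RBIT: reverse the 32 bits of x by swapping
 * progressively larger groups (bits, pairs, nibbles, bytes, halves). */
uint32_t bit_reverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* ctz(x) == clz(rbit(x)); zero has 32 trailing zeros by convention here. */
uint32_t ctz_via_rbit(uint32_t x)
{
    return x ? (uint32_t)__builtin_clz(bit_reverse32(x)) : 32u;
}

/* Hypothetical loop of the kind a vectorized CTZ would speed up. */
void ctz_array(uint32_t *restrict out, const uint32_t *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ? (uint32_t)__builtin_ctz(in[i]) : 32u;
}
```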
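In C, a rotate is usually written as two shifts combined with OR, and compilers recognize `(x << r) | (x >> (32 - r))` as a rotate when `r` is between 1 and 31. The sketch below applies a rotate-left by the constant 7 (an arbitrary choice) across an array, which is the kind of loop vector rotate support targets; `rotl7_array` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (hypothetical name): rotate each 32-bit element
 * left by 7. Bits shifted out at the top re-enter at the bottom. */
void rotl7_array(uint32_t *restrict out, const uint32_t *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (in[i] << 7) | (in[i] >> 25);
}
```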
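CRC detection is about recognizing hand-written CRC loops. A common candidate is the bit-at-a-time CRC-32 below, using the reflected polynomial 0xEDB88320 (the CRC-32 used by zlib and Ethernet). Whether GCC 15 matches this exact form is an assumption to verify in the generated code, and `crc32_bitwise` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (hypothetical name): bit-at-a-time reflected
 * CRC-32 with initial value and final XOR of 0xFFFFFFFF. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;              /* standard initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                      /* fold in the next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;                /* final XOR */
}
```

As a sanity check, the CRC-32 of the ASCII string "123456789" is the standard check value 0xCBF43926.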

Overall, GCC 15 is a significant step forward in Arm support and optimization.