Compilers are (too) smart

  • 6 Jun 2024, by admin: Last week the author noticed unexpected assembly in a function that calls std::unordered_map::find, compiled with Clang for an ARM-based platform. The post shows a minimal example using std::unordered_map; the generated assembly begins with a giant blob of code.
  • The strange code is bit manipulation with the constants 0x5555555555555555, 0x3333333333333333, 0xf0f0f0f0f0f0f0f, and 0x101010101010101. Together these masks implement a population count: they count the number of set bits in a word.
  • find calls constrain_hash multiple times. constrain_hash maps a potentially large hash into the range [0, bucket count); when the bucket count is a power of two, it avoids a division by using an AND instead.
  • Clang chose to express the power-of-two check as a bit count because some CPUs have an instruction that counts set bits. Compiling with -msse4 confirms this: the output uses the popcnt instruction.
  • An example change to constrain_hash shows that undoing part of it stops Clang from using the popcnt version. A minimal repro that does not include unordered_map shows that constrain_hash in isolation compiles to the plain DEC/TEST version; only find, which calls it multiple times, gets the fancy treatment.
  • A Compiler Explorer snippet is available for experimenting. Compiler Explorer can show both the LLVM IR and the Optimization Pipeline, which makes it possible to see how the code starts out and how it changes from pass to pass. constrain_hash in isolation takes different forms at different stages, and find inlines constrain_hash in the ctpop form. In some cases a native popcnt may be used wrongly when compiling for an architecture that does not support it.
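
To make the pieces above concrete, here is a minimal C++ sketch of the two ideas involved: a portable bit count built from the four constants quoted earlier, and a hash-constraining helper that picks AND over division for power-of-two bucket counts. The names popcount_swar, is_pow2_dec_test, is_pow2_popcount and constrain_hash_sketch are hypothetical and chosen for illustration; this is not the standard library's actual source, only an approximation of the behaviour described in the post.

```cpp
#include <cstddef>
#include <cstdint>

// Portable population count using the masks from the assembly blob.
// Each step sums neighbouring bit groups of twice the previous width.
inline std::uint64_t popcount_swar(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // 4-bit sums
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;                            // 8-bit sums
    return (x * 0x0101010101010101ULL) >> 56;                              // add all bytes
}

// "DEC/TEST" form: subtract one, AND with the original, test for zero.
// Note that this also reports true for 0.
inline bool is_pow2_dec_test(std::size_t bc) {
    return (bc & (bc - 1)) == 0;
}

// Bit-count form: at most one bit set. Same result as the form above
// for every input, which is why a compiler is free to swap them.
inline bool is_pow2_popcount(std::size_t bc) {
    return popcount_swar(bc) < 2;
}

// Hypothetical sketch of a hash-constraining helper (assumes bc != 0):
// AND for power-of-two bucket counts, modulo otherwise.
inline std::size_t constrain_hash_sketch(std::size_t h, std::size_t bc) {
    return is_pow2_dec_test(bc) ? (h & (bc - 1)) : (h % bc);
}
```

With 8 buckets, constrain_hash_sketch(37, 8) takes the AND path and yields 5; with 10 buckets it falls back to the modulo and yields 7. Since the two power-of-two tests agree on every input, choosing between them is purely a cost question: on a target with a native bit-count instruction the second form is a single instruction, while on a target without one it expands into the mask-and-multiply sequence shown in popcount_swar.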