[Translation] Optimization of "empty members" in C++ classes

This is a translation of the article The "Empty Member" C++ Optimization. I came across it while reading the C++ std::string source code; it explains why _Alloc_hider inherits from Alloc.
The article appears to date from 1997, so the pointers in its examples are still 4 bytes long.

Optimization of "empty members" in C++ classes

The C++ standard library contains many useful templates, including those found in the well-known SGI STL. Their implementations are both efficient and flexible. In everyday programming they serve as examples to learn from, and they can inspire designs that balance flexibility and efficiency.

The "empty member" optimization is one such example: a class with no data members should not occupy any memory. When would you need a class with no data members? Such a class typically consists of typedefs and member functions, and users of the library can substitute a class of their own to get special behavior (a user-defined replacement may well have data members). The default class provided should cover most needs, so optimizing away the space taken by this empty member is well worth the effort.

Because of a language restriction (explained below), a member of an empty class type usually occupies some memory. In ordinary code this hardly matters, but in the STL, leaving it unoptimized would put off many potential users.

Empty member bloat

Take the STL as an example. Each STL container has an allocator parameter. When the container needs memory, it requests it from the allocator. A user who wants to customize memory allocation can supply their own allocator when constructing the container. In most cases the container uses the STL default allocator, which simply calls operator new to do the allocation. The default allocator is an empty class, similar to the following definition:

    template <class T>
    class allocator {   // an empty class: no data members
      . . .
      static T* allocate(size_t n)
        { return (T*) ::operator new(n * sizeof(T)); }  // room for n objects of type T
      . . .
    };

Take list as an example: the class list holds a private allocator member, which is initialized in the constructor:

    template <class T, class Alloc = allocator<T> >
    class list {
      . . .
      Alloc alloc_; 
      struct Node { . . . };
      Node* head_;      

     public:
      explicit list(Alloc const& a = Alloc())
        : alloc_(a) { . . . }
      . . .
    };

The member list<>::alloc_ usually occupies 4 bytes, even though Alloc is an empty class. Usually this is not a problem. But when the list is itself an element of a larger data structure (such as vector<list>) and the vector is very large, the extra space can no longer be ignored. A larger memory footprint means slower execution: even today, memory access is very slow relative to the CPU clock speed.
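
The size cost described above can be checked at compile time. The following is a minimal sketch, assuming C++11 or later; EmptyAlloc, ListWithMember and ListPointerOnly are hypothetical names invented for this illustration and do not come from any library.

```cpp
#include <cstddef>  // std::size_t
#include <new>      // ::operator new

// Hypothetical stand-in for the default allocator: no data members at all.
template <class T>
struct EmptyAlloc {
    static T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
};

struct Node;  // only ever used through a pointer in this sketch

// Mirrors the list<> layout above: an allocator member next to a pointer.
template <class Alloc>
struct ListWithMember {
    Alloc alloc_;
    Node* head_;
};

// What the list would cost if the allocator took no space at all.
struct ListPointerOnly {
    Node* head_;
};

// The empty allocator itself still has a nonzero size...
static_assert(sizeof(EmptyAlloc<int>) >= 1,
              "an empty class still has nonzero size");

// ...so the list is larger than the single pointer it really needs.
static_assert(sizeof(ListWithMember<EmptyAlloc<int> >) > sizeof(ListPointerOnly),
              "the empty member inflates the list");
```

On a typical platform the first struct is twice the size of the second, because the one-byte member is padded out to the alignment of the pointer.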

Empty objects

So how can this problem be solved? First we need to understand where the overhead comes from. The C++ language definition says:

A class with an empty sequence of members and base class objects is an empty class. Complete objects and member subobjects of an empty class type shall have nonzero size.
That is, an empty class has no data members and no base class subobjects, and a complete object or member subobject of such a class must have nonzero size.

The following example shows the reason for this requirement:

  struct Bar { };
  struct Foo {
    struct Bar a[2];
    struct Bar b;
  };
  Foo f;

So what are the addresses of f.b and of the elements of f.a[]? If sizeof(Bar) were 0, they could all share the same address. If you identify objects by their addresses, f.b and f.a[0] would then appear to be the same object. The C++ standards committee resolved this by forbidding objects of an empty class from having size 0.
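
The address argument can be tried out directly. This is only a sketch, assuming C++11 or later; the helper names bar_addresses_distinct and distinct_address_demo are made up for the illustration.

```cpp
#include <cassert>

struct Bar { };
struct Foo {
    Bar a[2];
    Bar b;
};

// Every Bar object must occupy at least one byte...
static_assert(sizeof(Bar) >= 1, "objects of an empty class have nonzero size");
// ...so Foo needs room for three separately addressable Bar objects.
static_assert(sizeof(Foo) >= 3 * sizeof(Bar), "three distinct Bar subobjects");

// Hypothetical helper: true when no two Bar subobjects of f share an address.
inline bool bar_addresses_distinct(const Foo& f) {
    const Bar* a0 = &f.a[0];
    const Bar* a1 = &f.a[1];
    const Bar* b  = &f.b;
    return a0 != a1 && a0 != b && a1 != b;
}

// Hypothetical usage sketch: the assertion holds because of the size rule.
inline void distinct_address_demo() {
    Foo f;
    assert(bar_addresses_distinct(f));
}
```

If empty objects were allowed to have size 0, the three pointers compared in the helper could all be equal, and address-based identity would break.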

But why does the member end up occupying 4 bytes? Most compilers report sizeof(Bar) == 1, but object alignment requirements can raise the real cost to 4 bytes. For example:

  struct Baz {
    Bar b;
    int* p;
  };

The structure Baz is 8 bytes in size on most architectures (with 4-byte pointers). The compiler adds padding after Baz::b so that Baz::p does not straddle a word boundary.

  struct Baz
  +-----------------------------------+
  | +-------+-------+-------+-------+ |
  | | Bar b | XXXXX | XXXXX | XXXXX | |
  | +-------+-------+-------+-------+ |
  | +-------------------------------+ |
  | | int* p                        | |
  | +-------------------------------+ |
  +-----------------------------------+
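
The layout in the diagram can be confirmed with offsetof. The sketch below assumes C++11 or later. The first three checks follow from the language rules; the last one reflects what mainstream ABIs do rather than a guarantee. With 8-byte pointers the numbers are 8 and 16 instead of 4 and 8.

```cpp
#include <cstddef>  // offsetof

struct Bar { };
struct Baz {
    Bar  b;
    int* p;
};

// b takes at least one byte, so p cannot start at offset 0.
static_assert(offsetof(Baz, p) >= sizeof(Bar), "b sits in front of p");

// p must start on a boundary suitable for a pointer, which forces padding.
static_assert(offsetof(Baz, p) % alignof(int*) == 0, "p is aligned");

// The whole struct is therefore bigger than the pointer alone.
static_assert(sizeof(Baz) > sizeof(int*), "padding inflates Baz");

// Typical result (an assumption about the ABI, not a language rule):
// one pointer-sized slot for b plus padding, one for p.
static_assert(sizeof(Baz) == 2 * sizeof(int*), "two pointer-sized slots");
```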

So how can this extra overhead be avoided? The C++ standard also says, in a footnote:

A base class subobject of an empty class type may have zero size.
That is, when an empty class is used as a base class, its subobject may have size 0.

So, given this structure:

  struct Baz2 : Bar {
    int* p;
  };

The compiler can give the Bar base subobject zero size, so sizeof(Baz2) is 4.

  struct Baz2
  +-----------------------------------+
  | +-------------------------------+ |
  | | int* p                        | |
  | +-------------------------------+ |
  +-----------------------------------+

Compilers are not required to implement this optimization, but you can assume that most standard-conforming compilers do.
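
A quick way to see whether a given compiler applies the empty base optimization is to compare the two layouts side by side. This sketch assumes C++11 or later; the equality check is expected to pass on mainstream compilers such as GCC, Clang and MSVC, but it documents their behavior rather than a promise made by the text above.

```cpp
#include <type_traits>  // std::is_empty

struct Bar { };

// Bar as a member: costs at least one byte plus padding.
struct Baz {
    Bar  b;
    int* p;
};

// Bar as a base class: the base subobject may have zero size.
struct Baz2 : Bar {
    int* p;
};

static_assert(std::is_empty<Bar>::value, "Bar has no data members");

// With the optimization applied, only the pointer remains.
static_assert(sizeof(Baz2) == sizeof(int*), "empty base takes no space");

// The member version is strictly larger.
static_assert(sizeof(Baz2) < sizeof(Baz), "inheriting is cheaper than containing");
```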

Eliminating bloat

Now that you know how to eliminate the overhead, where should the technique be applied? The most obvious approach is to have list<> inherit directly from its allocator, as follows:

    template <class T, class Alloc = allocator<T> >
    class list : private Alloc {
      . . .
      struct Node { . . . };
      Node* head_;      

     public:
      explicit list(Alloc const& a = Alloc())
        : Alloc(a) { . . . }
      . . .
    };

This certainly works. Member functions of list can call this->allocate() instead of alloc_.allocate() to request memory.
However, a user-supplied Alloc is allowed to have virtual functions, and these may conflict with member functions of the derived class list<> (for example, list<>::init() and Alloc::init()).
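
One way such a conflict can play out is sketched below, assuming C++11 or later. TracingAlloc and InheritingList are hypothetical classes written only for this illustration, and the matching signatures are chosen deliberately: a private helper in the list silently overrides the allocator's virtual function. With different return types the same clash would be a compile error instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical user allocator with a virtual hook named init().
struct TracingAlloc {
    virtual ~TracingAlloc() {}
    virtual std::string init() { return "TracingAlloc::init"; }
    // The allocator's own code expects this call to reach TracingAlloc::init().
    std::string prepare() { return init(); }
};

// Simplified stand-in for list<> that inherits from its allocator.
template <class Alloc>
class InheritingList : private Alloc {
    // Intended as a private helper of the list. Because the name and signature
    // match, it overrides Alloc::init() without any diagnostic.
    std::string init() { return "InheritingList::init"; }

public:
    std::string prepare_allocator() { return Alloc::prepare(); }
};

// Reports which init() actually runs when the allocator prepares itself.
inline std::string which_init_runs() {
    InheritingList<TracingAlloc> l;
    return l.prepare_allocator();
}

// Hypothetical usage sketch: the allocator ends up calling the list's helper.
inline void name_collision_demo() {
    assert(which_init_runs() == "InheritingList::init");
}
```

The allocator author never intended list code to run inside prepare(), which is why exposing the allocator's interface through inheritance is risky.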

Another workable approach is to bundle Alloc with one of the data members of list<> (such as the pointer to the first list node), so that the allocator's interface is not exposed:

    template <class T, class Alloc = allocator<T> >
    class list {
      . . .
      struct Node { . . . };
      struct P : public Alloc {
        P(Alloc const& a) : Alloc(a), p(0) { }
        Node* p;
      };
      P head_;
      
     public:
      explicit list(Alloc const& a = Alloc())
        : head_(a) { . . . }
      . . .
    };

With this approach, memory is requested through head_.allocate(). There is no extra overhead, and list<> behaves the same as before. Like many good optimizations, it makes the implementation a little uglier, but it does not affect the interface.
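
To make the idea concrete, here is a small compilable sketch of a singly linked list built this way, assuming C++11 or later. ByteAlloc and SmallList are hypothetical names, the allocator works on raw bytes to sidestep the question of how a list obtains an allocator for its nodes, and exception safety is ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <new>      // ::operator new, placement new

// Hypothetical stateless allocator that hands out raw bytes.
struct ByteAlloc {
    void* allocate(std::size_t bytes) { return ::operator new(bytes); }
    void  deallocate(void* p)         { ::operator delete(p); }
};

template <class T, class Alloc = ByteAlloc>
class SmallList {
    struct Node { T value; Node* next; };

    // The allocator rides along as the base class of the head pointer's wrapper.
    struct P : public Alloc {
        explicit P(Alloc const& a) : Alloc(a), p(nullptr) { }
        Node* p;
    };
    P head_;

public:
    explicit SmallList(Alloc const& a = Alloc()) : head_(a) { }
    SmallList(SmallList const&) = delete;             // copying omitted in this sketch
    SmallList& operator=(SmallList const&) = delete;

    ~SmallList() {
        while (head_.p) {
            Node* n = head_.p;
            head_.p = n->next;
            n->~Node();
            head_.deallocate(n);        // allocator reached through head_
        }
    }

    void push_front(T const& v) {
        void* raw = head_.allocate(sizeof(Node));
        head_.p = new (raw) Node{v, head_.p};
    }

    T const& front() const { return head_.p->value; }
    bool empty() const { return head_.p == nullptr; }
};

// With an empty allocator, the whole list is the size of one pointer
// (expected on mainstream compilers).
static_assert(sizeof(SmallList<int>) == sizeof(void*), "no allocator overhead");

// Hypothetical usage sketch.
inline void small_list_demo() {
    SmallList<int> l;
    assert(l.empty());
    l.push_front(1);
    l.push_front(2);
    assert(l.front() == 2);
}
```

Nothing outside the class can tell that the allocator lives inside head_, which is the point of the technique.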

A packaged solution

Of course there is still room for improvement. Take a look at the template below:

    template <class Base, class Member>
    struct BaseOpt : Base {
      Member m;
      BaseOpt(Base const& b, Member const& mem) 
        : Base(b), m(mem) { }
    };

With this template, the declaration of list becomes:

    template <class T, class Alloc = allocator<T> >
    class list {
      . . .
      struct Node { . . . };
      BaseOpt<Alloc,Node*> head_;
      
     public:
      explicit list(Alloc const& a = Alloc())
        : head_(a,0) { . . . }
      . . .
    };

This implementation does not look much worse than the original version. Other STL containers can use BaseOpt to simplify their code as well. The member functions that request memory do look a little odd (they call head_.allocate() and reach the pointer as head_.m), but we will not worry about that here.

The optimization technique can now be documented in one place, at the definition of BaseOpt.

It is also possible to add more members to BaseOpt, but this is not recommended: they could cause name collisions with Base (as in our first attempt to eliminate the bloat).
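
The following sketch exercises BaseOpt on its own, assuming C++11 or later. BaseOpt is copied from the definition above; ByteAlloc, TaggedAlloc, Node and base_opt_demo are hypothetical names added for the illustration. It shows both the size saving with an empty Base and the fact that a Base with state still works.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <new>      // ::operator new, placement new

template <class Base, class Member>
struct BaseOpt : Base {
    Member m;
    BaseOpt(Base const& b, Member const& mem)
        : Base(b), m(mem) { }
};

// Hypothetical stateless allocator that hands out raw bytes.
struct ByteAlloc {
    void* allocate(std::size_t bytes) { return ::operator new(bytes); }
    void  deallocate(void* p)         { ::operator delete(p); }
};

// Hypothetical allocator with state: it simply takes the space it needs.
struct TaggedAlloc : ByteAlloc {
    int tag;
};

struct Node { int value; Node* next; };

// Empty Base: the wrapper is no bigger than the member it carries
// (expected on mainstream compilers).
static_assert(sizeof(BaseOpt<ByteAlloc, Node*>) == sizeof(Node*),
              "empty Base adds nothing");

// Non-empty Base: the wrapper grows, exactly as a plain member would.
static_assert(sizeof(BaseOpt<TaggedAlloc, Node*>) > sizeof(Node*),
              "a Base with state still takes space");

// Hypothetical usage sketch: allocate one node through the wrapper.
inline void base_opt_demo() {
    BaseOpt<ByteAlloc, Node*> head(ByteAlloc(), nullptr);
    void* raw = head.allocate(sizeof(Node));  // allocator reached through the base
    head.m = new (raw) Node{42, head.m};      // the pointer lives in member m
    assert(head.m->value == 42);
    assert(head.m->next == nullptr);
    head.deallocate(head.m);                  // Node is trivially destructible
    head.m = nullptr;
}
```

Keeping the wrapper down to a single member named m is what keeps the risk of clashing with names in Base small.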

Conclusion

This technique works with today's mainstream compilers. Even if a compiler does not implement the empty base class optimization, the technique costs nothing extra compared with the original layout.

ivkus
