Talking about how to implement a custom iterator

Implement your own iterator

Use std::iterator

Before C++17, the recommended way to write a custom iterator was to derive it from std::iterator.

The basic definition of std::iterator

std::iterator is declared as:

template<
    class Category,
    class T,
    class Distance = std::ptrdiff_t,
    class Pointer = T*,
    class Reference = T&
> struct iterator;

Here T is the value type, that is, the type of the values obtained by dereferencing the iterator (not the container type). Category is the so-called iterator tag, which must be specified first. The available tags are:

input_iterator_tag: input iterator
output_iterator_tag: output iterator
forward_iterator_tag: forward iterator
bidirectional_iterator_tag: bidirectional iterator
random_access_iterator_tag: random access iterator
contiguous_iterator_tag: contiguous iterator (since C++20)

These tags can look puzzling: their purpose seems obvious at first, yet they are hard to understand and hard to choose between.

`Iterator tags`

The following is a rough introduction to their characteristics and the iterator categories they stand for, to help you understand them.

Each tag is bound to an iterator category of the same name (input iterator, and so on). The standard library dispatches on the tag type, through overloading and template specialization, to select dedicated implementations of distance() and advance(), so that each category gets the most efficient traversal it can support.
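The dispatch described above can be sketched as follows. This is a simplified illustration, not the standard library's actual implementation: the names `sketch`, `advance_impl` and `my_advance` are hypothetical, and only non-negative steps are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

namespace sketch {
  // Fallback for any input iterator: step one element at a time, O(n).
  template<class It>
  void advance_impl(It &it, std::ptrdiff_t n, std::input_iterator_tag) {
    while (n-- > 0) ++it;
  }

  // Overload selected for random access iterators: a single O(1) jump.
  template<class It>
  void advance_impl(It &it, std::ptrdiff_t n, std::random_access_iterator_tag) {
    it += n;
  }

  // Hypothetical stand-in for std::advance (non-negative n only).
  template<class It>
  void my_advance(It &it, std::ptrdiff_t n) {
    using category = typename std::iterator_traits<It>::iterator_category;
    advance_impl(it, n, category{}); // the tag object picks the overload
  }
} // namespace sketch

inline void test_tag_dispatch() {
  std::vector<int> v{10, 20, 30, 40};
  auto vit = v.begin();
  sketch::my_advance(vit, 3); // uses the random access overload
  assert(*vit == 40);

  std::list<int> l{1, 2, 3};
  auto lit = l.begin();
  sketch::my_advance(lit, 2); // bidirectional tag converts to the input tag
  assert(*lit == 3);
}
```

Because the tag types form an inheritance chain, a bidirectional or forward iterator falls back to the input iterator overload without any extra code.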

`input_iterator_tag`

An input iterator can wrap the output of a function, or a stream, and present it as an input sequence. It can only be incremented one step at a time: there is no +n, so that effect has to be simulated with n increments in a loop. It cannot be decremented, because an input stream cannot go backward. Its value ( *it ) is read-only, and you cannot assign through it.

Output iterators, by contrast, are write-only, and mutable forward iterators are both readable and writable. Writing through an iterator looks like this:

std::list<int> l{1,2,3};
auto it = l.begin();
++it;
(*it) = 5; // <- set value back into the container pointed by iterator

An input iterator treats its underlying source as an input stream, and you receive the incoming data through it.
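As a small illustration of an input iterator over a stream, the sketch below uses std::istream_iterator. The helper name `read_ints` is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <vector>

// Read whitespace-separated integers from a stream through an input iterator.
inline std::vector<int> read_ints(std::istream &in) {
  std::istream_iterator<int> first(in); // reads one value per increment
  std::istream_iterator<int> last;      // default-constructed: end of stream
  return std::vector<int>(first, last); // single pass, forward only
}

inline void test_input_iterator() {
  std::istringstream in("1 2 3");
  std::vector<int> v = read_ints(in);
  assert((v == std::vector<int>{1, 2, 3}));
}
```

Once a value has been consumed it cannot be revisited, which is why this category offers neither decrement nor multi-pass traversal.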

`output_iterator_tag`

output_iterator_tag is rarely used directly by users. It usually appears together with back_insert_iterator, front_insert_iterator, insert_iterator and ostream_iterator.

An output iterator can be incremented but not decremented, and it is write-only: you write new values through it into the container or stream it targets.

If you need to present data in an output-stream style, this is the one to choose.
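The two typical uses mentioned above might look like this. The helper names `copy_via_back_inserter` and `to_csv` are hypothetical; the iterator adaptors themselves come from the standard header <iterator>.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// back_inserter yields an output iterator; each assignment calls push_back.
inline std::vector<int> copy_via_back_inserter(const std::vector<int> &src) {
  std::vector<int> dst;
  std::copy(src.begin(), src.end(), std::back_inserter(dst));
  return dst;
}

// ostream_iterator writes each assigned value, followed by the delimiter.
inline std::string to_csv(const std::vector<int> &src) {
  std::ostringstream out;
  std::copy(src.begin(), src.end(), std::ostream_iterator<int>(out, ","));
  return out.str();
}

inline void test_output_iterators() {
  std::vector<int> src{1, 2, 3};
  assert(copy_via_back_inserter(src) == src);
  assert(to_csv(src) == "1,2,3,"); // note the trailing delimiter
}
```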

`forward_iterator_tag`

forward_iterator_tag denotes a forward iterator, which can only be incremented, never decremented. The tag derives from input_iterator_tag, and the category adds capabilities such as multi-pass traversal and, for mutable iterators, assigning through the iterator.

In terms of capability, a forward iterator supports reading and (if mutable) setting values and stepping forward. It does not support stepping backward, and +n has to be simulated with a loop, which is inefficient. If that is all your container needs to expose, forward_iterator_tag is the best choice.

In practice, a container with forward iterators must provide at least begin() and end().

`bidirectional_iterator_tag`

bidirectional_iterator_tag corresponds to the bidirectional iterator, which can walk in both directions with it++ and it--; std::list is an example. Like a forward iterator, a bidirectional iterator cannot do +n (or -n) directly, so advance() has to loop n times and step by one each time (that is, it simulates the jump with n increments or decrements).

In practice, such a container provides begin()/end() and usually rbegin()/rend() as well.
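A short sketch of backward traversal with std::list follows. The helper name `reversed` is hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Collect the elements of a std::list back to front using only --it.
inline std::vector<int> reversed(const std::list<int> &l) {
  std::vector<int> out;
  for (auto it = l.end(); it != l.begin();) {
    --it;               // a bidirectional iterator can step backward
    out.push_back(*it);
  }
  return out;
}

inline void test_bidirectional() {
  std::list<int> l{1, 2, 3};
  assert((reversed(l) == std::vector<int>{3, 2, 1}));

  auto it = l.end();
  std::advance(it, -2); // loops internally: two single decrements
  assert(*it == 2);

  assert(*l.rbegin() == 3); // reverse iterators rely on the same decrement
}
```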

`random_access_iterator_tag`

random_access_iterator_tag denotes a random access iterator, which supports reading and setting values, increment and decrement, and +n/-n.

Since a random access iterator supports efficient +n/-n, it also allows efficient direct positioning. A container with this kind of iterator usually supports operator[] subscript access as well, as std::vector does.
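The operations listed above can be tried on std::vector, whose iterators report random_access_iterator_tag as their category. A minimal sketch:

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

inline void test_random_access() {
  std::vector<int> v{10, 20, 30, 40, 50};
  auto it = v.begin();

  it += 3;                     // O(1) jump
  assert(*it == 40);
  assert(*(it - 2) == 20);     // it - n
  assert(it[1] == 50);         // subscript relative to the iterator
  assert(it - v.begin() == 3); // distance by plain subtraction
  assert(v.begin() < it);      // ordering comparisons are available

  static_assert(std::is_same<
      std::iterator_traits<std::vector<int>::iterator>::iterator_category,
      std::random_access_iterator_tag>::value,
      "vector iterators are random access");
}
```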

`contiguous_iterator_tag`

contiguous_iterator_tag was introduced in C++20 (C++17 only described the contiguous iterator requirement in prose). Compiler and library support is still uneven, so it is not covered in detail here, and you do not need to consider it when implementing your own iterator.

`Implementation of a custom iterator`

A custom iterator first needs an iterator tag, that is, a choice of which capabilities the iterator supports. Here is an example:

#include <iostream>
#include <iterator>

namespace customized_iterators {
  template<long FROM, long TO>
  class Range {
    public:
    // member typedefs provided through inheriting from std::iterator
    class iterator : public std::iterator<std::forward_iterator_tag, // iterator_category
    long,                      // value_type
    long,                      // difference_type
    const long *,              // pointer
    const long &               // reference
      > {
      long num = FROM;

      public:
      iterator(long _num = 0)
        : num(_num) {}
      iterator &operator++() {
        num = TO >= FROM ? num + 1 : num - 1;
        return *this;
      }
      iterator operator++(int) {
        iterator ret_val = *this;
        ++(*this);
        return ret_val;
      }
      bool operator==(iterator other) const { return num == other.num; }
      bool operator!=(iterator other) const { return !(*this == other); }
      long operator*() const { return num; }
    };
    iterator begin() { return FROM; }
    iterator end() { return TO >= FROM ? TO + 1 : TO - 1; }
  };

  void test_range() {
    Range<5, 13> r;
    for (auto v : r) std::cout << v << ',';
    std::cout << '\n';
  }

}

This example is adapted, with slight modifications, from the std::iterator page on cppreference; credit goes to its original author.

`Increment and decrement operator overload`

This topic gets its own section, because too many tutorials get it wrong.

Increment and decrement overloads come in two forms, prefix and postfix. The prefix form returns a reference to the object itself, and the postfix form returns a copy of the old value:

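struct X {
  // prefix increment
  X& operator++() {
    // the actual increment happens here
    return *this; // return the new value by reference
  }

  // postfix increment
  X operator++(int) {
    X old = *this; // copy the old value
    operator++();  // prefix increment
    return old;    // return the old value
  }

  // prefix decrement
  X& operator--() {
    // the actual decrement happens here
    return *this; // return the new value by reference
  }

  // postfix decrement
  X operator--(int) {
    X old = *this; // copy the old value
    operator--();  // prefix decrement
    return old;    // return the old value
  }
};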

Alternatively, consult the reference documentation rather than those tutorials; I could not find two correct ones.

The correct approach is to implement the prefix overload first, and then implement the postfix overload in terms of it:

struct incr {
  int val{};
  incr &operator++() {
    val++;
    return *this;
  }
  incr operator++(int) { // the dummy int parameter only marks the postfix form
    incr ret_val = *this;
    ++(*this);
    return ret_val;
  }
};

If the type manages resources, you may also need to implement operator= and the copy constructor X(X const& o). For a simple trivial struct they can be omitted, because the compiler generates member-wise copies (if you are not sure whether the implicitly generated copy is adequate, inspect the generated code, or simply implement operator= and the copy constructor explicitly).
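The difference between the two forms, and the implicitly generated copy, can be checked with a small sketch. The struct `counter` is hypothetical and simply mirrors the incr struct above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical counter, mirroring the incr struct above.
struct counter {
  int val{};
  counter &operator++() { // prefix: modify, then return *this
    ++val;
    return *this;
  }
  counter operator++(int) { // postfix: save a copy, reuse prefix, return the copy
    counter old = *this;
    ++(*this);
    return old;
  }
};

// A trivial struct is copied member-wise without user-written copy operations.
static_assert(std::is_trivially_copyable<counter>::value,
              "counter needs no hand-written copy constructor");

inline void test_counter() {
  counter c;
  counter &r = ++c;  // prefix yields the object itself
  assert(&r == &c && c.val == 1);

  counter old = c++; // postfix yields a copy of the previous state
  assert(old.val == 1 && c.val == 2);

  counter copy = c;  // implicitly generated copy constructor
  ++copy;
  assert(copy.val == 3 && c.val == 2);
}
```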

`Since C++17`

However, std::iterator has been deprecated since C++17.

If you care about the reasons, see the related discussion.

In most cases you can still derive from std::iterator to save some typing, at the cost of deprecation warnings, but the base class itself is outdated. The iterator tags and categories remain part of the standard; they are now declared as member types of the iterator.

`Fully handwritten iterator`

Therefore, from C++17 onward, custom iterators are in principle written entirely by hand.

#include <iostream>
#include <iterator>

namespace customized_iterators {
  namespace manually {
    template<long FROM, long TO>
    class Range {
      public:
      class iterator {
        long num = FROM;

        public:
        iterator(long _num = 0)
          : num(_num) {}
        iterator &operator++() {
          num = TO >= FROM ? num + 1 : num - 1;
          return *this;
        }
        iterator operator++(int) {
          iterator ret_val = *this;
          ++(*this);
          return ret_val;
        }
        bool operator==(iterator other) const { return num == other.num; }
        bool operator!=(iterator other) const { return !(*this == other); }
        long operator*() const { return num; }
        // iterator traits
        using difference_type = long;
        using value_type = long;
        using pointer = const long *;
        using reference = const long &;
        using iterator_category = std::forward_iterator_tag;
      };
      iterator begin() { return FROM; }
      iterator end() { return TO >= FROM ? TO + 1 : TO - 1; }
    };
  } // namespace manually

  void test_range() {
    manually::Range<5, 13> r;
    for (auto v : r) std::cout << v << ',';
    std::cout << '\n';
  }

}

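The iterator traits part of the example is not required for a range-based for loop; if that is all you need, you can leave it out. Standard algorithms and std::iterator_traits do rely on these member types, though.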
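To see where the member types matter, here is a minimal sketch of a handwritten iterator passed to standard algorithms. The class `count_iter` is hypothetical; it is declared as an input iterator because operator* returns by value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical counting iterator that yields consecutive long values.
class count_iter {
  long num;

public:
  // The five member types that std::iterator_traits looks for.
  using iterator_category = std::input_iterator_tag;
  using value_type = long;
  using difference_type = std::ptrdiff_t;
  using pointer = const long *;
  using reference = long; // operator* returns by value

  explicit count_iter(long n = 0) : num(n) {}
  reference operator*() const { return num; }
  count_iter &operator++() { ++num; return *this; }
  count_iter operator++(int) { count_iter old = *this; ++num; return old; }
  bool operator==(const count_iter &o) const { return num == o.num; }
  bool operator!=(const count_iter &o) const { return !(*this == o); }
};

inline void test_traits() {
  count_iter first(5), last(14);
  assert(std::distance(first, last) == 9);        // needs difference_type and the category
  assert(std::accumulate(first, last, 0L) == 81); // 5 + 6 + ... + 13
  assert(*std::find(first, last, 7L) == 7);
}
```

Without the five using declarations, std::iterator_traits<count_iter> would have no members and calls such as std::distance would fail to compile.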

`Things to take care of`

Considerations for fully handwritten iterators include:

begin() and end()
The iterator class, usually nested in the container (though it need not be), must implement at least:
1. The increment operator overloads, in order to walk forward
2. The decrement operator overloads, if the iterator is bidirectional (bidirectional_iterator_tag) or random access (random_access_iterator_tag)
3. The operator* overload, to dereference the iterator
4. The operator!= overload, to detect the end of the iteration range; where needed, also overload operator== explicitly (since C++20 the compiler synthesizes != from ==, but earlier standards need both written out)

If your type provides begin() and end() in this way, you can use range-based for loops:

your_collection coll;
for(auto &v: coll) {
  std::cout << v << '\n';
}

For the exact expansion of the range-based for loop, see the reference documentation.
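Roughly speaking, the compiler rewrites a range-based for loop into a plain loop over begin() and end(). The sketch below spells out an approximate equivalent by hand; the function names `sum_expanded` and `sum_range_for` are hypothetical, and the real expansion uses compiler-internal variable names.

```cpp
#include <cassert>
#include <vector>

// Approximately what `for (auto &v : coll) total += v;` expands to.
inline int sum_expanded(std::vector<int> &coll) {
  int total = 0;
  auto &&range = coll;             // the range expression is bound once
  auto first = range.begin();      // begin() and end() are each called once
  auto last = range.end();
  for (; first != last; ++first) { // needs operator!= and prefix operator++
    auto &v = *first;              // needs operator*
    total += v;
  }
  return total;
}

// The same loop written with the range-based for syntax.
inline int sum_range_for(std::vector<int> &coll) {
  int total = 0;
  for (auto &v : coll) total += v;
  return total;
}
```

This is why operator!=, prefix operator++ and operator* form the minimum set an iterator must provide.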

`After C++20`

Since C++20, iterators have changed considerably. Because implementations are still at an early stage, this article does not discuss them.

`Other related`

`In addition to iterator and const_iterator`

For code hygiene and safety, getters are usually provided in pairs, one writable and one read-only:

struct incr {
  int &val(){ return _val; }
  int const &val() const { return _val; }
  private:
  int _val{};
};

In the same way, begin() and end() should come in at least two versions, const and non-const. In general, a separate standalone implementation can help provide the multiple versions:

struct XXX {

  // ... struct leveled_iter_data {
  //    static leveled_iter_data begin(NodePtr root_) {...}
  //    static leveled_iter_data end(NodePtr root_) {...}
  // }

  using iterator = leveled_iter_data;
  // Simplification: a production const_iterator is normally a distinct
  // type that yields const references, not a const-qualified iterator.
  using const_iterator = const iterator;
  iterator begin() { return iterator::begin(this); }
  const_iterator begin() const { return const_iterator::begin(this); }
  iterator end() { return iterator::end(this); }
  const_iterator end() const { return const_iterator::end(this); }

};

This is the straightforward way. Read and write safety is enforced inside XXX: the owner naturally knows what should be exposed and what needs to be restricted for the time being.

In addition to iterator and const_iterator, rbegin/rend, cbegin/cend, etc. can also be considered.
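One common way to get both iterator and const_iterator from a single implementation is to parameterize the iterator on constness. The following is a minimal sketch under that assumption; the container `int_box` and its nested template `iter` are hypothetical and not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical fixed-size container; one iterator template serves both roles.
class int_box {
  int data_[3] = {1, 2, 3};

  template<bool Const>
  class iter {
    using elem = typename std::conditional<Const, const int, int>::type;
    elem *p_;

  public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = elem *;
    using reference = elem &;

    explicit iter(elem *p = nullptr) : p_(p) {}
    reference operator*() const { return *p_; }
    iter &operator++() { ++p_; return *this; }
    iter operator++(int) { iter old = *this; ++p_; return old; }
    bool operator==(const iter &o) const { return p_ == o.p_; }
    bool operator!=(const iter &o) const { return p_ != o.p_; }
  };

public:
  using iterator = iter<false>;
  using const_iterator = iter<true>; // a distinct type, not `const iterator`

  iterator begin() { return iterator(data_); }
  iterator end() { return iterator(data_ + 3); }
  const_iterator begin() const { return const_iterator(data_); }
  const_iterator end() const { return const_iterator(data_ + 3); }
  const_iterator cbegin() const { return begin(); }
  const_iterator cend() const { return end(); }
};
```

With this layout, a const int_box only ever hands out iterators whose operator* returns const int&, so writes through them are rejected at compile time.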

`Note: the use of iterators`

When using iterators, we must pay attention to iterator invalidation:

void test_iter_invalidate() {
  std::vector<int> vi{3, 7};
  auto it = vi.begin();
  it = vi.insert(it, 11);
  vi.insert(it, 5000, 23);
  vi.insert(it, 1, 31);                // undefined behavior: it is stale, may crash here!
  std::cout << (*it) << '\n';
  return;
}

On most platforms, the statement vi.insert(it, 5000, 23); will very likely force the vector to reallocate its internal array. After that statement executes, the pointer held by it is meaningless (it still points somewhere in the old buffer), so using it on the next line reads and writes through a dangling pointer. Because the old buffer has often been released and unmapped by then, this error frequently raises a fatal SIGSEGV. If you get the SIGSEGV, consider yourself lucky. If the old buffer happens to remain accessible, the statement runs without reporting any error, which is far worse.
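One way to avoid the stale iterator is to re-acquire it from the return value of every insert, since vector::insert returns an iterator to the first inserted element. A minimal sketch, with the hypothetical function name `insert_safely`:

```cpp
#include <cassert>
#include <vector>

inline std::vector<int> insert_safely() {
  std::vector<int> vi{3, 7};
  auto it = vi.begin();
  it = vi.insert(it, 11);       // it -> 11
  it = vi.insert(it, 5000, 23); // refreshed: it -> first of the 5000 new elements
  it = vi.insert(it, 1, 31);    // refreshed again: it -> 31
  assert(*it == 31);            // safe, it belongs to the current buffer
  return vi;
}
```

Storing an index instead of an iterator is another option when many modifications happen between uses.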

`Iterator search and delete`

The standard containers delete elements by combining a remove step with an erase step. Taking std::vector as an example, std::remove_if() finds the elements that meet the criteria and shifts the elements to keep toward the front, then returns an iterator to the new logical end. The elements from that position onward are left in an unspecified state, but they are not removed from the container. To remove them, you need to erase them explicitly with v.erase(iter, v.end()).

So delete the elements like this:

bool IsOdd(int i) { return i & 1; }

std::vector<int> v = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
v.erase(std::remove_if(v.begin(), v.end(), IsOdd), v.end());

std::list<int> l = { 1,100,2,3,10,1,11,-1,12 };
l.remove_if(IsOdd); // the member function erases the matching elements itself

std::vector has no remove_if() member function, so search and erase on it requires std::remove_if from <algorithm> together with erase(). std::list can do the whole job with its member function remove_if(), which unlinks and destroys the matching nodes directly, so the code is slightly more concise.

Since C++20, the erase and remove_if combination can be replaced by the non-member functions std::erase() and std::erase_if(), which are overloaded for the standard containers such as std::vector.
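The two deletion styles can be compared side by side in a sketch that also compiles before C++20. The names `is_odd` and `drop_odds` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

inline bool is_odd(int i) { return i % 2 != 0; }

// Erase-remove idiom for std::vector.
inline void drop_odds(std::vector<int> &v) {
  auto new_end = std::remove_if(v.begin(), v.end(), is_odd); // shifts kept elements forward
  v.erase(new_end, v.end());                                 // actually shrinks the container
}

// std::list erases the matching nodes by itself.
inline void drop_odds(std::list<int> &l) {
  l.remove_if(is_odd);
}

inline void test_drop_odds() {
  std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  drop_odds(v);
  assert((v == std::vector<int>{0, 2, 4, 6, 8}));

  std::list<int> l{1, 100, 2, 3, 10, 1, 11, -1, 12};
  drop_odds(l);
  assert((l == std::list<int>{100, 2, 10, 12}));
}
```

With a C++20 standard library, both overloads could be replaced by a call to std::erase_if on the container.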

`Postscript`

This article contributes some personal understanding and best-practice guidelines on writing your own STL-like iterator, but there is still more left to say.

Next time, I may introduce a tree_t and its iterator implementation, which may be of more reference value.
