Reduction, grouping and partitioning: an in-depth explanation of Java Stream terminal operations

(Figure: mind map of this article.)

In the previous article, I covered the first half of the Stream material: an overview of Stream, how to create a Stream, the intermediate (conversion) operations, and a brief look at some of Stream's internal optimizations.

It is a little late, but here is the second half: terminal operations. This part of the API is large and varied, so it gets an article of its own.

Before we officially start, let's look at the characteristics of the aggregation methods themselves (from here on, I use "aggregation method" to mean a method that performs a terminal operation):

An aggregation method produces the final result of the whole stream computation, so its return value is not a Stream.
The result of an aggregation method may be absent, for example when a filter matches nothing. Since JDK 8, Optional is used in such cases to avoid a NullPointerException.
Aggregation methods call the internal evaluate method. When reading the source code, this is a handy way to tell whether a method is an aggregation method.

OK, now that we know the characteristics of the aggregation methods, I have divided them into several categories to make them easier to understand:

I will explain the simple aggregation methods only briefly and focus on the others, especially the collector, which can do a great deal.

Aggregation methods are required whenever we use a Stream. Study this article carefully: even if you do not master Stream right away, you will at least be able to use it 😄

1. Simple aggregation method

In this first section, let's start with something simple.

Stream has more aggregation methods than the stateless and stateful methods covered in the previous article, but some of them can be understood at a glance. This section covers those:

count() : Returns the number of elements in the Stream.
forEach() : Consumes each element of the Stream through internal iteration. This method has no return value.
forEachOrdered() : Has the same effect as forEach(), but consumes the elements in encounter order, even in a parallel stream.
anyMatch(Predicate predicate) : This is a short-circuit operation. It returns whether at least one element matches the given predicate.
allMatch(Predicate predicate) : This is a short-circuit operation. It returns whether all elements match the given predicate.
noneMatch(Predicate predicate) : This is a short-circuit operation. It returns true if no element matches the given predicate, and false otherwise.
findFirst() : This is a short-circuit operation that returns the first element in the Stream. The Stream may be empty, so the return value is wrapped in an Optional.
findAny() : This is a short-circuit operation that returns any element in the Stream. In a sequential stream this is generally the first element. The Stream may be empty, so the return value is wrapped in an Optional.
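To make the list above concrete, here is a minimal sketch that exercises several of these methods on a small list. The class and method names (SimpleTerminalOps, countEven and so on) are made up for illustration and do not come from the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class, not part of the JDK or the original article.
class SimpleTerminalOps {
    // count(): number of elements that survive the filter.
    static long countEven(List<Integer> numbers) {
        return numbers.stream().filter(n -> n % 2 == 0).count();
    }

    // findFirst(): first element greater than the threshold, or an empty Optional.
    static Optional<Integer> firstGreaterThan(List<Integer> numbers, int threshold) {
        return numbers.stream().filter(n -> n > threshold).findFirst();
    }

    // forEachOrdered(): elements are consumed in encounter order even on a parallel stream.
    static List<Integer> copyInOrder(List<Integer> numbers) {
        List<Integer> out = new ArrayList<>();
        numbers.parallelStream().forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4);
        System.out.println(countEven(numbers));                      // 2
        System.out.println(firstGreaterThan(numbers, 2));            // Optional[3]
        System.out.println(firstGreaterThan(numbers, 9));            // Optional.empty
        System.out.println(numbers.stream().anyMatch(n -> n > 3));   // true
        System.out.println(numbers.stream().allMatch(n -> n > 3));   // false
        System.out.println(numbers.stream().noneMatch(n -> n > 4));  // true
        System.out.println(copyInOrder(numbers));                    // [1, 2, 3, 4]
    }
}
```

Because findFirst() returns an Optional, the caller decides what to do when nothing matches, for example with orElse or ifPresent.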

Although the methods above are relatively simple, five of them are short-circuit operations, so I still want to say a few words about them:

First, findFirst() and findAny(). They only need one element to finish, so their short-circuit behavior is easy to understand.

Next, anyMatch(). It can finish as soon as one element matches, so its short-circuit behavior is also easy to understand.

Last, allMatch() and noneMatch(). At first glance, these two need to traverse every element in the stream, but in fact they do not. allMatch can return false as soon as one element fails the predicate, and noneMatch can return false as soon as one element matches it, so both are short-circuit methods as well.
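The short-circuit behavior can be observed with a small experiment. The sketch below uses peek() to count how many elements allMatch actually inspects on a sequential stream, and runs anyMatch on an infinite stream to show that it still terminates. ShortCircuitDemo and its methods are hypothetical names used only for this illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class for observing short-circuiting.
class ShortCircuitDemo {
    // Returns how many elements a sequential allMatch(n < 2) inspects before it stops.
    static int visitedByAllMatch(List<Integer> numbers) {
        AtomicInteger visited = new AtomicInteger();
        numbers.stream()
                .peek(n -> visited.incrementAndGet()) // counts every element pulled through
                .allMatch(n -> n < 2);                // stops at the first element >= 2
        return visited.get();
    }

    // anyMatch finishes on an infinite stream as soon as one element matches.
    static boolean infiniteAnyMatch() {
        return Stream.iterate(1, n -> n + 1).anyMatch(n -> n > 3);
    }

    public static void main(String[] args) {
        System.out.println(visitedByAllMatch(List.of(1, 2, 3, 4, 5))); // 2
        System.out.println(infiniteAnyMatch());                        // true
    }
}
```

Only two of the five elements are visited: 1 passes the predicate, 2 fails it, and allMatch returns false without looking at the rest.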

2. Reduction

2.1 reduce: repeated evaluation

In this second section, let's talk about reduction. The word is rather abstract, so here is an easy-to-understand definition:

Combining all the elements in a Stream repeatedly to get a result is called reduction.

Note: In functional programming, this is called fold.

A very simple example: I have three elements 1, 2, and 3. I add them together, two at a time, and get 6. This process is reduction.

For another example, I have three elements 1, 2, and 3. I compare them, and finally pick the largest number 3 or pick the smallest number 1. This process is also reduction.

Below I give an example of summation to demonstrate reduction, which uses the reduce method:

        Optional<Integer> reduce = List.of(1, 2, 3).stream()
                .reduce((i1, i2) -> i1 + i2);

First of all, you may have noticed that I keep using the word "two" in the small examples above. Reduction processes two values at a time and eventually produces one final value, so the parameter of the reduce method is a binary function (a BinaryOperator). It takes two arguments and returns one result, and the arguments and the result must have the same type.

In the code, i1 and i2 are the two parameters of the binary function. At first they are the first and second elements of the stream. After the first addition, the result is passed back in as i1 and i2 becomes the next element, and so on until the elements are exhausted and the final result is obtained.
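The way i1 and i2 move through the elements can be traced by recording each call of the binary function. The sketch below does this for a sequential stream. ReduceTrace and traceSum are hypothetical names, and the side effect inside the lambda is there only for demonstration; it is not something to do in production code or with parallel streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class that logs every accumulator call of a sequential reduce.
class ReduceTrace {
    static List<String> traceSum(List<Integer> numbers) {
        List<String> steps = new ArrayList<>();
        Optional<Integer> sum = numbers.stream().reduce((i1, i2) -> {
            // i1 is the running result, i2 is the next element
            steps.add(i1 + " + " + i2 + " = " + (i1 + i2));
            return i1 + i2;
        });
        steps.add("result: " + sum);
        return steps;
    }

    public static void main(String[] args) {
        // Prints: 1 + 2 = 3, then 3 + 3 = 6, then result: Optional[6]
        traceSum(List.of(1, 2, 3)).forEach(System.out::println);
    }
}
```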

If you think this is not elegant enough, you can also use the static sum method of Integer:

        Optional<Integer> reduce = List.of(1, 2, 3).stream()
                .reduce(Integer::sum);

This is also an example of using a method reference in place of a lambda expression.

You may also have noticed that the return value is an Optional. This covers the case where the Stream has no elements.

You can also rule this case out by guaranteeing that there is at least one value. reduce provides an overload for this:

        Integer reduce = List.of(1, 2, 3).stream()
                .reduce(0, (i1, i2) -> i1 + i2);

As in the example above, one more parameter is added in front of the binary function. This parameter is the initial value (the identity), so even if your Stream has no elements the result is 0 and there is no need for Optional.

When the method runs, the initial value takes the place of i1 in the first execution and i2 is the first element of the Stream. The resulting sum then takes the place of i1 again, and i2 becomes the next element.

However, using an initial value is not free. It must satisfy one rule: accumulator.apply(identity, i1) must equal i1 for every element i1. In other words, combining the initial value with an element must return that element unchanged, so the first execution returns the first element of your Stream.

My example above is an addition, and the first addition is 0 + 1 = 1, which satisfies the rule. The rule exists to guarantee a correct result with parallel streams.

If your initial value is 1, then in a parallel stream every subtask starts from 1, and your final sum will be larger than you expect.
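The effect of a wrong initial value can be checked with a few lines. The sketch below sums the same list with identity 0 and identity 1, sequentially and in parallel. ReduceIdentityDemo is a hypothetical name. The exact parallel result with identity 1 depends on how the stream is split, so the code makes no claim about its precise value.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class comparing a valid identity (0) with an invalid one (1).
class ReduceIdentityDemo {
    static int sum(List<Integer> numbers, int identity, boolean parallel) {
        Stream<Integer> stream = parallel ? numbers.parallelStream() : numbers.stream();
        return stream.reduce(identity, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3);
        System.out.println(sum(numbers, 0, false)); // 6
        System.out.println(sum(numbers, 0, true));  // 6
        System.out.println(sum(numbers, 1, false)); // 7: the extra 1 is added once
        // With identity 1 in parallel, each subtask adds its own extra 1,
        // so the result is at least 7 and may vary with the split.
        System.out.println(sum(numbers, 1, true));
    }
}
```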

2.2 max: Use reduction to find the maximum

The max method is also a reduction method, which directly calls the reduce method.

Let's take a look at an example:

        Optional<Integer> max = List.of(1, 2, 3).stream()
                .max((a, b) -> {
                    // A comparator must return 0 for equal elements
                    if (a > b) {
                        return 1;
                    } else if (a < b) {
                        return -1;
                    } else {
                        return 0;
                    }
                });

Yes, this is how max is used. It hardly feels like I am using a functional interface at all. Of course, you can also simplify it with a method of Integer:

        Optional<Integer> max = List.of(1, 2, 3).stream()
                .max(Integer::compare);

Even so, this still feels cumbersome to me. I understand that max takes a parameter so that we can customize the ordering, but I don't understand why there is no overload that defaults to natural ordering. Why must I always pass the parameter?

Then I remembered the primitive-type Streams. Sure enough, they can get the maximum directly without any parameter:

        OptionalLong max = LongStream.of(1, 2, 3).max();

Sure enough, whatever I can think of, the class library designers have already thought of~

Note : OptionalLong is the Optional counterpart for the primitive type long.
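For object streams, one option that avoids writing the comparison by hand is Comparator.naturalOrder(), and another is to convert to a primitive stream first. The sketch below shows both. NaturalOrderMax and its method names are hypothetical and chosen only for this illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.OptionalInt;

// Hypothetical demo class: two ways to get a maximum without a hand-written comparator.
class NaturalOrderMax {
    // Uses the natural ordering of Integer as the comparator.
    static Optional<Integer> max(List<Integer> numbers) {
        return numbers.stream().max(Comparator.naturalOrder());
    }

    // Converts to an IntStream, whose max() needs no argument.
    static OptionalInt maxPrimitive(List<Integer> numbers) {
        return numbers.stream().mapToInt(Integer::intValue).max();
    }

    public static void main(String[] args) {
        System.out.println(max(List.of(1, 2, 3)));          // Optional[3]
        System.out.println(maxPrimitive(List.of(1, 2, 3))); // OptionalInt[3]
    }
}
```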

2.3 min: Use reduction to find the smallest

Let's just look at the example directly:

        Optional<Integer> min = List.of(1, 2, 3).stream()
                .min(Integer::compare);

The only difference from max is that the comparison is reversed under the hood (roughly, > becomes <). It is too simple to need repeating.

3. Collector

In this third section, we look at the collector. Its job is to collect the elements of a Stream into a new collection.

Although I already gave a mind map at the beginning of this article, the collector has so many APIs that I drew another one to supplement it:

The method name of the collector is collect, and its method is defined as follows:

    <R, A> R collect(Collector<? super T, A, R> collector);

As the name suggests, the collector collects the elements of the Stream, and we can customize what is collected in the end. We rarely need to write one ourselves, though, because the JDK ships the Collectors utility class, which provides ready-made Collector implementations.
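To show what "customizing" a collector involves, here is a minimal sketch of a hand-written list collector built with Collector.of. It is only an illustration of the supplier, accumulator and combiner roles, not how Collectors.toList() is necessarily implemented. CustomCollectorSketch and toArrayList are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical demo class: a list collector assembled from its three core functions.
class CustomCollectorSketch {
    static <T> Collector<T, List<T>, List<T>> toArrayList() {
        return Collector.of(
                ArrayList::new,        // supplier: creates the mutable result container
                List::add,             // accumulator: folds one element into the container
                (left, right) -> {     // combiner: merges two partial containers (parallel streams)
                    left.addAll(right);
                    return left;
                });
    }

    public static void main(String[] args) {
        List<Integer> result = List.of(1, 2, 3).stream()
                .collect(CustomCollectorSketch.<Integer>toArrayList());
        System.out.println(result); // [1, 2, 3]
    }
}
```

In the signature of collect shown above, T is the element type, A is the intermediate container type and R is the result type. In this sketch A and R are both List<T>.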

3.1 Collection method

With Collectors, we can use its built-in methods to collect data easily:

For example, to collect the elements into a collection you can use toCollection or toList. We generally don't use toCollection, because it requires an argument (a supplier for the target collection), and nobody likes passing arguments.

You can also use toUnmodifiableList. Unlike toList, it returns a collection that cannot be modified: elements cannot be added or removed.

Similarly, to collect the elements with duplicates removed, you can use toSet or toUnmodifiableSet.

Here is a relatively simple example:

        // toList
        List.of(1, 2, 3).stream().collect(Collectors.toList());

        // toUnmodifiableList
        List.of(1, 2, 3).stream().collect(Collectors.toUnmodifiableList());

        // toSet
        List.of(1, 2, 3).stream().collect(Collectors.toSet());

        // toUnmodifiableSet
        List.of(1, 2, 3).stream().collect(Collectors.toUnmodifiableSet());

None of the methods above take parameters, so they are ready to use. In the current JDK implementation, toList is backed by the classic ArrayList and toSet by the classic HashSet.

Sometimes you may want to collect elements into a Map, for example converting a list of orders into a map from order number to order. In that case you can use toMap():
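For completeness, here is a small sketch of the two cases the examples above do not show: toCollection with an explicit target collection, and the behavior of an unmodifiable result when you try to change it. CollectionCollectors and its method names are hypothetical.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical demo class for toCollection and toUnmodifiableList.
class CollectionCollectors {
    // toCollection takes a supplier, so the caller picks the concrete collection type.
    static TreeSet<Integer> toSortedSet(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    // Returns true if adding to the collected list is rejected.
    static boolean rejectsAdd(List<Integer> numbers) {
        List<Integer> copy = numbers.stream().collect(Collectors.toUnmodifiableList());
        try {
            copy.add(99);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(toSortedSet(List.of(3, 1, 2, 3))); // [1, 2, 3]
        System.out.println(rejectsAdd(List.of(1, 2, 3)));     // true
    }
}
```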

        List<Order> orders = List.of(new Order(), new Order());

        Map<String, Order> map = orders.stream()
                .collect(Collectors.toMap(Order::getOrderNo, order -> order));

toMap() has two parameters:

The first parameter produces the key of each Map entry. Here I use the orderNo of the element.
The second parameter produces the value of each Map entry. Here I use the element itself as the value, so the result is a Map<String, Order>.

You can also use element attributes as values:

        List<Order> orders = List.of(new Order(), new Order());

        Map<String, List<Item>> map = orders.stream()
                .collect(Collectors.toMap(Order::getOrderNo, Order::getItemList));

The result is a Map from order number to item list.

toMap() also has two related methods:

toUnmodifiableMap(): returns an unmodifiable Map.
toConcurrentMap(): returns a thread-safe Map.

The parameters of these two methods are exactly the same as those of toMap(). The only difference is the kind of Map produced underneath. We generally use plain toMap(), whose underlying implementation is our most commonly used HashMap.

Although toMap() is powerful and commonly used, it has a fatal flaw.

We know that HashMap overwrites the existing value when the same key is put again. However, if a key is repeated while the two-argument toMap() is building the Map, it throws an exception.

For example, in the order example above, suppose the two orders have the same order number and you specify the order number as the key. The method then throws an IllegalStateException, because it does not allow duplicate keys.
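The duplicate-key behavior can be reproduced in a few lines. The sketch below also uses the three-argument overload of toMap(), which accepts a merge function to decide what happens when two elements share a key. The Order class here is a hypothetical minimal stand-in (the article never shows its own), and ToMapDuplicates is an invented name.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; Order is a minimal stand-in for the article's class.
class ToMapDuplicates {
    static class Order {
        private final String orderNo;
        private final long money;
        Order(String orderNo, long money) { this.orderNo = orderNo; this.money = money; }
        String getOrderNo() { return orderNo; }
        long getMoney() { return money; }
    }

    // Two-argument toMap: throws IllegalStateException on a duplicate key.
    static Map<String, Order> strict(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.toMap(Order::getOrderNo, Function.identity()));
    }

    // Three-argument toMap: the merge function keeps the later order on a duplicate key.
    static Map<String, Order> keepLast(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.toMap(Order::getOrderNo, Function.identity(),
                        (first, second) -> second));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("A1", 10), new Order("A1", 20));
        try {
            strict(orders);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
        System.out.println(keepLast(orders).get("A1").getMoney()); // 20
    }
}
```

Whether to keep the first value, keep the last, or combine the two is a business decision; the merge function only makes that decision explicit.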

3.2 Grouping method

If you want to classify data and the key you specify can repeat, use groupingBy instead of toMap.

A simple example: to group a collection of orders by order type, I can do this:

        List<Order> orders = List.of(new Order(), new Order());

        Map<Integer, List<Order>> collect = orders.stream()
                .collect(Collectors.groupingBy(Order::getOrderType));

Just specify the element attribute to group by. The elements are grouped by that attribute automatically, and each group is collected as a List.

        List<Order> orders = List.of(new Order(), new Order());

        Map<Integer, Set<Order>> collect = orders.stream()
                .collect(Collectors.groupingBy(Order::getOrderType, toSet()));

groupingBy also provides an overload that lets you customize how each group is collected: its second parameter is a Collector, as in the example above, which collects each group into a Set.

For the Collector we generally still use the Collectors class. The toSet() call here is Collectors.toSet(), written without the class name thanks to a static import. It collects the grouped elements into a Set.

There is also a similar method called groupingByConcurrent(). It can improve grouping efficiency in parallel streams, but it does not guarantee order, so I won't expand on it here.
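Since the examples above use empty Order objects, here is a sketch of both groupingBy forms with concrete data so the result can be inspected. The Order class is a hypothetical minimal stand-in, and GroupingDemo is an invented name.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class; Order is a minimal stand-in for the article's class.
class GroupingDemo {
    static class Order {
        private final String orderNo;
        private final Integer orderType;
        Order(String orderNo, Integer orderType) { this.orderNo = orderNo; this.orderType = orderType; }
        String getOrderNo() { return orderNo; }
        Integer getOrderType() { return orderType; }
    }

    // One-argument groupingBy: each group is collected into a List.
    static Map<Integer, List<Order>> byType(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(Order::getOrderType));
    }

    // Two-argument groupingBy: the downstream collector decides the container.
    static Map<Integer, Set<Order>> byTypeAsSet(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(Order::getOrderType, Collectors.toSet()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("A1", 1), new Order("A2", 1), new Order("B1", 2));
        byType(orders).forEach((type, group) ->
                System.out.println("type " + type + " has " + group.size() + " order(s)"));
    }
}
```

Unlike toMap(), repeated keys are not a problem here: orders A1 and A2 share type 1 and simply end up in the same group.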

3.3 Partitioning method

Next is a special case of grouping: partitioning. The name sounds a bit convoluted, but the meaning is very simple:

Grouping data according to true or false is called partitioning.

For example, grouping a collection of orders by whether they have been paid is partitioning:

        List<Order> orders = List.of(new Order(), new Order());
        
        Map<Boolean, List<Order>> collect = orders.stream()
                .collect(Collectors.partitioningBy(Order::getIsPaid));

Because an order has only two payment statuses, paid and unpaid, this kind of grouping is called partitioning.

Like groupingBy, it also has an overloaded method to customize the collector type:

        List<Order> orders = List.of(new Order(), new Order());

        Map<Boolean, Set<Order>> collect = orders.stream()
                .collect(Collectors.partitioningBy(Order::getIsPaid, toSet()));
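One practical detail of partitioningBy is that the resulting Map always contains both the true and the false key, even when one side is empty. The sketch below shows this with concrete data. The Order class is a hypothetical minimal stand-in, and PartitionDemo is an invented name.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; Order is a minimal stand-in for the article's class.
class PartitionDemo {
    static class Order {
        private final String orderNo;
        private final boolean paid;
        Order(String orderNo, boolean paid) { this.orderNo = orderNo; this.paid = paid; }
        String getOrderNo() { return orderNo; }
        boolean getIsPaid() { return paid; }
    }

    // Splits the orders into paid (true) and unpaid (false).
    static Map<Boolean, List<Order>> byPaid(List<Order> orders) {
        return orders.stream().collect(Collectors.partitioningBy(Order::getIsPaid));
    }

    public static void main(String[] args) {
        List<Order> allPaid = List.of(new Order("A1", true), new Order("A2", true));
        Map<Boolean, List<Order>> result = byPaid(allPaid);
        System.out.println(result.get(true).size());  // 2
        System.out.println(result.get(false).size()); // 0: the key exists with an empty list
    }
}
```

This means get(false) and get(true) never return null, which is a difference from groupingBy, where a key with no elements is simply absent.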

3.4 Classic Re-engraving Method

We have finally come to the last section. Please forgive the name I gave this group of methods, but they really are what it says: remakes of classics.

In other words, Collectors re-implements methods that Stream already has, including:

map → mapping
filter → filtering
flatMap → flatMapping
count → counting
reduce → reducing
max → maxBy
min → minBy

I will not go through these methods one by one, since the previous article covered their Stream counterparts in detail. The only difference is that some of them take an extra parameter: the downstream collector we met in grouping and partitioning, which lets you specify what container to collect into.
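As a sketch of how two of these remakes look in use, the code below applies mapping and filtering as downstream collectors of groupingBy. The Order class is a hypothetical minimal stand-in, DownstreamDemo is an invented name, and filtering requires Java 9 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; Order is a minimal stand-in for the article's class.
class DownstreamDemo {
    static class Order {
        private final String orderNo;
        private final Integer orderType;
        private final long money;
        Order(String orderNo, Integer orderType, long money) {
            this.orderNo = orderNo; this.orderType = orderType; this.money = money;
        }
        String getOrderNo() { return orderNo; }
        Integer getOrderType() { return orderType; }
        long getMoney() { return money; }
    }

    // mapping: transform each element of a group before it is collected.
    static Map<Integer, List<String>> orderNosByType(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(Order::getOrderType,
                Collectors.mapping(Order::getOrderNo, Collectors.toList())));
    }

    // filtering: drop elements inside each group; a group with no survivors stays as an empty list.
    static Map<Integer, List<Order>> bigOrdersByType(List<Order> orders, long minMoney) {
        return orders.stream().collect(Collectors.groupingBy(Order::getOrderType,
                Collectors.filtering(o -> o.getMoney() >= minMoney, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("A1", 1, 5), new Order("A2", 1, 50), new Order("B1", 2, 5));
        System.out.println(orderNosByType(orders).get(1));             // [A1, A2]
        System.out.println(bigOrdersByType(orders, 10).get(2).size()); // 0
    }
}
```

Note the difference from calling filter() before grouping: with the filtering collector, type 2 still appears in the result with an empty list, whereas filtering the stream first would remove that key entirely.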

I singled them out mainly to discuss why so many methods were remade. What follows is my personal opinion, not the official one.

They exist mainly so that operations can be combined.

What does that mean? Suppose I have another requirement: group orders by order type and find out how many orders there are in each group.

We have already covered grouping orders. To find how many orders each group has, you could just take the size of the corresponding list, but we can skip that extra work and do it in one step, so that the resulting key-value pairs are order type and order count:

        Map<Integer, Long> collect = orders.stream()
                .collect(Collectors.groupingBy(Order::getOrderType, counting()));

It's as simple as that. In effect, we perform one more counting operation on the grouped data.

The example above may not make the benefit obvious. Normally, when we need to process the collected data further, we have to convert it back into a Stream and operate on it again. With these Collectors methods, the processing can conveniently be done inside the collector.

As another example, we again group orders by order type, but this time we want the order with the largest amount in each type:
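The difference between the two approaches can be seen side by side. The sketch below counts orders per type once with the counting() downstream collector and once by grouping into lists and then re-streaming the result. The Order class is a hypothetical minimal stand-in, and CountingDemo is an invented name.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; Order is a minimal stand-in for the article's class.
class CountingDemo {
    static class Order {
        private final Integer orderType;
        Order(Integer orderType) { this.orderType = orderType; }
        Integer getOrderType() { return orderType; }
    }

    // One step: count inside the collector.
    static Map<Integer, Long> countOneStep(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(Order::getOrderType, Collectors.counting()));
    }

    // Two steps: group into lists first, then stream the entries again to take each size.
    static Map<Integer, Long> countTwoSteps(List<Order> orders) {
        Map<Integer, List<Order>> grouped = orders.stream()
                .collect(Collectors.groupingBy(Order::getOrderType));
        return grouped.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> Long.valueOf(e.getValue().size())));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order(1), new Order(1), new Order(2));
        System.out.println(countOneStep(orders).get(1));                       // 2
        System.out.println(countOneStep(orders).equals(countTwoSteps(orders))); // true
    }
}
```

Both methods produce the same Map, but the one-step version never materializes the intermediate lists.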

        List<Order> orders = List.of(new Order(), new Order());        
       
        Map<Integer, Optional<Order>> collect2 = orders.stream()
                .collect(groupingBy(Order::getOrderType, 
                        maxBy(Comparator.comparing(Order::getMoney))));

This is more concise and more convenient. We don't need to find the maximum group by group after grouping; it is done in one step.

One more example: after grouping, compute the total order amount of each group:

        List<Order> orders = List.of(new Order(), new Order());        
       
        Map<Integer, Long> collect = orders.stream()
                .collect(groupingBy(Order::getOrderType, summingLong(Order::getMoney)));

We haven't covered summingLong yet. It is a built-in summing operation, with variants for int, long and double (summingInt, summingLong and summingDouble).

There is a similar method, averagingLong, which computes the average, as the name suggests. It is fairly simple, so take a look at it when you have time.
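Here is a small sketch of summingLong and averagingLong with concrete amounts, so the results can be checked by hand. The Order class is a hypothetical minimal stand-in whose getMoney() returns a long, and SummingDemo is an invented name.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; Order is a minimal stand-in for the article's class.
class SummingDemo {
    static class Order {
        private final Integer orderType;
        private final long money;
        Order(Integer orderType, long money) { this.orderType = orderType; this.money = money; }
        Integer getOrderType() { return orderType; }
        long getMoney() { return money; }
    }

    // Total amount per order type.
    static Map<Integer, Long> totalByType(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(Order::getOrderType,
                Collectors.summingLong(Order::getMoney)));
    }

    // Average amount per order type; the result is always a Double.
    static Map<Integer, Double> averageByType(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(Order::getOrderType,
                Collectors.averagingLong(Order::getMoney)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order(1, 10), new Order(1, 30), new Order(2, 5));
        System.out.println(totalByType(orders).get(1));   // 40
        System.out.println(averageByType(orders).get(1)); // 20.0
    }
}
```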

Finally, the last method, joining(), is very useful for concatenating strings:

        List<Order> orders = List.of(new Order(), new Order());

        String collect = orders.stream()
                .map(Order::getOrderNo).collect(Collectors.joining("，"));

The name may look familiar: the String class gained a join() method in JDK 8, which is also used to concatenate strings. Collectors.joining() does the same job, and both are built on the same underlying class, StringJoiner.
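The relationship between the two can be checked directly. The sketch below compares Collectors.joining with String.join and also shows the three-argument overload of joining, which adds a prefix and a suffix. JoiningDemo and its method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class comparing Collectors.joining with String.join.
class JoiningDemo {
    // Joins with a delimiter only.
    static String joinPlain(List<String> parts) {
        return parts.stream().collect(Collectors.joining(", "));
    }

    // Joins with a delimiter, a prefix and a suffix.
    static String joinBracketed(List<String> parts) {
        return parts.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> orderNos = List.of("A1", "A2", "B1");
        System.out.println(joinPlain(orderNos));             // A1, A2, B1
        System.out.println(String.join(", ", orderNos));     // A1, A2, B1
        System.out.println(joinBracketed(orderNos));         // [A1, A2, B1]
    }
}
```

String.join is convenient when you already have a collection of strings; joining fits better at the end of a stream pipeline, as in the map(Order::getOrderNo) example above.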

4. Summary

Finally finished.

In this article on Stream's terminal operations, I covered all of Stream's aggregation methods. After reading it, you should have a grasp of every aggregation operation. Even a rough impression is fine. What matters is not ending up in the awkward position of believing that Stream cannot do something simply because it is missing from your own knowledge.

Of course, I still recommend using these concise APIs more in your projects to make the code more readable and concise. It is also an easy way to impress others in code review~

Reference books:

Java 8 in Action


和耳朵
