Delayed execution and immutability: a systematic look at Java Stream data processing

While writing business code at work recently, I suddenly couldn't remember how to write an accumulation with Stream.

I had no choice but to turn to Google. After spending three precious minutes, I had relearned it. It's very simple.

Ever since I started using JDK 8, Stream has been the feature I use most, and I chain stream operations all over the place. After this incident, however, I suddenly felt that Stream was unfamiliar to me.

Maybe it's the same for everyone: the things we use most are the easiest to overlook. Even when preparing for an interview, you probably wouldn't think to review something like Stream.

But now that I have noticed it, I want to sort it out again, which also serves as a check on my overall knowledge.

This article on Stream took a lot of effort to write. I hope you will get reacquainted with Stream alongside me, both its API and its internal characteristics. Why fear that truth is endless? Every inch forward brings an inch of joy.

In this article, I divide the content of Stream into the following parts:

At first glance at this map, you may be a bit confused by the two terms "conversion stream operation" and "final stream operation". I have simply divided all of Stream's APIs into two categories and given each a name (following books on Java 8, see the end of the article):

Conversion stream operation (called an intermediate operation in the JDK documentation): for example, the filter and map methods convert one Stream into another, and the return value is always a Stream.
Final stream operation (called a terminal operation in the JDK documentation): for example, the count and collect methods summarize a Stream into the result we need, and the return value is not a Stream.

I also divide the APIs for conversion stream operations into two categories. There will be detailed examples in the article. Let’s take a look at the definitions first and get a general impression:

Stateless: executing the method does not depend on the result set of the preceding method.
Stateful: executing the method depends on the result set of the preceding method.

Because there is so much to cover, I split Stream into two articles. This is the first; it is packed with content, and the examples are simple and plentiful.

Although the second article covers only one topic, final operations, their API is more complex, so it is just as packed with content and examples. The two articles are similar in length, so stay tuned.

Note: my machine runs JDK 11 and I forgot to switch to JDK 8 while writing, so the examples make heavy use of List.of(), which does not exist in JDK 8. For the purposes of these examples it is equivalent to Arrays.asList() in JDK 8.

`1. Why use Stream?`

It all starts with the release of JDK 8. In an era when functional programming languages were in full swing, Java was criticized for its verbosity (being strongly object-oriented). The community urgently needed Java to add functional language features to improve the situation, and in 2014 Java finally released JDK 8.

In JDK 8, I think the biggest new features are functional interfaces and lambda expressions, both of which are borrowed from functional programming.

These two features make Java simpler and more elegant. Adopting the functional style to compete with functional languages and consolidate Java's leading position is a classic case of beating rivals at their own game.

Stream is a library that JDK 8 built for the collection framework on top of these two features. It lets us process the data in a collection more concisely through lambda expressions, making operations such as filtering, grouping, collecting, and reducing very easy. That is why I like to call Stream the best practice of functional interfaces.

`1.1 Clearer code structure`

Stream has a clearer code structure. To better explain how Stream makes code clearer, suppose we have a very simple requirement: find all elements greater than 2 in a collection.

Let's first look at the code without Stream:

        List<Integer> list = List.of(1, 2, 3);
        
        List<Integer> filterList = new ArrayList<>();
        
        for (Integer i : list) {
            if (i > 2) {
                filterList.add(i);
            }
        }
        
        System.out.println(filterList);

The code above is easy to understand, so I won't explain it much. It is fine because our requirement is simple. But what if there are more requirements?

Each additional requirement adds another condition to the if. In real development our objects often have many fields, so there may be four or five conditions, and the code may end up like this:

        List<Integer> list = List.of(1, 2, 3);

        List<Integer> filterList = new ArrayList<>();

        for (Integer i : list) {
            if (i > 2 && i < 10 && (i % 2 == 0)) {
                filterList.add(i);
            }
        }

        System.out.println(filterList);

With many conditions the if looks messy, but that is still tolerable. The worst part is that a project often has many similar requirements that differ in only one condition. You then copy a large chunk of code, tweak it, and ship it, which leaves a lot of duplicated code in the codebase.

If you use Stream, everything becomes clear and easy to understand:

        List<Integer> list = List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

With this code, you only need to focus on what matters most: the filter conditions. The method name filter tells you it is a filter condition, and the method name collect tells you it is a collector that gathers the final results into a List.

You may also have noticed that the code above contains no loop. Why is that?

Because Stream performs the loop for us implicitly. This is called internal iteration, as opposed to the external iteration we usually write.

So even though you don't write a loop, one still runs.
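To make the contrast between external and internal iteration concrete, here is a minimal sketch. The class name IterationStyleDemo and its two helper methods are hypothetical names chosen for illustration; only Predicate, filter, and collect come from the JDK. It also hints at how conditions can be passed around instead of copied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class IterationStyleDemo {

    // External iteration: the caller writes the loop and manages the result list.
    public static List<Integer> filterWithLoop(List<Integer> source, Predicate<Integer> condition) {
        List<Integer> result = new ArrayList<>();
        for (Integer i : source) {
            if (condition.test(i)) {
                result.add(i);
            }
        }
        return result;
    }

    // Internal iteration: the Stream drives the loop, the caller only supplies the condition.
    public static List<Integer> filterWithStream(List<Integer> source, Predicate<Integer> condition) {
        return source.stream()
                .filter(condition)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5, 6);
        Predicate<Integer> greaterThanTwo = i -> i > 2;
        Predicate<Integer> even = i -> i % 2 == 0;

        // Conditions are ordinary values, so they can be combined instead of copy-pasted.
        System.out.println(filterWithLoop(source, greaterThanTwo.and(even)));   // [4, 6]
        System.out.println(filterWithStream(source, greaterThanTwo.and(even))); // [4, 6]
    }
}
```

Both methods return the same result; the difference is only in who owns the loop.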

`1.2 Don't care about variable state`

Stream was designed to be immutable from the start, and its immutability has two meanings:

Since each Stream operation generates a new Stream, Stream is immutable, just like String.
The Stream only holds a reference to the original collection, so operations that change elements produce new elements from the original ones. As long as your lambdas do not mutate the element objects themselves, no Stream operation affects the original collection.

The first meaning is what enables chained calls, which we use all the time with Stream. The second is a major feature of functional programming: not modifying state.

No matter what operations are performed on the Stream, the original collection is not affected in the end, and the return value is computed from the original collection.

So with Stream we don't need to worry about the side effects of operating on the original collection; we can just use it.
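The second meaning can be checked with a small sketch. SourceUntouchedDemo and sortedTimesTen are hypothetical names; the point is only that sorting and mapping inside the Stream leave the source list as it was.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SourceUntouchedDemo {

    // Sorts and multiplies inside the Stream; the source list itself is never modified.
    public static List<Integer> sortedTimesTen(List<Integer> source) {
        return source.stream()
                .sorted()
                .map(i -> i * 10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(Arrays.asList(3, 1, 2));
        List<Integer> result = sortedTimesTen(source);

        System.out.println(source); // [3, 1, 2]
        System.out.println(result); // [10, 20, 30]
    }
}
```

Note that this holds because the lambda returns a new value. A lambda that calls a setter on a mutable element would still change that element in the original collection.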

For more on functional programming, see Ruan Yifeng's "A Preliminary Look at Functional Programming".

`1.3 Delayed execution and optimization`

A Stream is only executed when it reaches a final operation. For example:

        List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .peek(System.out::println);

This code will not be executed. The peek method can be thought of as a forEach; here I use it to print the elements in the Stream.

Because filter and peek are both conversion stream methods, they do not trigger execution.

If we add a count call at the end, the pipeline executes normally:

        List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .peek(System.out::println)
                .count();

The count method is a final operation that counts the elements in the Stream; its return value is a long.

This behavior, where a Stream does nothing until a final operation is called, is known as delayed execution.
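Delayed execution can be observed without relying on console output. The following sketch records the elements that actually flow through peek; LazyExecutionDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class LazyExecutionDemo {

    // Builds a pipeline but never calls a final operation, so nothing is recorded.
    public static List<Integer> seenWithoutFinalOperation() {
        List<Integer> seen = new ArrayList<>();
        Stream<Integer> pipeline = List.of(1, 2, 3).stream()
                .filter(i -> i > 1)
                .peek(seen::add);
        return seen;
    }

    // The same pipeline with count() at the end: the elements now flow through peek().
    public static List<Integer> seenWithFinalOperation() {
        List<Integer> seen = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .filter(i -> i > 1)
                .peek(seen::add)
                .count();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenWithoutFinalOperation()); // []
        System.out.println(seenWithFinalOperation());    // [2, 3]
    }
}
```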

Stream also performs loop merging. For a concrete example, see Section 3.

`2. Create Stream`

For the sake of completeness, I decided to add a section on creating a Stream. It introduces some common ways to do so, which generally fall into two cases:

Creating a Stream through the Stream interface
Creating a Stream through the collection library

I will also cover parallel streams and the concatenation of Streams. Both create Streams, but with different characteristics.

`2.1 Create through the Stream interface`

Stream is an interface, and it defines several static methods that give us an API for creating a Stream:

    public static<T> Stream<T> of(T... values) {
        return Arrays.stream(values);
    }

The first is the of method, which takes generic varargs and creates a Stream of that type. If the arguments are primitives, autoboxing wraps them in their wrapper types:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

        Stream<Double> doubleStream = Stream.of(1.1d, 2.2d, 3.3d);

        Stream<String> stringStream = Stream.of("1", "2", "3");

Of course, you can also create an empty Stream directly by calling another static method, empty(). Its type parameter is inferred from the context, which is Object here:

        Stream<Object> empty = Stream.empty();

The methods above are all easy to understand. There is another one that creates a Stream with an unlimited number of elements, generate():

    public static<T> Stream<T> generate(Supplier<? extends T> s) {
        Objects.requireNonNull(s);
        return StreamSupport.stream(
                new StreamSpliterators.InfiniteSupplyingSpliterator.OfRef<>(Long.MAX_VALUE, s), false);
    }

Judging by its parameter, it accepts a functional interface, Supplier. This functional interface is used to create objects; you can think of it as an object factory. The Stream puts the objects created by this factory into the Stream:

        Stream<String> generate = Stream.generate(() -> "Supplier");

        Stream<Integer> generateInteger = Stream.generate(() -> 123);

For convenience, I construct the Supplier directly with a lambda here. You can also pass in a Supplier object, and the Stream will obtain its elements by calling the Supplier's get() method.
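Because generate() produces an unlimited Stream, it is normally paired with something that bounds it. A minimal sketch, assuming a hypothetical GenerateDemo class that passes an explicit Supplier object and cuts the Stream off with limit():

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class GenerateDemo {

    // Returns the first n values produced by a counting Supplier.
    public static List<Integer> firstN(int n) {
        AtomicInteger counter = new AtomicInteger();
        // An explicit Supplier object; each call to get() yields the next number.
        Supplier<Integer> nextId = counter::incrementAndGet;

        return Stream.generate(nextId)
                .limit(n) // without a bound, collecting an unlimited Stream would never finish
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstN(3)); // [1, 2, 3]
    }
}
```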

`2.2 Create by collection library`

Compared with the first approach, the second is more common. We usually perform Stream operations on a collection rather than constructing a Stream by hand:

        Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();
        
        Stream<String> stringStreamList = List.of("1", "2", "3").stream();

In Java 8, the top-level collection interface Collection gained a new default method, stream(). Through this method we can easily create a Stream from any Collection implementation:

        Stream<Integer> listStream = List.of(1, 2, 3).stream();
        
        Stream<Integer> setStream = Set.of(1, 2, 3).stream();

Looking at the source code, you can see that the stream() method essentially creates the Stream by calling a Stream utility class:

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

`2.3 Create a parallel stream`

In the examples above, all the Streams are serial streams. In some scenarios, to squeeze the most performance out of a multi-core CPU, we can use parallel streams, which run in parallel on the fork/join framework introduced in JDK 7. We can create parallel streams as follows:

        Stream<Integer> integerParallelStream = Stream.of(1, 2, 3).parallel();

        Stream<String> stringParallelStream = Stream.of("1", "2", "3").parallel();

        Stream<Integer> integerParallelStreamList = List.of(1, 2, 3).parallelStream();

        Stream<String> stringParallelStreamList = List.of("1", "2", "3").parallelStream();

Yes, the static methods of Stream offer no way to create a parallel stream directly. We have to call parallel() after constructing the Stream. Calling parallel() does not create a new parallel stream object; it just sets a parallel flag on the original Stream object.

As you can also see, the Collection interface can create a parallel stream directly by calling parallelStream(), the counterpart of stream(). As I just mentioned, the only difference between them is one parameter:

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }

Under normal circumstances, however, we do not need parallel streams. When a stream has fewer than a thousand or so elements, performance will not improve much, because distributing the elements to different CPUs has a cost of its own.

The advantage of parallelism is that it makes full use of multi-core CPUs, but the data has to be split up and then distributed to each CPU for processing. Array-based data can be split easily; linked-list or hash-based data is clearly not as convenient to split.

So only when the Stream holds more than 10,000 elements, or even more, do parallel streams bring a noticeable performance improvement.

Finally, when you have a parallel stream, you can easily convert it back into a serial stream with sequential():

        Stream.of(1, 2, 3).parallel().sequential();
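The parallel flag described in this section can be inspected with isParallel(). The sketch below uses a hypothetical ParallelFlagDemo class; it only shows that the flag flips and that a simple sum gives the same answer either way, not that parallelism is faster.

```java
import java.util.List;
import java.util.stream.Stream;

final class ParallelFlagDemo {

    // parallel() switches the pipeline into parallel mode.
    public static boolean afterParallel() {
        return Stream.of(1, 2, 3).parallel().isParallel();
    }

    // sequential() switches it back.
    public static boolean afterSequential() {
        return Stream.of(1, 2, 3).parallel().sequential().isParallel();
    }

    // Summing is independent of element order, so the parallel result matches the serial one.
    public static int parallelSum(List<Integer> source) {
        return source.parallelStream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(afterParallel());                 // true
        System.out.println(afterSequential());               // false
        System.out.println(parallelSum(List.of(1, 2, 3)));   // 6
    }
}
```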

`2.4 Connect Stream`

If you construct two Streams in two different places and want to combine them when using them, you can use concat():

        Stream<Integer> concat = Stream
                .concat(Stream.of(1, 2, 3), Stream.of(4, 5, 6));

If the two streams have different type parameters, type inference automatically infers a common parent type of the two:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

        Stream<String> stringStream = Stream.of("1", "2", "3");

        Stream<? extends Serializable> stream = Stream.concat(integerStream, stringStream);
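To see what a concatenated Stream actually contains, it helps to collect it. A small sketch with a hypothetical ConcatDemo class; the mixed case uses Object as the common parent type for simplicity.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ConcatDemo {

    // Elements of the first Stream come first, followed by those of the second.
    public static List<Integer> joinInts() {
        return Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5, 6))
                .collect(Collectors.toList());
    }

    // Streams of different element types can be joined under a common parent type.
    public static List<Object> joinMixed() {
        Stream<Object> mixed = Stream.concat(Stream.of(1, 2), Stream.of("a", "b"));
        return mixed.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(joinInts());  // [1, 2, 3, 4, 5, 6]
        System.out.println(joinMixed()); // [1, 2, a, b]
    }
}
```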

`3. Stateless method of Stream conversion operation`

Stateless method: executing the method does not depend on the result set of the preceding method.

There are three commonly used stateless APIs in Stream:

map(): its parameter is a Function object. It lets you apply a custom operation to each element and keeps the results.
filter(): its parameter is a Predicate object, whose result is a boolean. The method keeps only the elements for which the result is true; as the name suggests, we use it for filtering.
flatMap(): like map(), its parameter is a Function object, but the Function must return a Stream. The method gathers the elements of multiple Streams into one and returns it.
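As a quick overview before the detailed walk-through, the three methods can be sketched side by side. StatelessOpsDemo and its method names are hypothetical; each method collects the resulting Stream into a List so the outcome is visible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class StatelessOpsDemo {

    // map(): transform every element with a Function.
    public static List<Integer> timesTen(List<Integer> source) {
        return source.stream()
                .map(i -> i * 10)
                .collect(Collectors.toList());
    }

    // filter(): keep only the elements for which the Predicate returns true.
    public static List<Integer> atLeastTwenty(List<Integer> source) {
        return source.stream()
                .filter(i -> i >= 20)
                .collect(Collectors.toList());
    }

    // flatMap(): each element becomes a Stream, and those Streams are flattened into one.
    public static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(timesTen(List.of(1, 2, 3)));         // [10, 20, 30]
        System.out.println(atLeastTwenty(List.of(10, 20, 30))); // [20, 30]
        System.out.println(words(List.of("a b", "c")));         // [a, b, c]
    }
}
```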

Let's take a look at an example of the map() method:

        Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();

        Stream<Integer> mapStream = integerStreamList.map(i -> i * 10);

We have a List. If we want to multiply each element in it by 10, we can write it as above, where i names each element in the List and the expression after -> is the operation to perform on that element. The logic is passed in as a short, clear piece of code, and the call returns a new Stream containing the results.

In order to better help everyone understand, I drew a simple diagram:

The following is an example of the filter() method:

        Stream<Integer> integerStreamList = List.of(10, 20, 30).stream();

        Stream<Integer> filterStream = integerStreamList.filter(i -> i >= 20);

In this code, the predicate i >= 20 is evaluated for each element, and the elements for which it returns true are kept in a new Stream, which is returned.

Here is another simple diagram:

The flatMap() method was described above, but that description is rather abstract. I went through many examples while learning this method before I understood it well.

According to the official documentation, this method performs a one-to-many flattening of elements:

        List<Order> orders = List.of(new Order(), new Order());

        Stream<Item> itemStream = orders.stream()
                .flatMap(order -> order.getItemList().stream());

Here I use orders to illustrate the method. Each order contains a list of items. If I want to combine the item lists of the two orders into one new item list, I need flatMap().

In the code above, each order returns a Stream of its item list. We only have two orders here, so the Function ends up producing two Streams of items. What flatMap() does is pull the elements out of these two Streams and put them into a new Stream.

As usual, here is a simple diagram to illustrate:

In the illustration, I use cyan to represent a Stream. In the final output, you can see that flatMap() turns the two Streams into one. This is very useful in some scenarios, such as my order example above.
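The order example does not show the Order and Item classes, so here is a self-contained sketch of how it might look. The two nested classes, their fields, and the sample data are assumptions made up for illustration; only the flatMap() call mirrors the snippet above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FlatMapOrderDemo {

    // Hypothetical item with just a name.
    static final class Item {
        private final String name;
        Item(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Hypothetical order holding a list of items.
    static final class Order {
        private final List<Item> itemList;
        Order(Item... items) { this.itemList = Arrays.asList(items); }
        List<Item> getItemList() { return itemList; }
    }

    // Two sample orders: one with two items, one with a single item.
    public static List<Order> sampleOrders() {
        return Arrays.asList(
                new Order(new Item("pen"), new Item("ink")),
                new Order(new Item("book")));
    }

    // Flattens the item lists of all orders into one Stream, then reads the names.
    public static List<String> allItemNames(List<Order> orders) {
        return orders.stream()
                .flatMap(order -> order.getItemList().stream())
                .map(Item::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(allItemNames(sampleOrders())); // [pen, ink, book]
    }
}
```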

There is one more stateless method, peek():

    Stream<T> peek(Consumer<? super T> action);

The peek method accepts a Consumer object as its parameter, a functional interface that takes an argument and returns nothing. We can use peek for things like printing elements:

        Stream<Integer> peekStream = integerStreamList.peek(i -> System.out.println(i));

However, if you are not familiar with it, I don't recommend using it, because in some cases it will not take effect, for example:

        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(System.out::println)
                .count();

The API documentation also notes that this method exists mainly for debugging. In my experience, peek is only executed when the Stream actually has to traverse its elements to produce the final result.

In the example above, count only needs to return the number of elements. Since JDK 9, count may skip the pipeline entirely when it can compute the size directly from the source, so peek is not executed. On JDK 8, or if count is replaced by collect, peek is executed.

peek is also executed if the pipeline contains a filtering step such as filter, or ends with one of the match methods.
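The cases above can be compared by recording what peek sees instead of printing. PeekCountDemo and its method names are hypothetical. The count-only case is deliberately left without an expected result because it depends on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class PeekCountDemo {

    // collect() has to traverse every element, so peek() sees all of them.
    public static List<Integer> seenWithCollect() {
        List<Integer> seen = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(seen::add)
                .collect(Collectors.toList());
        return seen;
    }

    // count() alone: the result depends on the JDK version, because newer JDKs
    // may compute the size from the source without running the pipeline.
    public static List<Integer> seenWithCount() {
        List<Integer> seen = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(seen::add)
                .count();
        return seen;
    }

    // With a filter() in the pipeline the size is no longer known up front,
    // so count() has to traverse and peek() runs for the surviving elements.
    public static List<Integer> seenWithFilterAndCount() {
        List<Integer> seen = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .filter(i -> i > 10)
                .peek(seen::add)
                .count();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenWithCollect());        // [10, 20, 30]
        System.out.println(seenWithCount());          // version-dependent
        System.out.println(seenWithFilterAndCount()); // [20, 30]
    }
}
```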

`3.1 Basic type Stream`

The previous section covered the three most commonly used stateless methods of Stream. Among the stateless methods there are also several counterparts of map() and flatMap():

mapToInt
mapToLong
mapToDouble
flatMapToInt
flatMapToLong
flatMapToDouble

As their names suggest, these six methods just convert the return value of map() or flatMap(). You might think that hardly justifies separate methods, but the key is precisely the return type:

The return value of mapToInt is IntStream
The return value of mapToLong is LongStream
The return value of mapToDouble is DoubleStream
The return value of flatMapToInt is IntStream
The return value of flatMapToLong is LongStream
The return value of flatMapToDouble is DoubleStream

To make Java more object-oriented, each of the eight primitive data types has a corresponding wrapper class. Since JDK 5, primitives are automatically boxed and unboxed when needed, which means the wrapper class's conversion methods are called for you automatically.

For example, in the previous example, I used this example:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

I passed primitive values when creating the Stream, and its type parameter was automatically boxed to Integer. We may sometimes overlook the cost of this automatic boxing and unboxing. If we want to avoid that cost when using Stream, we can convert the Stream into one designed for primitive data types:

IntStream: corresponds to int, short, char, and boolean among the primitive types
LongStream: corresponds to long
DoubleStream: corresponds to double and float

These interfaces also offer an of method for constructing a Stream, as in the example above, but without any automatic boxing or unboxing.

So the six methods mentioned above convert an ordinary Stream into one of these primitive streams, which gives us higher efficiency when we need it.

The primitive streams offer largely the same API as Stream, so once you understand how to use Stream, the primitive streams work the same way.

Note: IntStream, LongStream, and DoubleStream are all interfaces, but they do not extend the Stream interface.
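A short sketch of moving between an ordinary Stream and a primitive stream. PrimitiveStreamDemo and its methods are hypothetical names; mapToInt, sum, IntStream.of, and boxed are standard JDK methods.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class PrimitiveStreamDemo {

    // mapToInt() turns a Stream<String> into an IntStream, which offers sum() directly.
    public static int totalLength(List<String> words) {
        return words.stream()
                .mapToInt(String::length)
                .sum();
    }

    // IntStream.of() works on int values without boxing; boxed() converts back to Stream<Integer>.
    public static List<Integer> squares() {
        return IntStream.of(1, 2, 3)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(totalLength(List.of("a", "bb", "ccc"))); // 6
        System.out.println(squares());                              // [1, 4, 9]
    }
}
```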

`3.2 Loop merging of stateless methods`

Having covered the stateless methods, let's look at an example from earlier in this article:

        List<Integer> list = List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

In this example, I used the filter method three times. Do you think that Stream will loop three times for filtering?

If one of the filters is changed to map, how many times do you think it will loop?

        List<Integer> list = List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

Intuitively, we would need to process all the elements with map first and then filter them, which would take three loops.

But if you recall the definition of a stateless method, you will see that all three operations can be done in a single loop: filter only depends on map's result for the current element, not on the whole result set after map has run. As long as map is applied before filter for each element, everything can be completed in one pass. This optimization is called loop merging.

Because stateless methods can be executed in the same loop, they can also easily be spread across multiple CPUs with parallel streams.
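Loop merging can be made visible by logging each step. In the sketch below (LoopMergingDemo is a hypothetical name), every element passes through map and filter before the next element is touched, which is what a single merged loop looks like from the outside.

```java
import java.util.ArrayList;
import java.util.List;

final class LoopMergingDemo {

    // Records the order in which the operations touch each element.
    public static List<String> trace() {
        List<String> log = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> {
                    log.add("map " + i);
                    return i * 10;
                })
                .filter(i -> {
                    log.add("filter " + i);
                    return i > 10;
                })
                .forEach(i -> log.add("forEach " + i));
        return log;
    }

    public static void main(String[] args) {
        // [map 1, filter 10, map 2, filter 20, forEach 20, map 3, filter 30, forEach 30]
        System.out.println(trace());
    }
}
```

If the operations ran as separate loops, all the map entries would appear before the first filter entry.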

`4. Stateful method of Stream conversion operation`

Compared with the stateless methods, the stateful ones are relatively simple. You can tell what each does just from its name:

Method name	Method result
distinct()	Removes duplicate elements.
sorted()	Sorts the elements. It has two overloads; pass in a Comparator when you need a custom order.
limit(long maxSize)	Keeps only the first maxSize elements.
skip(long n)	Skips the first n elements and keeps the rest.
takeWhile(Predicate predicate)	New in JDK 9. Takes elements while the predicate is true and stops at the first element for which it is false.
dropWhile(Predicate predicate)	New in JDK 9. Drops elements while the predicate is true and returns the remaining elements, starting at the first one for which it is false.

All of the above are stateful methods: their execution depends on the result set of the preceding method. For example, sorting can only happen once the preceding method's full result set is available.
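The methods in the table can be tried on one small list. StatefulOpsDemo is a hypothetical class; takeWhile and dropWhile require JDK 9 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StatefulOpsDemo {

    private static final List<Integer> NUMBERS = List.of(3, 1, 3, 2, 5, 4);

    // Applies one Stream operation to the sample numbers and collects the result.
    public static List<Integer> apply(UnaryOperator<Stream<Integer>> operation) {
        return operation.apply(NUMBERS.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(apply(s -> s.distinct()));                        // [3, 1, 2, 5, 4]
        System.out.println(apply(s -> s.sorted()));                          // [1, 2, 3, 3, 4, 5]
        System.out.println(apply(s -> s.sorted(Comparator.reverseOrder()))); // [5, 4, 3, 3, 2, 1]
        System.out.println(apply(s -> s.limit(2)));                          // [3, 1]
        System.out.println(apply(s -> s.skip(4)));                           // [5, 4]
        System.out.println(apply(s -> s.takeWhile(i -> i != 2)));            // [3, 1, 3]
        System.out.println(apply(s -> s.dropWhile(i -> i != 2)));            // [2, 5, 4]
    }
}
```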

The limit and takeWhile methods are also short-circuit operations, which makes them more efficient, because the elements we want may already have been selected before the internal loop finishes.

Therefore, stateful methods cannot be merged into one loop the way stateless methods are; each stateful method goes through its own internal loop. As a result, the order in which you write the operations affects both the result and the performance of the program, so please pay attention to it during development.
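Two small experiments illustrate these points. In this sketch (StatefulOrderDemo is a hypothetical name), the first method shows that sorted() has to see every element before anything continues downstream, and the other two show that swapping filter and limit changes the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StatefulOrderDemo {

    // sorted() acts as a barrier: all "before" entries are logged before the first "after" entry.
    public static List<String> sortedTrace() {
        List<String> log = new ArrayList<>();
        Stream.of(3, 1, 2)
                .peek(i -> log.add("before " + i))
                .sorted()
                .peek(i -> log.add("after " + i))
                .forEach(i -> { });
        return log;
    }

    // Keep the even numbers, then take the first two of them.
    public static List<Integer> filterThenLimit(List<Integer> source) {
        return source.stream()
                .filter(i -> i % 2 == 0)
                .limit(2)
                .collect(Collectors.toList());
    }

    // Take the first two elements, then keep the even ones among them.
    public static List<Integer> limitThenFilter(List<Integer> source) {
        return source.stream()
                .limit(2)
                .filter(i -> i % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(sortedTrace());           // [before 3, before 1, before 2, after 1, after 2, after 3]
        System.out.println(filterThenLimit(source)); // [2, 4]
        System.out.println(limitThenFilter(source)); // [2]
    }
}
```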

`5. Summary`

This article gives an overview of Stream and describes its two major characteristics:

Immutability: it does not affect the original collection, and each call returns a new Stream.
Delayed execution: a Stream does not execute until it reaches a final operation.

Stream's API is divided into conversion operations and final operations, and this article has explained all the common conversion operations. The next article will focus on final operations.

While reading the Stream source code, I noticed something interesting. In the ReferencePipeline class (Stream's implementation class), the methods are ordered from top to bottom as: stateless methods → stateful methods → aggregate methods.

Well, after studying this article, I think everyone now has a clear overall picture of Stream and should have mastered the API for conversion operations. After all, there aren't that many 😂. Java 8 still has many powerful features, which we will talk about next time~

I also consulted the following books while writing this article:

All three books are very good. The first is written by the author of Core Java; if you want a full picture of what changed in JDK 8, read this one.

The second is more of a booklet, only a little over a hundred pages long, and mainly discusses functional ideas.

If you can only read one, I recommend the third. Its Douban score is as high as 9.2, and both its content and its quality are excellent.


和耳朵
