What is a Java Stream and why does it exist?
Why not use only the Collections API?
A Stream is a sequence of values. The java.util.stream package defines types for streams of reference values (Stream) and some primitive values (IntStream, LongStream, and DoubleStream).
Streams are like iterators in the way they supply their elements as needed for processing.
Streams were introduced in Java 8 as a way to manage and process the contents of existing data structures. With them we can use some very useful methods on our structures:
Map
Reduce
Filter
Find
Foreach
Among others.
For this, it is important to create a stream pipeline. A stream pipeline is used to manage actions on a stream. It is composed of iterators and finalizers, which the Java documentation calls intermediate and terminal operations: iterators are methods that return a stream, while finalizers do not return a stream and close it. Example:
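A minimal sketch of such a pipeline, assuming a list of integers as the source. The class name `PipelineExample` and the helper `doubleEvens` are hypothetical, chosen only for illustration: `filter` and `map` return a stream, while `collect` ends the pipeline and produces the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineExample {

    // Hypothetical helper: keeps the even numbers and doubles each of them.
    static List<Integer> doubleEvens(List<Integer> numbers) {
        return numbers.stream()                 // convert the source structure to a stream
                .filter(n -> n % 2 == 0)        // returns a stream: keep even numbers
                .map(n -> n * 2)                // returns a stream: double each number
                .collect(Collectors.toList());  // ends the pipeline and builds a new List
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(doubleEvens(source)); // [4, 8, 12]
    }
}
```

Nothing is computed until `collect` is called; the earlier calls only describe the work to be done.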
To use Java streams we need to convert the data structure we are working on to a Stream, assemble and run our pipeline. Finally, we can convert the stream pipeline output back to the source structure if necessary.
You might be wondering: "Why weren't such methods declared directly in the data structures?"
As you can see, we need to convert the structure we are working on to a stream before using the utility methods, because they are not part of the Collections API but were externalized into their own Streams API. These are some of the reasons why the Java architects proposed this solution:
The Collections API is more concerned with storing the data than with the actions and work that we are going to do on the data itself;
Manipulation vs. Management of the Data Structure
The idea of streams is not to change the original data structure; therefore map, filter, reduce, and the other stream methods DO NOT change it. The methods that exist today in Collections, such as removeAll(), change the original data structure, and that in itself is an important difference. The Java architects said that they even tried putting everything in the same Collections package, but that this difference in behavior was not clear enough and ended up confusing those who already used the Collections API.
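The difference can be sketched as follows, assuming a mutable list of integers. The class `MutationExample` and the helper `withoutViaStream` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MutationExample {

    // Stream version: builds a new list and leaves the source untouched.
    static List<Integer> withoutViaStream(List<Integer> source, List<Integer> unwanted) {
        return source.stream()
                .filter(n -> !unwanted.contains(n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> unwanted = Arrays.asList(2, 4);

        List<Integer> filtered = withoutViaStream(numbers, unwanted);
        System.out.println(filtered); // [1, 3]
        System.out.println(numbers);  // [1, 2, 3, 4] (source unchanged)

        numbers.removeAll(unwanted);  // Collections API: mutates the list in place
        System.out.println(numbers);  // [1, 3]
    }
}
```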
Eager Loading vs. Lazy Loading
The Collections API's methods are eager: they are executed as soon as they are called. Stream methods are lazy: they are only executed when the pipeline's finalizer method is called. This may seem like a minor change, but it opens up room for optimization when processing long sequences: because the whole pipeline is known before anything runs, the stream knows which parts of the processing it can skip. Example:
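A small sketch of this behavior, assuming a list of integers whose first element already satisfies the filter. The class `LazyExample`, the helper `firstMatch`, and the call counter are hypothetical and exist only to make the laziness observable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyExample {

    // Hypothetical helper: doubles each number and returns the first result above 3.
    // mapCalls records how many elements actually went through map().
    static Optional<Integer> firstMatch(List<Integer> numbers, AtomicInteger mapCalls) {
        return numbers.stream()
                .map(n -> {
                    mapCalls.incrementAndGet(); // count each element that reaches map()
                    return n * 2;
                })
                .filter(n -> n > 3)
                .findFirst();                   // short-circuits: stops at the first match
    }

    public static void main(String[] args) {
        AtomicInteger mapCalls = new AtomicInteger();
        Optional<Integer> result = firstMatch(Arrays.asList(2, 4, 6, 8), mapCalls);
        System.out.println(result.get());   // 4
        System.out.println(mapCalls.get()); // 1
    }
}
```

The list has four elements, but only the first one is ever mapped and filtered, because it already satisfies the condition.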
In the example above we can see that:
First, the stream is created
The stream checks that the finalizer method is findFirst(). This means it is only necessary to process map() and filter() once
The stream pipeline processes map() and filter() only once
The result is passed to the findFirst() method, which returns the Integer wrapped in an Optional
Even if those methods were declared in the Collections API, we would still need to do a conversion to work with asynchronous processing, see:
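A sketch of that conversion, assuming "asynchronous processing" refers to parallel streams. The class `ParallelExample` and the helper `sumOfSquares` are hypothetical names for illustration; `parallelStream()` is the explicit step that turns the collection into a parallel stream.

```java
import java.util.Arrays;
import java.util.List;

public class ParallelExample {

    // Hypothetical helper: sums the squares of the numbers using a parallel stream.
    static int sumOfSquares(List<Integer> numbers) {
        return numbers.parallelStream()   // explicit conversion to a parallel stream
                .mapToInt(n -> n * n)     // square each number as a primitive int
                .sum();                   // combine the partial results
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4))); // 30
    }
}
```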
Note that this doesn't come naturally to users either;
The risk of collision in method names increases every time we insert a new method in an API with hierarchically ordered classes.