java8-02-Stream-API

[TOC]

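0 Introduction to Stream

Home address: java.util.stream.Stream<T>
Date of birth: came into the world together with Java 8
Main skills: enough to brag about for three days and nights...
Main characteristics
- Does not modify its source
- Intermediate operations are lazy (lazy evaluation, deferred execution)
- Nothing happens until a terminal operation starts consuming the stream
- Iteration is implicit (internal)
...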

Overall, a Stream feels like an evolved Iterator. The Java 8 source describes it in its Javadoc as:

A sequence of elements supporting sequential and parallel aggregate operations

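It makes complex operations on collections such as traversal, filtering, mapping, aggregation and slicing convenient. Each intermediate operation returns a new Stream and leaves the original data unchanged. All of these operations are lazy: nothing is computed until the terminal operation runs, and at that point the intermediate steps are applied together in as few passes over the data as possible.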
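The laziness described above can be observed by logging every lambda call. The following is a minimal sketch, not code from the article; the class name LazyPipelineDemo and the log format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: records when each lambda actually runs.
class LazyPipelineDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5);

        // Only intermediate operations so far: no lambda has been called yet.
        Stream<Integer> pipeline = source.stream()
            .filter(e -> { log.add("filter " + e); return e % 2 == 1; })
            .map(e -> { log.add("map " + e); return e * 10; });
        log.add("pipeline built");

        // The terminal operation pulls elements through one at a time;
        // limit(2) stops the traversal as soon as two results exist.
        List<Integer> result = pipeline.limit(2).collect(Collectors.toList());
        log.add("result " + result);
        log.add("source " + source); // the source list is untouched
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The first log entry is "pipeline built", and the elements 4 and 5 are never inspected because limit(2) short-circuits the sequential traversal.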

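Compared with traditional manipulation of objects and data, Stream focuses on computing over a flow of elements, which is somewhat like the much-talked-about functional programming.

Just how much of an evolution it is, see for yourself.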

Given this input:

List<Integer> list = Arrays.asList(1, null, 3, 1, null, 4, 5, null, 2, 0);

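Compute the sum of the non-null odd numbers in the input, counting equal odd numbers only once.

Back when lambdas were not yet born, you might have done it like this: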

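int s = 0;
// Put the values into a Set first to remove duplicates
Set<Integer> set = new HashSet<>(list);
for (Integer i : set) {
  // keep non-null odd numbers (the lowest bit of an odd number is 1)
  if (i != null && (i & 1) == 1) {
    s += i;
  }
}
System.out.println(s);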

Once lambdas and Stream join forces:

int sum = list.stream().filter(e -> e != null && (e & 1) == 1).distinct().mapToInt(i -> i).sum();
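To check that the two styles agree, here is a small self-contained sketch that wraps both in methods. The class and method names (DistinctOddSum, withLoop, withStream) are hypothetical and only serve the comparison.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper around the two approaches shown above.
class DistinctOddSum {
    // Pre-Java 8 style: deduplicate with a Set, then loop.
    static int withLoop(List<Integer> input) {
        int sum = 0;
        Set<Integer> unique = new HashSet<>(input);
        for (Integer i : unique) {
            if (i != null && (i & 1) == 1) {
                sum += i;
            }
        }
        return sum;
    }

    // Stream style: filter, deduplicate, unbox, sum.
    static int withStream(List<Integer> input) {
        return input.stream()
            .filter(e -> e != null && (e & 1) == 1)
            .distinct()
            .mapToInt(Integer::intValue)
            .sum();
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, null, 3, 1, null, 4, 5, null, 2, 0);
        System.out.println(withLoop(list));   // 9 (1 + 3 + 5)
        System.out.println(withStream(list)); // 9
    }
}
```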

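1 Obtaining a Stream

Getting a Stream from lambda's other good buddies (collections and arrays)

Since Java 1.8, interfaces can also contain methods marked with the default modifier.

java.util.Collection<E> declares the following: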

public interface Collection<E> extends Iterable<E> {
    // Returns a sequential stream
    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }
    // Returns a (possibly) parallel stream
    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }
}
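The only difference between the two default methods is the boolean flag passed to StreamSupport.stream. A minimal sketch of what that means for calling code follows; the class name SequentialVsParallelDemo is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: the same pipeline started from stream() and parallelStream().
class SequentialVsParallelDemo {
    // Sums the values with a sequential stream.
    static int sequentialSum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Same pipeline, but started from parallelStream().
    static int parallelSum(List<Integer> values) {
        return values.parallelStream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(values.stream().isParallel());         // false
        System.out.println(values.parallelStream().isParallel()); // true with the default implementation
        System.out.println(sequentialSum(values) + " " + parallelSum(values)); // 15 15
    }
}
```

Because addition is associative, both variants return the same sum; only the execution strategy differs.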

java.util.Arrays declares the following:

    public static <T> Stream<T> stream(T[] array) {
        return stream(array, 0, array.length);
    }

    public static IntStream stream(int[] array) {
        return stream(array, 0, array.length);
    }

    // other similar overloads are not listed here

Example

List<String> strs = Arrays.asList("apache", "spark");
Stream<String> stringStream = strs.stream();

IntStream intStream = Arrays.stream(new int[] { 1, 25, 4, 2 });

Getting a Stream through the Stream interface

Stream<String> stream = Stream.of("hello", "world");
Stream<String> stream2 = Stream.of("haha");
Stream<HouseInfo> stream3 = Stream.of(new HouseInfo[] { new HouseInfo(), new HouseInfo() });

Stream<Integer> stream4 = Stream.iterate(1, i -> 2 * i + 1);

Stream<Double> stream5 = Stream.generate(() -> Math.random());

Note: Stream.iterate() and Stream.generate() produce infinite streams, so you normally have to truncate them yourself with limit().
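As a small illustration of that note, the sketch below truncates both kinds of infinite stream with limit() before collecting. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo: infinite streams become usable once limit() bounds them.
class InfiniteStreamDemo {
    // First n values of the sequence 1, 3, 7, 15, ... (x -> 2x + 1).
    static List<Integer> firstIterated(int n) {
        return Stream.iterate(1, i -> 2 * i + 1)
            .limit(n)
            .collect(Collectors.toList());
    }

    // n random doubles in [0, 1).
    static List<Double> randoms(int n) {
        return Stream.generate(Math::random)
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstIterated(5)); // [1, 3, 7, 15, 31]
        System.out.println(randoms(3).size()); // 3
    }
}
```

Without limit(), collect() on either stream would never finish.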

2 Transforming a Stream

Filtering and slicing

This part is fairly straightforward; one example is enough.

// Create the stream
Stream<String> stream = Stream.of(//
    null, "apache", null, "apache", "apache", //
    "github", "docker", "java", //
    "hadoop", "linux", "spark", "alifafa");

stream// drop nulls and keep strings that contain "a"
    .filter(e -> e != null && e.contains("a"))//
    .distinct()// remove duplicates; this relies on equals() and hashCode()
    .limit(3)// keep only the first three matches
    .forEach(System.out::println);// consume the stream

map/flatMap

Stream's map is declared as follows:

<R> Stream<R> map(Function<? super T, ? extends R> mapper);

In other words, it takes an input (T: the element currently being iterated) and produces an output of another type (R).

Stream.of(null, "apache", null, "apache", "apache", //
          "hadoop", "linux", "spark", "alifafa")//

  .filter(e -> e != null && e.length() > 0)//
  .map(str -> str.charAt(0))// take the first character
  .forEach(System.out::println);
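The heading also names flatMap, which the example above does not exercise. As a hedged sketch of the difference: map produces exactly one output per input, while flatMap turns each input into a stream and concatenates the results. The class name FlatMapDemo and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo contrasting map and flatMap.
class FlatMapDemo {
    // map: one output element per input element.
    static List<Integer> lengths(List<String> lines) {
        return lines.stream()
            .map(String::length)
            .collect(Collectors.toList());
    }

    // flatMap: each line becomes a stream of words, and those streams are concatenated.
    static List<String> words(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apache spark", "hadoop linux docker");
        System.out.println(lengths(lines)); // [12, 19]
        System.out.println(words(lines));   // [apache, spark, hadoop, linux, docker]
    }
}
```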

sorted

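Sorting is also intuitive. There are two overloads:

// Sorts by the elements' natural order (their Comparable implementation)
Stream<T> sorted();

// Sorts with the given Comparator
Stream<T> sorted(Comparator<? super T> comparator);

Example: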

List<HouseInfo> houseInfos = Lists.newArrayList(//
    new HouseInfo(1, "恒大星级公寓", 100, 1), //
    new HouseInfo(2, "汇智湖畔", 999, 2), //
    new HouseInfo(3, "张江汤臣豪园", 100, 1), //
    new HouseInfo(4, "保利星苑", 23, 10), //
    new HouseInfo(5, "北顾小区", 66, 23), //
    new HouseInfo(6, "北杰公寓", null, 55), //
    new HouseInfo(7, "保利星苑", 77, 66), //
    new HouseInfo(8, "保利星苑", 111, 12)//
);

// Sort by distance, then by browse count; null values go last.
// A comparator that returns 0 whenever a value is null is not transitive
// and violates the Comparator contract, so nullsLast is used instead.
houseInfos.stream()
    .sorted(Comparator
        .comparing(HouseInfo::getDistance, Comparator.nullsLast(Comparator.naturalOrder()))
        .thenComparing(HouseInfo::getBrowseCount, Comparator.nullsLast(Comparator.naturalOrder())))
    .forEach(System.out::println);// a terminal operation is required, otherwise nothing runs
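For a version that runs without the HouseInfo class, here is a minimal sketch of both sorted() overloads on plain strings, including one way to sort data that contains null. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo of sorted() and sorted(Comparator).
class SortedDemo {
    // Natural order, i.e. String's own Comparable implementation.
    static List<String> natural(List<String> in) {
        return in.stream().sorted().collect(Collectors.toList());
    }

    // Custom order: shorter strings first, ties broken alphabetically.
    static List<String> byLengthThenAlphabet(List<String> in) {
        return in.stream()
            .sorted(Comparator.comparingInt(String::length)
                .thenComparing(Comparator.<String>naturalOrder()))
            .collect(Collectors.toList());
    }

    // Null-tolerant order: non-null values in natural order, nulls at the end.
    static List<String> nullsLast(List<String> in) {
        return in.stream()
            .sorted(Comparator.nullsLast(Comparator.<String>naturalOrder()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("spark", "java", "hadoop", "linux", "docker");
        System.out.println(natural(names));              // [docker, hadoop, java, linux, spark]
        System.out.println(byLengthThenAlphabet(names)); // [java, linux, spark, docker, hadoop]
        System.out.println(nullsLast(Arrays.asList("b", null, "a"))); // [a, b, null]
    }
}
```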

3 Terminating/Consuming a Stream

Condition tests and basic statistics

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);

// Are all elements greater than zero?
System.out.println(list.stream().allMatch(e -> e > 0));
// Is there any even number?
System.out.println(list.stream().anyMatch(e -> (e & 1) == 0));
// Is no element less than zero?
System.out.println(list.stream().noneMatch(e -> e < 0));

// Find the first element greater than or equal to 4
Optional<Integer> optional = list.stream().filter(e -> e >= 4).findFirst();
// If a value is present, run the action passed to ifPresent
optional.ifPresent(System.out::println);

// Count the elements greater than or equal to 4
System.out.println(list.stream().filter(e -> e >= 4).count());
// Get the minimum (the result is an Optional, printed as Optional[1])
System.out.println(list.stream().min(Integer::compareTo));
// Get the maximum
System.out.println(list.stream().max(Integer::compareTo));
// Convert to IntStream first; then max() needs no comparator
System.out.println(list.stream().mapToInt(i -> i).max());

reduce

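There is no settled Chinese translation for this word; some render it as 规约 (reduction), others as 汇聚 (aggregation).

In short, it combines the elements that remain after a series of transformations into a single result, repeatedly applying a reduce function along the way.

Besides the single-argument form that returns an Optional<T> (used in the second example below), reduce() has the following two overloads that take an identity value:

// Does not return an Optional: the identity argument guarantees there is a value to return, even for an empty stream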
T reduce(T identity, BinaryOperator<T> accumulator);

<U> U reduce(U identity,
             BiFunction<U, ? super T, U> accumulator,
             BinaryOperator<U> combiner);

Example:

// Walk through the elements, repeatedly applying (i, j) -> i + j
Integer reduce = Stream.iterate(1, i -> i + 1)//1,2,3,...,10,...
    .limit(10)//
    .reduce(0, (i, j) -> i + j);//55


Optional<Integer> reduce2 = Stream.iterate(1, i -> i + 1)//
    .limit(10)//
    .reduce((i, j) -> i + j);
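The three-argument overload declared above is not exercised by these examples. A minimal sketch of it follows, assuming a made-up task (total length of a list of words) and hypothetical class and method names. The accumulator folds an element of type T into a partial result of type U, and the combiner merges two partial results, which matters when the stream runs in parallel.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical demo of the reduce() overloads.
class ReduceDemo {
    // Three-argument reduce: T is String, U is Integer.
    static int totalLength(List<String> words) {
        return words.stream()
            .reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
    }

    // Same reduction on a parallel stream; the combiner adds the partial sums.
    static int totalLengthParallel(List<String> words) {
        return words.parallelStream()
            .reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
    }

    // Single-argument reduce: no identity, so an empty stream yields an empty Optional.
    static Optional<Integer> sumOrEmpty(List<Integer> values) {
        return values.stream().reduce(Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apache", "spark");
        System.out.println(totalLength(words));         // 11
        System.out.println(totalLengthParallel(words)); // 11
        System.out.println(sumOrEmpty(Collections.<Integer>emptyList()).isPresent()); // false
    }
}
```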

collect

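This operation is easy to understand: as the name suggests, it collects the elements of a Stream into one place.

The most general (and least used) collect method

// The most powerful tool is often the least used; this method is simply hard to grasp at first
<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);
// For what the parameters mean, see the example further down

The single-argument version

<R, A> R collect(Collector<? super T, A, R> collector);

The key parts of the Collector interface (it is not a functional interface, so it cannot be written as a lambda) are: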

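public interface Collector<T, A, R> {
    /**
     * Creates a new mutable result container.
     */
    Supplier<A> supplier();

    /**
     * Folds one element into the result container.
     */
    BiConsumer<A, T> accumulator();

    /**
     * Merges two partial results into one (used for parallel execution).
     */
    BinaryOperator<A> combiner();

    /**
     * Converts the container into the final result.
     */
    Function<A, R> finisher();

    /**
     * Describes this collector (CONCURRENT, UNORDERED, IDENTITY_FINISH).
     */
    Set<Characteristics> characteristics();

}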
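Although Collector itself cannot be written as a single lambda, the static factory Collector.of assembles one from four functions that correspond to the methods above. The sketch below is one possible illustration, a hand-rolled equivalent of Collectors.joining(", ", "[", "]"); the class name CustomCollectorDemo and the method names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

// Hypothetical demo: building a Collector from its four parts.
class CustomCollectorDemo {
    static Collector<String, StringJoiner, String> bracketed() {
        return Collector.of(
            () -> new StringJoiner(", ", "[", "]"), // supplier: new container
            StringJoiner::add,                       // accumulator: add one element
            StringJoiner::merge,                     // combiner: merge two containers
            StringJoiner::toString);                 // finisher: container -> result
    }

    static String join(List<String> words) {
        return words.stream().collect(bracketed());
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("apache", "spark"))); // [apache, spark]
    }
}
```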

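Let's first look at an example of the three-argument collect() method. Barring special cases, I bet that after seeing it you will never want to use it again...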

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
ArrayList<Integer> ret1 = numbers.stream()//
    .map(i -> i * 2)// double each value
    .collect(//
    () -> new ArrayList<Integer>(), // argument 1
    (list, e) -> list.add(e), // argument 2
    (list1, list2) -> list1.addAll(list2)// argument 3
);

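/***
 * <pre>
 * The three arguments of collect() are:
 * 1. () -> new ArrayList<Integer>()
 *         creates a new container to hold the result
 * 2. (list, e) -> list.add(e)
 *         list: the container created by argument 1
 *         e: the stream element currently being processed
 *         adds the element to the container
 * 3. (list1, list2) -> list1.addAll(list2)
 *         merges two containers (needed when the stream runs in parallel)
 * </pre>
 ***/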

ret1.forEach(System.out::println);

Without lambdas, the equivalent code would look like this...

List<Integer> ret3 = numbers.stream()//
    .map(i -> i * 2)// double each value
    .collect(new Supplier<List<Integer>>() {
      @Override
      public List<Integer> get() {
        // only supplies a container to hold the elements
        return new ArrayList<>();
      }
    }, new BiConsumer<List<Integer>, Integer>() {
      @Override
      public void accept(List<Integer> list, Integer e) {
        // add the current element to the container returned by the first argument
        list.add(e);
      }
    }, new BiConsumer<List<Integer>, List<Integer>>() {

      @Override
      public void accept(List<Integer> list1, List<Integer> list2) {
        // merge the containers
        list1.addAll(list2);
      }
  });

ret3.forEach(System.out::println);

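Feeling queasy yet?...

Likewise, calling the Spark API from Java without lambdas is even uglier than the code above...

A shameless plug while I'm here: take a look at my post that implements Spark's HelloWorld in several versions: http://blog.csdn.net/hylexus/... It shows just how happy a world with lambdas is...

Once you understand the three-argument collect method, though, you can use constructor references and method references to make the code more concise:

ArrayList<Integer> ret2 = numbers.stream()//
    .map(i -> i * 2)// double each value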
    .collect(//
    ArrayList::new, //
    List::add, //
    List::addAll//
);

ret2.forEach(System.out::println);
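The combiner argument only comes into play when the stream is split, for example with parallelStream(). The following sketch, with hypothetical class and method names, shows the three-argument form once with a StringBuilder container and once on a parallel stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of the three-argument collect().
class ThreeArgCollectDemo {
    // Container: StringBuilder; accumulator and combiner are both append().
    static String concat(List<String> parts) {
        return parts.stream()
            .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
            .toString();
    }

    // On a parallel stream each chunk fills its own ArrayList,
    // and addAll() merges the partial lists in encounter order.
    static List<Integer> doubledInParallel(List<Integer> numbers) {
        return numbers.parallelStream()
            .map(i -> i * 2)
            .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        System.out.println(concat(Arrays.asList("a", "b", "c")));            // abc
        System.out.println(doubledInParallel(Arrays.asList(1, 2, 3, 4, 5))); // [2, 4, 6, 8, 10]
    }
}
```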

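Using the Collectors utility (advanced statistics)

Both the three-argument and the single-argument collect() methods look complicated, and the single-argument version is the one used most often. Implementing its Collector yourself, however, is still painful.

Fortunately, the Collectors for common collect operations are already provided in java.util.stream.Collectors. A very powerful tool...

The following examples all operate on this list: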

List<HouseInfo> houseInfos = Lists.newArrayList(//
    new HouseInfo(1, "恒大星级公寓", 100, 1), // community ID, community name, browse count, distance
    new HouseInfo(2, "汇智湖畔", 999, 2), //
    new HouseInfo(3, "张江汤臣豪园", 100, 1), //
    new HouseInfo(4, "保利星苑", 111, 10), //
    new HouseInfo(5, "北顾小区", 66, 23), //
    new HouseInfo(6, "北杰公寓", 77, 55), //
    new HouseInfo(7, "保利星苑", 77, 66), //
    new HouseInfo(8, "保利星苑", 111, 12)//
);
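The article never shows the HouseInfo class itself. The sketch below is a guess at a minimal version, reconstructed only from the constructor calls, the getters used in the examples, and the toString() format visible in the sample output; the field types and everything else are assumptions.

```java
// Hypothetical reconstruction of the HouseInfo bean used in the examples.
class HouseInfo {
    private Integer houseId;
    private String houseName;
    private Integer browseCount; // may be null
    private Integer distance;    // may be null

    public HouseInfo() {
    }

    public HouseInfo(Integer houseId, String houseName, Integer browseCount, Integer distance) {
        this.houseId = houseId;
        this.houseName = houseName;
        this.browseCount = browseCount;
        this.distance = distance;
    }

    public Integer getHouseId() { return houseId; }
    public String getHouseName() { return houseName; }
    public Integer getBrowseCount() { return browseCount; }
    public Integer getDistance() { return distance; }

    // Mirrors the format seen in the sample output.
    @Override
    public String toString() {
        return "HouseInfo [houseId=" + houseId + ", houseName=" + houseName
            + ", browseCount=" + browseCount + ", distance=" + distance + "]";
    }
}
```

Using Integer rather than int for the numeric fields is what allows the null browse count that several examples filter out.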

All right, time to show off ^_^ ...

Extracting community names

// Get all community names and put them in a list
List<String> ret1 = houseInfos.stream()
      .map(HouseInfo::getHouseName).collect(Collectors.toList());
ret1.forEach(System.out::println);

// Get all community names and put them in a set to remove duplicates
// You could also call distinct() first and then collect into a List
Set<String> ret2 = houseInfos.stream()
      .map(HouseInfo::getHouseName).collect(Collectors.toSet());
ret2.forEach(System.out::println);

// Join all community names with _^_
// 恒大星级公寓_^_汇智湖畔_^_张江汤臣豪园_^_保利星苑_^_北顾小区_^_北杰公寓_^_保利星苑_^_保利星苑
String names = houseInfos.stream()
      .map(HouseInfo::getHouseName).collect(Collectors.joining("_^_"));
System.out.println(names);

// Specify ArrayList as the collection type
ArrayList<String> collect = houseInfos.stream()
      .map(HouseInfo::getHouseName)
      .collect(Collectors.toCollection(ArrayList::new));

Minimum and maximum

// Get the community with the highest browse count
Optional<HouseInfo> ret3 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)// drop entries whose browse count is null
  .collect(Collectors.maxBy((h1, h2) -> Integer.compare(h1.getBrowseCount(), h2.getBrowseCount())));
System.out.println(ret3.get());

// Get the highest browse count
Optional<Integer> ret4 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)// drop entries whose browse count is null
  .map(HouseInfo::getBrowseCount)// extract the browse count
  .collect(Collectors.maxBy(Integer::compare));// method reference that compares the browse counts
System.out.println(ret4.get());

Count and sum

// Get the number of elements
// houseInfos.size() would do here; this only demonstrates the syntax
Long total = houseInfos.stream().collect(Collectors.counting());
System.out.println(total);

// Sum of browse counts
Integer ret5 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)// drop entries whose browse count is null
  .collect(Collectors.summingInt(HouseInfo::getBrowseCount));
System.out.println(ret5);

// Sum of browse counts, mapping to the count first
Integer ret6 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)// drop entries whose browse count is null
  .map(HouseInfo::getBrowseCount).collect(Collectors.summingInt(i -> i));
System.out.println(ret6);

// Sum of browse counts via IntStream
int ret7 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)// drop entries whose browse count is null
  .mapToInt(HouseInfo::getBrowseCount)// convert to IntStream, then call its sum() directly
  .sum();
System.out.println(ret7);

Average

// Average browse count
Double ret8 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)// drop entries whose browse count is null
  .collect(Collectors.averagingDouble(HouseInfo::getBrowseCount));
System.out.println(ret8);

// Average browse count via DoubleStream
OptionalDouble ret9 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)// drop entries whose browse count is null
  .mapToDouble(HouseInfo::getBrowseCount)// convert to DoubleStream, then call its average() directly
  .average();
System.out.println(ret9.getAsDouble());

Summary statistics

// Get the summary statistics
DoubleSummaryStatistics statistics = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)
  .collect(Collectors.summarizingDouble(HouseInfo::getBrowseCount));
System.out.println("avg:" + statistics.getAverage());
System.out.println("max:" + statistics.getMax());
System.out.println("sum:" + statistics.getSum());
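Since browse counts are integers, Collectors.summarizingInt is an alternative that avoids the conversion to double for sum, min and max. The sketch below works on a plain list of integers (the browse counts from the sample list plus one null) so that it runs on its own; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical demo of summarizingInt on bare browse counts.
class SummaryStatisticsDemo {
    static IntSummaryStatistics summarize(List<Integer> counts) {
        return counts.stream()
            .filter(Objects::nonNull) // skip missing values
            .collect(Collectors.summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = summarize(Arrays.asList(100, 999, 100, 111, 66, 77, 77, 111, null));
        System.out.println("count:" + s.getCount());   // 8
        System.out.println("sum:" + s.getSum());       // 1641
        System.out.println("min:" + s.getMin());       // 66
        System.out.println("max:" + s.getMax());       // 999
        System.out.println("avg:" + s.getAverage());   // 205.125
    }
}
```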

Grouping

// Group by browse count
Map<Integer, List<HouseInfo>> ret10 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null)// drop entries whose browse count is null
  .collect(Collectors.groupingBy(HouseInfo::getBrowseCount));
ret10.forEach((count, house) -> {
  System.out.println("BrowseCount:" + count + " " + house);
});

// Multi-level grouping
// First group by browse count, then group by distance at the second level
Map<Integer, Map<String, List<HouseInfo>>> ret11 = houseInfos.stream()//
  .filter(h -> h.getBrowseCount() != null && h.getDistance() != null)//
  .collect(Collectors.groupingBy(
      HouseInfo::getBrowseCount,
      Collectors.groupingBy((HouseInfo h) -> {
          if (h.getDistance() <= 10)
            return "close";
          else if (h.getDistance() <= 20)
            return "near";
          return "far";
    })));

// The result looks roughly like this (HashMap iteration order is not guaranteed)
ret11.forEach((count, v) -> {
  System.out.println("Browse count:" + count);
  v.forEach((desc, houses) -> {
    System.out.println("\t" + desc);
    houses.forEach(h -> System.out.println("\t\t" + h));
  });
});
/****
 * <pre>
 *  Browse count:66
        far
            HouseInfo [houseId=5, houseName=北顾小区, browseCount=66, distance=23]
    Browse count:100
        close
            HouseInfo [houseId=1, houseName=恒大星级公寓, browseCount=100, distance=1]
            HouseInfo [houseId=3, houseName=张江汤臣豪园, browseCount=100, distance=1]
    Browse count:999
        close
            HouseInfo [houseId=2, houseName=汇智湖畔, browseCount=999, distance=2]
    Browse count:77
        far
            HouseInfo [houseId=6, houseName=北杰公寓, browseCount=77, distance=55]
            HouseInfo [houseId=7, houseName=保利星苑, browseCount=77, distance=66]
    Browse count:111
        near
            HouseInfo [houseId=8, houseName=保利星苑, browseCount=111, distance=12]
        close
            HouseInfo [houseId=4, houseName=保利星苑, browseCount=111, distance=10]
 * 
 * </pre>
 * 
 ****/
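The second argument of groupingBy is itself a Collector, so it does not have to be another groupingBy. As a hedged sketch on plain strings (class name, method names and sample words are made up), the downstream collector can count or join the members of each group, and a TreeMap factory gives a predictable key order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical demo of groupingBy with a map factory and a downstream collector.
class GroupingDownstreamDemo {
    // How many words start with each letter; TreeMap keeps the keys sorted.
    static Map<Character, Long> countByFirstLetter(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
            w -> w.charAt(0),       // classifier
            TreeMap::new,           // map factory
            Collectors.counting())); // downstream collector
    }

    // The words of each group joined into one string.
    static Map<Character, String> joinByFirstLetter(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
            w -> w.charAt(0),
            TreeMap::new,
            Collectors.joining("|")));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apache", "spark", "alifafa", "hadoop", "spark");
        System.out.println(countByFirstLetter(words)); // {a=2, h=1, s=2}
        System.out.println(joinByFirstLetter(words));  // {a=apache|alifafa, h=hadoop, s=spark|spark}
    }
}
```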

Partitioning

// Partition by distance (two parts)
Map<Boolean, List<HouseInfo>> ret12 = houseInfos.stream()//
  .filter(h -> h.getDistance() != null)//
  .collect(Collectors.partitioningBy(h -> h.getDistance() <= 20));
/****
 * <pre>
 *  far
            HouseInfo [houseId=5, houseName=北顾小区, browseCount=66, distance=23]
            HouseInfo [houseId=6, houseName=北杰公寓, browseCount=77, distance=55]
            HouseInfo [houseId=7, houseName=保利星苑, browseCount=77, distance=66]
    near
            HouseInfo [houseId=1, houseName=恒大星级公寓, browseCount=100, distance=1]
            HouseInfo [houseId=2, houseName=汇智湖畔, browseCount=999, distance=2]
            HouseInfo [houseId=3, houseName=张江汤臣豪园, browseCount=100, distance=1]
            HouseInfo [houseId=4, houseName=保利星苑, browseCount=111, distance=10]
            HouseInfo [houseId=8, houseName=保利星苑, browseCount=111, distance=12]
 * 
 * </pre>
 ****/
ret12.forEach((t, houses) -> {
  System.out.println(t ? "near" : "far");
  houses.forEach(h -> System.out.println("\t\t" + h));
});


// Nested partitioning: first by distance, then by browse count
Map<Boolean, Map<Boolean, List<HouseInfo>>> ret13 = houseInfos.stream()//
  .filter(h -> h.getDistance() != null && h.getBrowseCount() != null)//
  .collect(
          Collectors.partitioningBy(h -> h.getDistance() <= 20,
        Collectors.partitioningBy(h -> h.getBrowseCount() >= 70))
);

/*****
 * <pre>
 *  far
        fewer views
            HouseInfo [houseId=5, houseName=北顾小区, browseCount=66, distance=23]
        more views
            HouseInfo [houseId=6, houseName=北杰公寓, browseCount=77, distance=55]
            HouseInfo [houseId=7, houseName=保利星苑, browseCount=77, distance=66]
    near
        fewer views
        more views
            HouseInfo [houseId=1, houseName=恒大星级公寓, browseCount=100, distance=1]
            HouseInfo [houseId=2, houseName=汇智湖畔, browseCount=999, distance=2]
            HouseInfo [houseId=3, houseName=张江汤臣豪园, browseCount=100, distance=1]
            HouseInfo [houseId=4, houseName=保利星苑, browseCount=111, distance=10]
            HouseInfo [houseId=8, houseName=保利星苑, browseCount=111, distance=12]
 * </pre>
 ****/

ret13.forEach((less, value) -> {
  System.out.println(less ? "near" : "far");
  value.forEach((moreCount, houses) -> {
    System.out.println(moreCount ? "\tmore views" : "\tfewer views");
    houses.forEach(h -> System.out.println("\t\t" + h));
  });
});
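As the empty "near / fewer views" bucket above suggests, partitioningBy always produces both the true and the false key, even when one side has no elements. A minimal sketch with a counting downstream collector follows; the class name, method name and threshold parameter are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo of partitioningBy with a downstream collector.
class PartitionCountDemo {
    // true -> number of distances within the threshold, false -> the rest.
    static Map<Boolean, Long> countNearAndFar(List<Integer> distances, int threshold) {
        return distances.stream().collect(
            Collectors.partitioningBy(d -> d <= threshold, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Integer> distances = Arrays.asList(1, 2, 1, 10, 23, 55, 66, 12);
        System.out.println(countNearAndFar(distances, 20)); // {false=3, true=5}
        // Both keys exist even for an empty input.
        System.out.println(countNearAndFar(Collections.<Integer>emptyList(), 20)); // {false=0, true=0}
    }
}
```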