Weird Stream API

Post on 13-Jan-2017

253 views 1 download

Transcript of Weird Stream API

Weird Stream API

Тагир ВалеевJetBrains

2

Почему мне можно верить?https://github.com/amaembo/streamex

3

Почему мне можно верить?• JDK-8072727 Add variation of Stream.iterate() that's finite• JDK-8136686 Collectors.counting can use Collectors.summingLong to reduce boxing• JDK-8141630 Specification of Collections.synchronized* need to state traversal constraints• JDK-8145007 Pattern splitAsStream is not late binding as required by the specification• JDK-8146218 Add LocalDate.datesUntil method producing Stream<LocalDate>• JDK-8147505 BaseStream.onClose() should not allow registering new handlers after stream is consumed• JDK-8148115 Stream.findFirst for unordered source optimization• JDK-8148250 Stream.limit() parallel tasks with ordered non-SUBSIZED source should short-circuit• JDK-8148838 Stream.flatMap(...).spliterator() cannot properly split after tryAdvance()• JDK-8148748 ArrayList.subList().spliterator() is not late-binding• JDK-8151123 Collectors.summingDouble/averagingDouble unnecessarily call mapper twice• JDK-8153293 Preserve SORTED and DISTINCT characteristics for boxed() and asLongStream() operations• JDK-8154387 Parallel unordered Stream.limit() tries to collect 128 elements even if limit is less• JDK-8155600 Performance optimization of Arrays.asList().iterator()• JDK-8164189 Collectors.toSet() parallel performance improvement

4

5

6

source.stream()

7

.map(x -> x.squash())

8

.filter(x -> x.getColor() != YELLOW)

9

.forEach(System.out::println)

10

source.stream() .map(x -> x.squash()) .filter(x -> x.getColor() != YELLOW) .forEach(System.out::println)

11

AtomicLong cnt = new AtomicLong();

.peek(x -> cnt.incrementAndGet())

12

AtomicLong cnt = new AtomicLong();

source.stream() .map(x -> x.squash()) .peek(x -> cnt.incrementAndGet()) .filter(x -> x.getColor() != YELLOW) .forEach(System.out::println)

13

LongStream.range(1, 100) .count();

14

LongStream.range(1, 100) .count();>> 99

15

LongStream.range(0, 1_000_000_000_000_000_000L) .count();

16

LongStream.range(0, 1_000_000_000_000_000_000L) .count();

17

LongStream.range(0, 1_000_000_000_000_000_000L) .count();>> 1000000000000000000

Java 9:JDK-8067969 Optimize Stream.count for SIZED Streams

18

public interface Spliterator<T> { boolean tryAdvance(Consumer<? super T> action); default void forEachRemaining(Consumer<? super T> action) { … }

Spliterator<T> trySplit();

long estimateSize(); default long getExactSizeIfKnown() { … }

int characteristics(); default boolean hasCharacteristics(int characteristics) { … }

default Comparator<? super T> getComparator() { … } …}

19

Характеристики

SIZEDSUBSIZEDSORTEDORDERED

DISTINCTNONNULLIMMUTABLECONCURRENT

20

list.stream() .peek(data -> process(data)) .peek(data -> processInAnotherWay(data)) .count(); // какая-нибудь терминальная операция

21

Характеристики

SIZEDSUBSIZEDSORTEDORDERED

DISTINCTNONNULLIMMUTABLECONCURRENT

22

Характеристики

SIZEDSUBSIZEDSORTEDORDERED

DISTINCTNONNULLIMMUTABLECONCURRENT

23

Stream<T>IntStreamLongStreamDoubleStream

24

Stream<T>IntStream Stream<Integer>

LongStream Stream<Long>DoubleStream Stream<Double>

25

Primitive is faster!

26

int[] ints;Integer[] integers;

@Setuppublic void setup() { ints = new Random(1).ints(1000000, 0, 1000) .toArray(); integers = new Random(1).ints(1000000, 0, 1000) .boxed().toArray(Integer[]::new);}

27

@Benchmarkpublic long stream() { return Stream.of(integers).distinct().count();}

@Benchmarkpublic long intStream() { return IntStream.of(ints).distinct().count();}

28

@Benchmarkpublic long stream() { return Stream.of(integers).distinct().count();}

@Benchmarkpublic long intStream() { return IntStream.of(ints).distinct().count();}

# VM version: JDK 1.8.0_92, VM 25.92-b14Benchmark Mode Cnt Score Error UnitsintStream avgt 30 17.121 ± 0.296 ms/opstream avgt 30 15.764 ± 0.116 ms/op

29

Stream<T> ⇒ ReferencePipelineIntStream ⇒ IntPipelineLongStream ⇒ LongPipelineDoubleStream ⇒ DoublePipeline

AbstractPipeline

package java.util.stream;

30

// java.util.stream.IntPipeline

@Overridepublic final IntStream distinct() { // While functional and quick to implement, // this approach is not very efficient. // An efficient version requires // an int-specific map/set implementation. return boxed().distinct().mapToInt(i -> i);}

31

@Benchmarkpublic long stream() { return Stream.of(integers).distinct().count();}

@Benchmarkpublic long intStream() { // IntStream.of(ints).distinct().count(); return IntStream.of(ints).boxed().distinct() .mapToInt(i -> i).count();}

-prof gcstream 48870 B/opintStream 13642536 B/op

32

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

Image courtesy: http://magenta-stock.deviantart.com/

33

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

// IntStream ints(long streamSize, int origin, int bound)new Random().ints(5, 1, 20+1).sum();

34

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

// IntStream ints(long streamSize, int origin, int bound)new Random().ints(5, 1, 20+1).sum();

35

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

// IntStream ints(long streamSize, int origin, int bound)new Random().ints(5, 1, 20+1).sum();

new Random().ints(5, 1, 20+1).distinct().sum();

36

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

// IntStream ints(long streamSize, int origin, int bound)new Random().ints(5, 1, 20+1).sum();

new Random().ints(5, 1, 20+1).distinct().sum();

// IntStream ints(int origin, int bound)new Random().ints(1, 20+1).distinct().limit(5).sum();

37

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

new Random().ints(1, 20+1).distinct().limit(5).sum();

new Random().ints(1, 20+1).parallel().distinct().limit(5).sum();

new Random().ints(1, 20+1).distinct().parallel().limit(5).sum();

new Random().ints(1, 20+1).distinct().limit(5).parallel().sum();

38

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

new Random().ints(1, 20+1).distinct().limit(5).sum();

new Random().ints(1, 20+1).parallel().distinct().limit(5).sum();

new Random().ints(1, 20+1).distinct().parallel().limit(5).sum();

new Random().ints(1, 20+1).distinct().limit(5).parallel().sum();

new Random().ints(1, 20+1).parallel().distinct().limit(5) .sequential().sum();

JDK-8132800 clarify stream package documentation regarding sequential vs parallel modes

39

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

new Random().ints(1, 20+1).distinct().limit(5).sum();

# VM version: JDK 1.8.0_92, VM 25.92-b14Benchmark Mode Cnt Score Error UnitsrndSum avgt 30 0.286 ± 0.001 us/op

40

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

new Random().ints(1, 20+1).distinct().limit(5).sum();

# VM version: JDK 1.8.0_92, VM 25.92-b14Benchmark Mode Cnt Score Error UnitsrndSum avgt 30 0.286 ± 0.001 us/oprndSumPar ~6080 years/op

41

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

new Random().ints(1, 20+1).distinct().limit(5).sum();

# VM version: JDK 1.8.0_92, VM 25.92-b14Benchmark Mode Cnt Score Error UnitsrndSum avgt 30 0.286 ± 0.001 us/oprndSumPar ~6080 years/op

java.util.stream.StreamSpliterators.UnorderedSliceSpliterator<T, T_SPLITR>

42

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

public IntStream ints(int randomNumberOrigin, int randomNumberBound) Returns an effectively unlimited stream of pseudorandom int values, each conformingto the given origin (inclusive) and bound (exclusive).

Implementation Note:This method is implemented to be equivalent to ints(Long.MAX_VALUE, randomNumberOrigin, randomNumberBound).

43

Задача: вычислить сумму 5 различных псевдослучайных чисел от 1 до 20.

new Random().ints(1, 20+1).distinct().limit(5).sum();new Random().ints(1, 20+1).parallel().distinct().limit(5).sum();

# VM version: JDK 9-ea, VM 9-ea+130Benchmark Mode Cnt Score Error UnitsrndSum avgt 30 0.318 ± 0.003 us/oprndSumPar avgt 30 7.793 ± 0.026 us/op

JDK-8154387 Parallel unordered Stream.limit() tries to collect 128 elements even if limit is less

44

parallel().skip()IntStream.range(0, 100_000_000) 104.5±4.4 ms .skip(99_000_000) .sum();

IntStream.range(0, 100_000_000) ? .parallel() .skip(99_000_000) .sum();

45

46

parallel().skip()API Note:While skip() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of n, since skip(n) is constrained to skip not just any n elements, but the first n elements in the encounter order.

47

parallel().skip()IntStream.range(0, 100_000_000) 104.5±4.4 ms .skip(99_000_000) .sum();

IntStream.range(0, 100_000_000) 1.4±0.2 ms .parallel() (74.6×) .skip(99_000_000) .sum();

48

parallel(): trySplit() (SIZED+SUBSIZED)

0..99_999_999

0..49_999_999 50_000_000..99_999_999

75_000_000..99_999_999

50_000_000..74_999_999

…………

25_000_000..49_999_999

……

0..24_999_999

……

49

parallel().skip()(SIZED+SUBSIZED)

0..99_999_999

0..49_999_999 50_000_000..99_999_999

75_000_000..99_999_99950_000_000..74_999_999

……

50

class Node { private String name; private List<Node> children;

public Node(String name, Node... children) { this.name = name; this.children = Arrays.asList(children); }

@Override public String toString() { return name; }

public Stream<Node> allNodes() { return …? }}

51

class Node { private String name; private List<Node> children;

...

@Override public String toString() { return name; }

public Stream<Node> allNodes() { return children.isEmpty() ? Stream.of(this) : Stream.concat( Stream.of(this), children.stream().flatMap(Node::allNodes)); }}

52

flatMap()Stream<Integer> a = ...;Stream<Integer> b = ...;Stream<Integer> c = ...;Stream<Integer> d = ...;Stream<Integer> res = Stream.concat( Stream.concat(Stream.concat(a, b), c), d);

Stream<Integer> res = Stream.of(a, b, c, d) .flatMap(Function.identity());

53

flatMap() – декартово произведениеList<List<String>> input = asList(asList("a", "b", "c"), asList("x", "y"), asList("1", "2", "3"));

Supplier<Stream<String>> s = input.stream() .<Supplier<Stream<String>>>map(list -> list::stream) .reduce((sup1, sup2) -> () -> sup1.get() .flatMap(e1 -> sup2.get().map(e2 -> e1+e2))) .orElse(() -> Stream.of(""));

s.get().forEach(System.out::println);>> ax1>> ax2>> …>> cy3

54

flatMap() parallel?Stream.of(list1, list2).parallel() .flatMap(List::stream).forEach(item -> {...});

list1

Item1Item2Item3

…Item1000

list2

Item1Item2Item3

…Item1000

55

flatMap() parallel?Stream.of(list1, list2).parallel() .flatMap(l -> l.stream().parallel()).forEach(item -> {...});

list1

Item1Item2Item3

…Item1000

list2

Item1Item2Item3

…Item1000

56

concat()-то лучше!Stream.concat(list1.stream().parallel(), list2.stream().parallel()).forEach(item -> {});

list1

Item1Item2

…Item500

Item501Item502

…Item1000

list2

Item1Item2

…Item500

Item501Item502

…Item1000

57

flatMap() и short-circuitingprivate IntStream stream() {

if (!flatMap) {

return IntStream.rangeClosed(0, 1_000_000_000);

} else {

return IntStream.of(1_000_000_000)

.flatMap(x -> IntStream.rangeClosed(0, x));

}

}

58

flatMap() и short-circuiting@Benchmarkpublic boolean hasGreaterThan2() { return stream().anyMatch(x -> x > 2);}

59

flatMap() и short-circuiting@Benchmarkpublic boolean hasGreaterThan2() { return stream().anyMatch(x -> x > 2);}

# VM version: JDK 1.8.0_101, VM 25.101-b13Benchmark (flatMap) Mode Score Error UnitshasGreaterThan2 false avgt 0.040 ± 0.001 μs/ophasGreaterThan2 true avgt 980489.972 ± 12335.916 μs/op

60

flatMap() и tryAdvance()flatMap = false;Spliterator<Integer> spliterator = stream().spliterator();spliterator.tryAdvance(System.out::println);>> 0

61

flatMap() и tryAdvance()flatMap = true;Spliterator<Integer> spliterator = stream().spliterator();spliterator.tryAdvance(System.out::println);

62

java.lang.OutOfMemoryError: Java heap space at java.util.stream.SpinedBuffer.ensureCapacity at java.util.stream.SpinedBuffer.increaseCapacity at java.util.stream.SpinedBuffer.accept at java.util.stream.IntPipeline$4$1.accept at java.util.stream.IntPipeline$7$1.lambda$accept$198 at java.util.stream.IntPipeline$7$1$$Lambda$7/1831932724.accept at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining at java.util.stream.IntPipeline$Head.forEach at java.util.stream.IntPipeline$7$1.accept at java.util.stream.Streams$IntStreamBuilderImpl.tryAdvance at java.util.Spliterator$OfInt.tryAdvance ... at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance at by.jetconf.streamsamples.FlatMapTryAdvance.main

63

flatMap() и tryAdvance()flatMap = true;Spliterator<Integer> spliterator = stream().spliterator();spliterator.tryAdvance(System.out::println);

64

flatMap() и concat()flatMap = true;

stream().sum();

stream().findFirst(); // non-short-circuit

IntStream.concat(stream(), IntStream.of(1)).sum();

IntStream.concat(stream(), IntStream.of(1)).findFirst();

65

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.stream.SpinedBuffer$OfInt.newArray at java.util.stream.SpinedBuffer$OfInt.newArray at java.util.stream.SpinedBuffer$OfPrimitive.ensureCapacity at java.util.stream.SpinedBuffer$OfPrimitive.increaseCapacity at java.util.stream.SpinedBuffer$OfPrimitive.preAccept at java.util.stream.SpinedBuffer$OfInt.accept ... at java.util.stream.Streams$ConcatSpliterator$OfInt.tryAdvance at java.util.stream.IntPipeline.forEachWithCancel at java.util.stream.AbstractPipeline.copyIntoWithCancel at java.util.stream.AbstractPipeline.copyInto at java.util.stream.AbstractPipeline.wrapAndCopyInto at java.util.stream.FindOps$FindOp.evaluateSequential at java.util.stream.AbstractPipeline.evaluate at java.util.stream.IntPipeline.findFirst at by.jetconf.streamsamples.ConcatFlat.main

66

class Node { private String name; private List<Node> children;

...

@Override public String toString() { return name; }

public Stream<Node> allNodes() { return children.isEmpty() ? Stream.of(this) : Stream.concat( Stream.of(this), children.stream().flatMap(Node::allNodes)); }}

67

Root

Child

n0 n1 n2 n3 n999999…

68

root.allNodes().count()

root.allNodes().anyMatch(n -> false)

69

count 29.4 ± 2.0 ms 80 001 KB/op

anyMatch 70.0 ± 6.0 ms 84 196 KB/op

70

// with concatpublic Stream<Node> allNodes() { return children.isEmpty() ? Stream.of(this) : Stream.concat( Stream.of(this), children.stream().flatMap(Node::allNodes));}

// without concatpublic Stream<Node> allNodes() { return children.isEmpty() ? Stream.of(this) : Stream.of(Stream.of(this), children.stream().flatMap(Node::allNodes)) .flatMap(Function.identity());}

71

count (concat) 29.4 ± 2.0 ms 80 001 KB/op (flatMap) 32.6 ± 2.0 ms 80 001 KB/op

anyMatch (concat) 70.0 ± 6.0 ms 84 196 KB/op (flatMap) 31.0 ± 2.3 ms 80 001 KB/op

72

// with concatpublic Stream<Node> allNodes() { return children.isEmpty() ? Stream.of(this) : Stream.concat( Stream.of(this), children.stream().flatMap(Node::allNodes));}

// without concatpublic Stream<Node> allNodes() { return children.isEmpty() ? Stream.of(this) : Stream.of(Stream.of(this), children.stream().flatMap(Node::allNodes)) .flatMap(Function.identity());}

73

import one.util.streamex.StreamEx;

public Stream<Node> allNodes() { return StreamEx.ofTree(this, n -> n.children.isEmpty() ? null : StreamEx.of(n.children));}

74

count (concat) 29.4 ± 2.0 ms 80 001 KB/op (flatMap) 32.6 ± 2.0 ms 80 001 KB/op (StreamEx) 12.4 ± 0.2 ms 0.5 KB/op

anyMatch (concat) 70.0 ± 6.0 ms 84 196 KB/op (flatMap) 31.0 ± 2.3 ms 80 001 KB/op (StreamEx) 19.3 ± 0.3 ms 0.5 KB/op

Stream и Iterator

76

iterator()

Верните цикл for()!

77

for vs forEach()for forEach()

Появился Java 1.5 Java 1.8

Нормальная отладка

Короткие стектрейсы

Checked exceptions

Изменяемые переменные

Досрочный выход по break

Параллелизм (всё равно не нужен)

78

iterator()

79

iterator()Stream<String> s = Stream.of("a", "b", "c");for(String str : (Iterable<String>)s::iterator) { System.out.println(str);}

80

iterator()Stream<String> s = Stream.of("a", "b", "c");for(String str : (Iterable<String>)s::iterator) { System.out.println(str);}

// Joshua Bloch approvesfor(String str : StreamEx.of("a", "b", "c")) { System.out.println(str);}

81

iterator()

IntStream s = IntStream.of(1_000_000_000) .flatMap(x -> IntStream.range(0, x));for(int i : (Iterable<Integer>)s::iterator) { System.out.println(i);}

82

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at j.u.s.SpinedBuffer$OfInt.newArray at j.u.s.SpinedBuffer$OfInt.newArray at j.u.s.SpinedBuffer$OfPrimitive.ensureCapacity at j.u.s.SpinedBuffer$OfPrimitive.increaseCapacity at j.u.s.SpinedBuffer$OfPrimitive.preAccept at j.u.s.SpinedBuffer$OfInt.accept at j.u.s.IntPipeline$7$1.lambda$accept$198 ... at j.u.s.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer at j.u.s.StreamSpliterators$AbstractWrappingSpliterator.doAdvance at j.u.s.StreamSpliterators$IntWrappingSpliterator.tryAdvance at j.u.Spliterators$2Adapter.hasNext at by.jetconf.streamsamples.IterableWorkaround.main

83

Map<String, String> userPasswords = Files.lines(Paths.get("/etc/passwd")) .map(str -> str.split(":")) .collect(toMap(arr -> arr[0], arr -> arr[1]));

84

Map<String, String> userPasswords = Files.lines(Paths.get("/etc/passwd")) .map(str -> str.split(":")) .collect(toMap(arr -> arr[0], arr -> arr[1]));

such stream

WOWmuch fluent

85

Map<String, String> userPasswords;try (Stream<String> stream = Files.lines(Paths.get("/etc/passwd"))) { userPasswords = stream.map(str -> str.split(":")) .collect(toMap(arr -> arr[0], arr -> arr[1]));}

86

Map<String, String> userPasswords;try (Stream<String> stream = Files.lines(Paths.get("/etc/passwd"))) { userPasswords = stream.map(str -> str.split(":")) .collect(toMap(arr -> arr[0], arr -> arr[1]));}

87

http://stackoverflow.com/questions/34753078/

88

Терминальные операции

Нормальные (с внутренним обходом):

forEach()collect()reduce()

findFirst()count()toArray()

89

Терминальные операции

Нормальные (с внутренним обходом):

forEach()collect()reduce()

findFirst()count()toArray()

Причудливые(с внешним обходом):

iterator()spliterator()

90

Use flatMapto save the Stream!

91

flatMap<R> Stream<R> flatMap( Function<? super T,? extends Stream<? extends R>> mapper)

Returns a stream consisting of the results of replacing each element of this stream with the contents of a mapped stream produced by applying the provided mapping function to each element. Each mapped stream is closed after its contents have been placed into this stream. (If a mapped stream is null an empty stream is used, instead.)

92

Map<String, String> userPasswords = Stream.of(Files.lines(Paths.get("/etc/passwd"))) .flatMap(Function.identity()) .map(str -> str.split(":")) .collect(toMap(arr -> arr[0], arr -> arr[1]));

93

Files.lines(Paths.get("/etc/passwd")) .spliterator().tryAdvance(...);// вычитываем одну строку, файл не закрываем

Stream.of(Files.lines(Paths.get("/etc/passwd"))) .flatMap(s -> s) .spliterator().tryAdvance(...);// вычитываем весь файл в память, файл закрываем

94

Files.lines(Paths.get("/etc/passwd")) .spliterator().tryAdvance(...);// вычитываем одну строку, файл не закрываем

Stream.of(Files.lines(Paths.get("/etc/passwd"))) .flatMap(s -> s) .spliterator().tryAdvance(...);// вычитываем весь файл в память, файл закрываем

http://stackoverflow.com/a/32767282/4856258 (небуферизующий flatMap)

95

There’s no way to save the Stream

96

Спасибо за внимание

https://twitter.com/tagir_valeev

https://github.com/amaembo

https://habrahabr.ru/users/lany