Java 21 JMH benchmarks for field access, counters, collections, sequential and parallel streams, etc.
Open Result Charts in JMH Visualizer.
Alternatively, download Result JSON.
The following test environment was used to create the results above.
- CPU: Intel Core i7-6700K at 4.00GHz with 4 cores (8 threads)
- Motherboard: Asus Z170-A
- RAM: 32 GiB DDR4 (2 x Corsair 16 GiB, 2133 MT/s)
- Virtualization: None; bare metal desktop
- Java: Amazon Corretto 21.0.3.9.1
- OS: Ubuntu 22.04.4 LTS
- Kernel: 5.15.86-051586-generic
- JVM args:
-Xms4g -Xmx4g -Xlog:gc=info:stdout
Run all benchmarks:
./benchmark.sh
Run specific benchmark(s):
./benchmark.sh --includes <REGEX>
Examples:
./benchmark.sh --includes Time.*
./benchmark.sh -i Time
./benchmark.sh -i nano
After completion, you will find the results in ./jmh-result-all.json.
Note
Unless stated otherwise, benchmark throughput measurements are for a single operation, e.g. a single addition to a collection or a single iterator advancement.
Compares AtomicInteger, MutableInt, and plain int counters, as well as the corresponding long types.
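As a rough illustration of the three counter styles, here is a minimal sketch. The `SimpleMutableInt` class is a hypothetical stand-in for a MutableInt-style holder, and the method names are made up for this example; none of this is the benchmark code itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterSketch {

    // Hypothetical stand-in for a MutableInt-style holder (an assumption,
    // not a library class): a plain object wrapping a non-volatile int.
    static final class SimpleMutableInt {
        private int value;

        void increment() {
            value++;
        }

        int get() {
            return value;
        }
    }

    // Plain primitive local variable: no object, no memory barriers.
    static int countWithPrimitive(int n) {
        int counter = 0;
        for (int i = 0; i < n; i++) {
            counter++;
        }
        return counter;
    }

    // Mutable holder object: one extra indirection, still not thread-safe.
    static int countWithMutableHolder(int n) {
        SimpleMutableInt counter = new SimpleMutableInt();
        for (int i = 0; i < n; i++) {
            counter.increment();
        }
        return counter.get();
    }

    // AtomicInteger: thread-safe increments via atomic read-modify-write.
    static int countWithAtomic(int n) {
        AtomicInteger counter = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            counter.incrementAndGet();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithPrimitive(1_000));
        System.out.println(countWithMutableHolder(1_000));
        System.out.println(countWithAtomic(1_000));
    }
}
```

All three produce the same count in single-threaded use; only the AtomicInteger variant stays correct when several threads increment the same counter.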
Comparing various ways of getting and setting object fields.
Ordered by performance from top to bottom, the ranking is:
- Direct call
- LambdaMetafactory - almost as fast as direct, but requires at least a private accessor method
- Reflection - ca. 30% of the direct performance
- MethodHandle and VarHandle - ca. 20% of the direct performance
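The access styles ranked above can be sketched roughly as follows. The `Holder` class and all method names are hypothetical and chosen for illustration only. For brevity the handles are looked up on every call; a real measurement would presumably create them once and reuse them.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;
import java.lang.reflect.Field;
import java.util.function.ToIntFunction;

public class FieldAccessSketch {

    // Hypothetical target class (an assumption for this example).
    static final class Holder {
        int value;

        Holder(int value) {
            this.value = value;
        }

        int getValue() {
            return value;
        }
    }

    // Direct call of the accessor method.
    static int direct(Holder h) {
        return h.getValue();
    }

    // LambdaMetafactory: spins a ToIntFunction that calls getValue().
    @SuppressWarnings("unchecked")
    static int viaLambdaMetafactory(Holder h) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findVirtual(
                    Holder.class, "getValue", MethodType.methodType(int.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "applyAsInt",                                    // interface method name
                    MethodType.methodType(ToIntFunction.class),      // factory type
                    MethodType.methodType(int.class, Object.class),  // erased signature
                    impl,
                    MethodType.methodType(int.class, Holder.class)); // specialized signature
            ToIntFunction<Holder> getter =
                    (ToIntFunction<Holder>) site.getTarget().invokeExact();
            return getter.applyAsInt(h);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Core reflection on the field.
    static int viaReflection(Holder h) {
        try {
            Field field = Holder.class.getDeclaredField("value");
            field.setAccessible(true);
            return field.getInt(h);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // MethodHandle field getter.
    static int viaMethodHandle(Holder h) {
        try {
            MethodHandle getter = MethodHandles.lookup()
                    .findGetter(Holder.class, "value", int.class);
            return (int) getter.invoke(h);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // VarHandle plain read of the field.
    static int viaVarHandle(Holder h) {
        try {
            VarHandle handle = MethodHandles.lookup()
                    .findVarHandle(Holder.class, "value", int.class);
            return (int) handle.get(h);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each method returns the same value for a given `Holder`; they differ only in how the JVM resolves and dispatches the access, which is what the ranking reflects.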
Compares adding elements to int/Integer/long/Long arrays as well as empty collections and maps.
Iterating over all elements of pre-populated collections and maps.
Concurrent get (10 threads), add (2 threads) and remove (1 thread) of Integer elements for a number of thread-safe collection classes. The non-thread-safe ArrayList class is included in this benchmark and is protected by wrapping it via Collections.synchronizedList().
Each data structure is populated before the benchmark. Access occurs at the head of the data structure where the concept of a head is supported; otherwise (for example, with maps) access is by key.
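To illustrate the wrapping mentioned above, here is a minimal sketch of two threads adding to an ArrayList guarded by Collections.synchronizedList(). The class and method names are hypothetical, and the thread setup is a simplification, not the benchmark's actual harness.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedListSketch {

    // Starts `threads` workers that each add `addsPerThread` Integer elements
    // to one shared, synchronized ArrayList, then returns the final size.
    static int addConcurrentlyToSynchronizedList(int threads, int addsPerThread) {
        // Every method call on the wrapper locks on the wrapper itself.
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    list.add(i);
                }
            });
            workers[t].start();
        }

        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(addConcurrentlyToSynchronizedList(2, 10_000));
    }
}
```

With the wrapper in place no additions are lost. Iterating over such a list still requires manual synchronization on the list object, as the Collections.synchronizedList() documentation notes.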
Compares streaming over primitive and wrapper types with using a for loop. The stream collects filtered elements into a target data structure. This benchmark also compares single-threaded with parallel streaming over data structures of varying length.
Note
In contrast to the other benchmarks, the measurements here are for processing the entire stream. The benchmark is run repeatedly for increasing stream lengths, from 1 to 10 million in "one order of magnitude" increments. Thus, as the stream length increases, the measured throughput decreases.
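The three approaches being compared can be sketched as below. The even-number filter, the `List<Integer>` target, and all names are assumptions made for this example; the source does not specify the predicate or the target type the benchmark uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamVsLoopSketch {

    // Helper: the values 1..n as a primitive array.
    static int[] range(int n) {
        return IntStream.rangeClosed(1, n).toArray();
    }

    // Classic for loop collecting the filtered elements.
    static List<Integer> evensWithLoop(int[] values) {
        List<Integer> result = new ArrayList<>();
        for (int value : values) {
            if (value % 2 == 0) {
                result.add(value);
            }
        }
        return result;
    }

    // Sequential stream over the primitive array.
    static List<Integer> evensWithStream(int[] values) {
        return Arrays.stream(values)
                .filter(value -> value % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    // Parallel stream; the collector still preserves encounter order.
    static List<Integer> evensWithParallelStream(int[] values) {
        return Arrays.stream(values)
                .parallel()
                .filter(value -> value % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

All three variants return the same list. For short inputs the fixed overhead of setting up a stream, and especially of splitting work across threads, tends to dominate, which is why measuring across several stream lengths is informative.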
Compares the use of plain ints, instantiation of a custom int wrapper, and a custom int wrapper cache.
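A wrapper cache of this kind might look like the following sketch. The class names and the cached range of -128 to 127 (mirroring the default range of Integer.valueOf) are assumptions for illustration, not the benchmark's actual implementation.

```java
public class IntWrapperSketch {

    // Hypothetical immutable int wrapper.
    static final class IntWrapper {
        private final int value;

        IntWrapper(int value) {
            this.value = value;
        }

        int value() {
            return value;
        }
    }

    // Hypothetical cache: pre-allocates wrappers for a small range and
    // falls back to allocation for everything else.
    static final class IntWrapperCache {
        private static final int LOW = -128;  // assumed lower bound
        private static final int HIGH = 127;  // assumed upper bound
        private static final IntWrapper[] CACHE = new IntWrapper[HIGH - LOW + 1];

        static {
            for (int i = 0; i < CACHE.length; i++) {
                CACHE[i] = new IntWrapper(LOW + i);
            }
        }

        static IntWrapper valueOf(int value) {
            if (value >= LOW && value <= HIGH) {
                return CACHE[value - LOW]; // shared instance, no allocation
            }
            return new IntWrapper(value);  // outside the range: new instance
        }
    }
}
```

Inside the cached range, repeated lookups return the same instance, trading a bounds check and an array read for an allocation.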
Compares System.currentTimeMillis, System.nanoTime, and various java.time classes.
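For orientation, the time sources differ in what they are meant for, roughly as sketched here. The class and method names are hypothetical, and Instant stands in for the java.time classes, since the source does not list which ones are measured.

```java
import java.time.Instant;

public class TimeSourcesSketch {

    // Wall-clock time in milliseconds since the Unix epoch.
    static long wallClockMillis() {
        return System.currentTimeMillis();
    }

    // Elapsed time of a task, using the monotonic nanoTime source.
    // nanoTime values are only meaningful as differences.
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Wall-clock time via java.time, converted to epoch milliseconds.
    static long instantMillis() {
        return Instant.now().toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println("currentTimeMillis: " + wallClockMillis());
        System.out.println("Instant.now():     " + instantMillis());
        System.out.println("elapsed nanos:     " + elapsedNanos(() -> { }));
    }
}
```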