An experiment to assess the performance gains from bitwise operations in Java.
Methodology: use the junit benchmark framework to measure the time elapsed with and without bit-twiddling for four operations (each executed several million times): multiply, divide , modulo and finding the next power of two.
Configuration: JDK 7 running on MacOS 10.7.
Results
For each test two results are shown: time taken without bit-twiddling first, with bit-twiddling second.
1) Division
DivideBenchmark.testJVMDivision: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.04 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.39, time.warmup: 0.07, time.bench: 0.32
DivideBenchmark.testBitwiseDivision: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.03 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.33, time.warmup: 0.04, time.bench: 0.29
2) Multiplication
MultBenchmark.testJVMMultiply: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.02 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.22, time.warmup: 0.04, time.bench: 0.18
MultBenchmark.testBitwiseMultiply: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.02 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.18, time.warmup: 0.03, time.bench: 0.16
3) Modulo
ModuloBenchmark.testJVMModulo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.09 [+- 0.03], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.00, time.warmup: 0.20, time.bench: 0.80
ModuloBenchmark.testBitwiseModulo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.09 [+- 0.01], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.95, time.warmup: 0.10, time.bench: 0.84
4) Find the next power of 2
NextPowerOfTwoBenchmark.testJVMNextPowerOfTwo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 1.44 [+- 0.02], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 14.49, time.warmup: 1.50, time.bench: 13.00
NextPowerOfTwoBenchmark.testBitwisePowerOfTwo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.03 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.33, time.warmup: 0.05, time.bench: 0.28
Conclusion
There’s no discernable performance gains when using bit-twiddling to execute multiplications, divisions or modulo. This is hardly surprising as these simple utilities should already be heavily optimised by the JVM.
Less heavily JVM-optimised code (eg. find the next power of two) shows large potential performance wins, which may even justify the less readable code associated with bit shifts techniques.
Benchmark source code
12345678910111213141516171819202122232425262728293031323334353637383940 |
|
123456789101112131415161718192021222324252627282930313233343536373839 |
|
123456789101112131415161718192021222324252627282930313233343536373839 |
|
123456789101112131415161718192021222324252627282930313233343536373839404142 |
|
No comments:
Post a Comment