An experiment to assess the performance gains from bitwise operations in Java.
Methodology: use the junit benchmark framework to measure the time elapsed with and without bit-twiddling for four operations (each executed several million times): multiply, divide , modulo and finding the next power of two.
Configuration: JDK 7 running on MacOS 10.7.
Results
For each test two results are shown: time taken without bit-twiddling first, with bit-twiddling second.
1) Division
DivideBenchmark.testJVMDivision: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.04 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.39, time.warmup: 0.07, time.bench: 0.32
DivideBenchmark.testBitwiseDivision: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.03 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.33, time.warmup: 0.04, time.bench: 0.29
2) Multiplication
MultBenchmark.testJVMMultiply: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.02 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.22, time.warmup: 0.04, time.bench: 0.18
MultBenchmark.testBitwiseMultiply: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.02 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.18, time.warmup: 0.03, time.bench: 0.16
3) Modulo
ModuloBenchmark.testJVMModulo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.09 [+- 0.03], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.00, time.warmup: 0.20, time.bench: 0.80
ModuloBenchmark.testBitwiseModulo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.09 [+- 0.01], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.95, time.warmup: 0.10, time.bench: 0.84
4) Find the next power of 2
NextPowerOfTwoBenchmark.testJVMNextPowerOfTwo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 1.44 [+- 0.02], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 14.49, time.warmup: 1.50, time.bench: 13.00
NextPowerOfTwoBenchmark.testBitwisePowerOfTwo: [measured 9 out of 10 rounds, threads: 1 (sequential)]
round: 0.03 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.33, time.warmup: 0.05, time.bench: 0.28
Conclusion
There’s no discernable performance gains when using bit-twiddling to execute multiplications, divisions or modulo. This is hardly surprising as these simple utilities should already be heavily optimised by the JVM.
Less heavily JVM-optimised code (eg. find the next power of two) shows large potential performance wins, which may even justify the less readable code associated with bit shifts techniques.
Benchmark source code
| 12345678910111213141516171819202122232425262728293031323334353637383940 |
|
| 123456789101112131415161718192021222324252627282930313233343536373839 |
|
| 123456789101112131415161718192021222324252627282930313233343536373839 |
|
| 123456789101112131415161718192021222324252627282930313233343536373839404142 |
|
No comments:
Post a Comment