Be extremely careful using any of the other suggestions. It all depends on context.
I have spent a long time tracing bugs in a system that presumed a==b if |a-b|<epsilon. The underlying problems were:
- The implicit presumption in an algorithm that if a==b and b==c then a==c.
- Using the same epsilon for lines measured in inches and lines measured in mils (.001 inch). That is, a==b but 1000a!=1000b. (This is why AlmostEqual2sComplement asks for the epsilon or max ULPS.)
- The use of the same epsilon for both the cosine of angles and the length of lines!
- Using such a compare function to sort items in a collection. (In this case using the builtin C++ operator == for doubles produced correct results.)
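The first two problems can be reproduced with a few lines. This is only a sketch: almost_equal and breaks_transitivity are hypothetical names standing in for whatever epsilon comparison a code base uses, and the values are chosen just to make the effect visible.

```cpp
#include <cmath>

// Hypothetical helper: the absolute-tolerance comparison described above.
bool almost_equal(double a, double b, double epsilon) {
    return std::fabs(a - b) < epsilon;
}

// True when a "equals" b and b "equals" c, yet a does not "equal" c
// under the same epsilon, i.e. the comparison is not transitive.
bool breaks_transitivity(double a, double b, double c, double epsilon) {
    return almost_equal(a, b, epsilon) &&
           almost_equal(b, c, epsilon) &&
           !almost_equal(a, c, epsilon);
}
```

With epsilon = 1.0, the values 0.0, 0.75 and 1.5 break transitivity. With epsilon = 0.001, 1.0 and 1.0005 compare equal, but the same two lengths multiplied by 1000 do not.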
Like I said: it all depends on context and the expected size of a and b.
BTW, std::numeric_limits<double>::epsilon() is the "machine epsilon". It is the difference between 1.0 and the next value representable by a double. I guess that it could be used in the compare function but only if the expected values are less than 1. (This is in response to @cdv's answer...)
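To see why the magnitude of the values matters, one can compare machine epsilon with the actual gap between neighbouring doubles. The sketch below uses std::nextafter from <cmath>; gap_above and within_machine_epsilon are hypothetical helper names, not part of any library.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable double above it.
double gap_above(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Hypothetical comparison that uses machine epsilon as an absolute tolerance.
bool within_machine_epsilon(double a, double b) {
    return std::fabs(a - b) < std::numeric_limits<double>::epsilon();
}
```

At 1.0 the gap is exactly machine epsilon. At 1000.0 the gap is already larger than epsilon, so two adjacent doubles never pass the test and the comparison degenerates to plain ==. Below 1.0 the gap is smaller than epsilon, so adjacent values do pass.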
Also, if you basically have int arithmetic in doubles (here we use doubles to hold int values in certain cases) your arithmetic will be correct. For example 4.0/2.0 will be the same as 1.0+1.0. This holds as long as you do not do things that result in fractions (4.0/3.0) and do not go outside of the size of an int.
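A small sketch of that claim, assuming IEEE 754 doubles (53 bits of significand, so every 32-bit int and every sum of two of them is exactly representable). The function names are made up for illustration.

```cpp
#include <cstdint>

// Adds two 32-bit integers once in double arithmetic and once in 64-bit
// integer arithmetic, and reports whether the two results agree.
bool double_sum_matches_integer_sum(std::int32_t a, std::int32_t b) {
    double via_double = static_cast<double>(a) + static_cast<double>(b);
    std::int64_t via_int = static_cast<std::int64_t>(a) + static_cast<std::int64_t>(b);
    return via_double == static_cast<double>(via_int);
}

// Beyond 2^53 not every integer is representable: adding 1 is lost to rounding.
bool loses_unit_above_2_53() {
    const double two_53 = 9007199254740992.0;  // 2^53
    return two_53 + 1.0 == two_53;
}
```

The first function returns true for any pair of 32-bit ints. The second shows where exactness ends for whole numbers, which is far beyond the range of an int.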
Some test results
I've gotten a lot of good answers to this question--thanks folks--so I decided to run some tests and figure out which method is actually fastest. The five methods I tested are these:
- the "ContainsKey" method that I presented in the question
- the "TestForNull" method suggested by Aleksandar Dimitrov
- the "AtomicLong" method suggested by Hank Gay
- the "Trove" method suggested by jrudolph
- the "MutableInt" method suggested by phax.myopenid.com
Method
Here's what I did...
- created five classes that were identical except for the differences shown below. Each class had to perform an operation typical of the scenario I presented: opening a 10MB file and reading it in, then performing a frequency count of all the word tokens in the file. Since this took an average of only 3 seconds, I had it perform the frequency count (not the I/O) 10 times.
- timed the loop of 10 iterations but not the I/O operation and recorded the total time taken (in clock seconds) essentially using Ian Darwin's method in the Java Cookbook.
- performed all five tests in series, and then did this another three times.
- averaged the four results for each method.
Results
I'll present the results first and the code below for those who are interested.
The ContainsKey method was, as expected, the slowest, so I'll give the speed of each method in comparison to the speed of that method.
- ContainsKey: 30.654 seconds (baseline)
- AtomicLong: 29.780 seconds (1.03 times as fast)
- TestForNull: 28.804 seconds (1.06 times as fast)
- Trove: 26.313 seconds (1.16 times as fast)
- MutableInt: 25.747 seconds (1.19 times as fast)
Conclusions
It would appear that only the MutableInt method and the Trove method are significantly faster, in that only they give a performance boost of more than 10%. However, if threading is an issue, AtomicLong might be more attractive than the others (I'm not really sure). I also ran TestForNull with final variables, but the difference was negligible.
Note that I haven't profiled memory usage in the different scenarios. I'd be happy to hear from anybody who has good insights into how the MutableInt and Trove methods would be likely to affect memory usage.
Personally, I find the MutableInt method the most attractive, since it doesn't require loading any third-party classes. So unless I discover problems with it, that's the way I'm most likely to go.
The code
Here is the crucial code from each method.
ContainsKey
import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
int count = freq.containsKey(word) ? freq.get(word) : 0;
freq.put(word, count + 1);
TestForNull
import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
Integer count = freq.get(word);
if (count == null) {
freq.put(word, 1);
}
else {
freq.put(word, count + 1);
}
AtomicLong
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
...
final ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
...
map.putIfAbsent(word, new AtomicLong(0));
map.get(word).incrementAndGet();
Trove
import gnu.trove.TObjectIntHashMap;
...
TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
...
freq.adjustOrPutValue(word, 1, 1);
MutableInt
import java.util.HashMap;
import java.util.Map;
...
class MutableInt {
int value = 1; // note that we start at 1 since we're counting
public void increment () { ++value; }
public int get () { return value; }
}
...
Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
...
MutableInt count = freq.get(word);
if (count == null) {
freq.put(word, new MutableInt());
}
else {
count.increment();
}
Best Answer
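GCC has builtins for this, such as __builtin_clz (count leading zero bits; the result is undefined when the input is 0).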
I'd expect them to be translated into something reasonably efficient for your current platform, whether it be one of those fancy bit-twiddling algorithms, or a single instruction.
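As a minimal sketch of how the builtins are called (GCC and Clang only; the wrapper names are hypothetical, and the expected values assume a 32-bit unsigned int and a 64-bit unsigned long long):

```cpp
// Leading zero count of a nonzero 32-bit value.
// Precondition: x != 0, because the builtin's result is undefined for zero.
int leading_zeros32(unsigned int x) {
    return __builtin_clz(x);
}

// Same for a 64-bit value, using the unsigned long long variant of the builtin.
// Precondition: x != 0.
int leading_zeros64(unsigned long long x) {
    return __builtin_clzll(x);
}
```

For example, leading_zeros32(1u) is 31 and leading_zeros64(1ull) is 63.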
A useful trick if your input can be zero is __builtin_clz(x | 1): unconditionally setting the low bit without modifying any others makes the output 31 for x=0, without changing the output for any other input.
To avoid needing to do that, your other option is platform-specific intrinsics like ARM GCC's __clz (no header needed), or x86's _lzcnt_u32 on CPUs that support the lzcnt instruction. (Beware that lzcnt decodes as bsr on older CPUs instead of faulting, which gives 31-lzcnt for non-zero inputs.)
There's unfortunately no way to portably take advantage of the various CLZ instructions on non-x86 platforms that do define the result for input=0 as 32 or 64 (according to the operand width). x86's lzcnt does that, too, while bsr produces a bit-index that the compiler has to flip unless you use 31-__builtin_clz(x).
(The "undefined result" is not C Undefined Behavior, just a value that isn't defined. It's actually whatever was in the destination register when the instruction ran. AMD documents this, Intel doesn't, but Intel's CPUs do implement that behaviour. But it's not whatever was previously in the C variable you're assigning to, that's not usually how things work when gcc turns C into asm. See also Why does breaking the "output dependency" of LZCNT matter?)
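The x | 1 trick and the 31-__builtin_clz(x) bit-index form can be sketched together. This assumes GCC or Clang and a 32-bit unsigned int; the function names are hypothetical.

```cpp
// Leading zero count that is safe for x == 0: OR-ing in the low bit makes
// the zero case behave like x == 1 (result 31) and changes nothing else.
int leading_zeros_or_31(unsigned int x) {
    return __builtin_clz(x | 1u);
}

// Index of the highest set bit (what bsr produces), with 0 returned for x == 0.
int highest_set_bit_index(unsigned int x) {
    return 31 - __builtin_clz(x | 1u);
}
```

Note that with this trick an input of 0 is indistinguishable from an input of 1, so callers that need to tell them apart still have to test for zero separately.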