I've long been under the impression that `goto` should be avoided whenever possible. While perusing libavcodec (which is written in C) the other day, I noticed multiple uses of it. Is it ever advantageous to use `goto` in a language that supports loops and functions? If so, why?
Tags: exception, goto, language-agnostic
Related Solutions
Usually a simple hash function works by taking the "component parts" of the input (characters in the case of a string), and multiplying them by the powers of some constant, and adding them together in some integer type. So for example a typical (although not especially good) hash of a string might be:
(first char) + k * (second char) + k^2 * (third char) + ...
Then if a bunch of strings all having the same first char are fed in, then the results will all be the same modulo k, at least until the integer type overflows.
[As an example, Java's string hashCode is eerily similar to this - it does the characters reverse order, with k=31. So you get striking relationships modulo 31 between strings that end the same way, and striking relationships modulo 2^32 between strings that are the same except near the end. This doesn't seriously mess up hashtable behaviour.]
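To make that concrete, here's a minimal C sketch of the scheme just described (nobody's production code; k = 31 as in Java's `String.hashCode`):

```c
#include <stdint.h>

/* Polynomial string hash: each character is multiplied by a power of
   the constant k (31 here). Horner's rule gives the first character
   the highest power, as Java does; overflow wraps modulo 2^32. */
uint32_t poly_hash(const char *s)
{
    uint32_t h = 0;
    for (; *s; s++)
        h = h * 31 + (uint32_t)(unsigned char)*s;
    return h;
}
```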
A hashtable works by taking the modulus of the hash over the number of buckets.
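In code, the bucket selection is just this (an illustrative helper, not any particular library's API):

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce a hash to a bucket index by taking the modulus over the
   number of buckets. */
size_t bucket_index(uint32_t hash, size_t nbuckets)
{
    return (size_t)hash % nbuckets;
}
```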
It's important in a hashtable not to produce collisions for likely cases, since collisions reduce the efficiency of the hashtable.
Now, suppose someone puts a whole bunch of values into a hashtable that have some relationship between the items, like all having the same first character. This is a fairly predictable usage pattern, I'd say, so we don't want it to produce too many collisions.
It turns out that "because of the nature of maths", if the constant used in the hash, and the number of buckets, are coprime, then collisions are minimised in some common cases. If they are not coprime, then there are some fairly simple relationships between inputs for which collisions are not minimised. All the hashes come out equal modulo the common factor, which means they'll all fall into the 1/n th of the buckets which have that value modulo the common factor. You get n times as many collisions, where n is the common factor. Since n is at least 2, I'd say it's unacceptable for a fairly simple use case to generate at least twice as many collisions as normal. If some user is going to break our distribution into buckets, we want it to be a freak accident, not some simple predictable usage.
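Here's a hypothetical demonstration, with the constants and input words chosen to be deliberately bad: with k = 32 and 16 buckets the common factor is 16, and since 32 is divisible by 16, every character except the last contributes nothing to the bucket index. Any set of strings with the same last character collides:

```c
#include <stdio.h>
#include <stdint.h>

/* Same polynomial hash as above, with the constant as a parameter. */
static uint32_t hash_k(const char *s, uint32_t k)
{
    uint32_t h = 0;
    for (; *s; s++)
        h = h * k + (uint32_t)(unsigned char)*s;
    return h;
}

int main(void)
{
    const char *words[] = { "parse", "raise", "table", "merge" };
    for (int i = 0; i < 4; i++)
        printf("%-6s  k=32: bucket %2u   k=31: bucket %2u\n",
               words[i],
               (unsigned)(hash_k(words[i], 32) % 16),  /* all land in bucket 5 */
               (unsigned)(hash_k(words[i], 31) % 16)); /* spread out */
    return 0;
}
```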
Now, hashtable implementations obviously have no control over the items put into them. They can't prevent them being related. So the thing to do is to ensure that the constant and the bucket counts are coprime. That way you aren't relying on the "last" component alone to determine the modulus of the bucket with respect to some small common factor. As far as I know they don't have to be prime to achieve this, just coprime.
But if the hash function and the hashtable are written independently, then the hashtable doesn't know how the hash function works. It might be using a constant with small factors. If you're lucky it might work completely differently and be nonlinear. If the hash is good enough, then any bucket count is just fine. But a paranoid hashtable can't assume a good hash function, so should use a prime number of buckets. Similarly a paranoid hash function should use a largeish prime constant, to reduce the chance that someone uses a number of buckets which happens to have a common factor with the constant.
In practice, I think it's fairly normal to use a power of 2 as the number of buckets. This is convenient and saves having to search around or pre-select a prime number of the right magnitude. So you rely on the hash function not to use even multipliers, which is generally a safe assumption. But you can still get occasional bad hashing behaviours based on hash functions like the one above, and a prime bucket count could help further.
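With a power-of-two count, the reduction is a single mask instead of a division, which is the attraction (sketch):

```c
#include <stddef.h>
#include <stdint.h>

/* When nbuckets is a power of two, hash % nbuckets equals keeping the
   low bits of the hash: cheap, but only those low bits ever matter. */
size_t bucket_index_pow2(uint32_t hash, size_t nbuckets)
{
    return (size_t)hash & (nbuckets - 1);
}
```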
Putting about the principle that "everything has to be prime" is, as far as I know, a sufficient but not a necessary condition for good distribution over hashtables. It lets independently written hash functions and hashtables interoperate without either needing to assume the other has followed the same rule.
[Edit: there's another, more specialized reason to use a prime number of buckets, which is if you handle collisions with linear probing. Then you calculate a stride from the hashcode, and if that stride comes out to be a factor of the bucket count then you can only do (bucket_count / stride) probes before you're back where you started. The case you most want to avoid is stride = 0, of course, which must be special-cased, but to avoid also special-casing bucket_count / stride equal to a small integer, you can just make the bucket_count prime and not care what the stride is provided it isn't 0.]
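A sketch of that probing scheme (the names and the stride formula are illustrative, not any specific implementation): derive a stride from the hashcode, force it nonzero, and rely on a prime bucket count to make every stride coprime with it:

```c
#include <stddef.h>
#include <stdint.h>

/* Open-addressing probe with a hash-derived stride. The stride is
   forced into [1, nbuckets-1]; if nbuckets is prime, every such
   stride is coprime with it, so attempts 0..nbuckets-1 visit every
   bucket exactly once before the sequence repeats. */
size_t probe(uint32_t hash, size_t attempt, size_t nbuckets)
{
    size_t stride = 1 + (hash / (uint32_t)nbuckets) % (nbuckets - 1);
    return ((size_t)hash % nbuckets + attempt * stride) % nbuckets;
}
```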
Best Solution
Everybody who is anti-`goto` cites, directly or indirectly, Edsger Dijkstra's "Go To Statement Considered Harmful" article to substantiate their position. Too bad Dijkstra's article has virtually nothing to do with the way `goto` statements are used these days, and thus what the article says has little to no applicability to the modern programming scene. The `goto`-less meme now verges on a religion, right down to its scriptures dictated from on high, its high priests and the shunning (or worse) of perceived heretics.

Let's put Dijkstra's paper into context to shed a little light on the subject.
When Dijkstra wrote his paper the popular languages of the time were unstructured procedural ones like BASIC, FORTRAN (the earlier dialects) and various assembly languages. It was quite common for people using the higher-level languages to jump all over their code base in twisted, contorted threads of execution that gave rise to the term "spaghetti code". You can see this by hopping on over to the classic Trek game written by Mike Mayfield and trying to figure out how things work. Take a few moments to look that over.
THIS is "the unbridled use of the go to statement" that Dijkstra was railing against in his paper in 1968. THIS is the environment he lived in that led him to write that paper. The ability to jump anywhere you like in your code at any point you liked was what he was criticising and demanding be stopped. Comparing that to the anaemic powers of
goto
in C or other such more modern languages is simply risible.I can already hear the raised chants of the cultists as they face the heretic. "But," they will chant, "you can make code very difficult to read with
goto
in C." Oh yeah? You can make code very difficult to read withoutgoto
as well. Like this one:Not a
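(What follows is a made-up illustration in that spirit, not code from any real project.)

```c
#include <stdio.h>

/* No goto anywhere -- it prints the primes below 50, honest. */
int main(void){int n,d,p;for(n=2;n<50;n++){for(p=1,d=2;d*d<=n;p=n%d++?p:0);p&&printf("%d ",n);}return puts("")&&0;}
```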
Not a `goto` in sight, so it must be easy to read, right? Or how about this one:
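(Again, a made-up illustration.)

```c
#include <stdio.h>

/* Still no goto -- just the preprocessor committing crimes. */
#define B {
#define E }
#define loop(v, n) for (int v = 0; v < n; v++)

int main(void) B loop(i, 4) B loop(j, i + 1) putchar('*'); putchar('\n'); E E
```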
No `goto` there either. It must therefore be readable.

What's my point with these examples? It's not language features that make unreadable, unmaintainable code. It's not syntax that does it. It's bad programmers that cause this. And bad programmers, as you can see in the items above, can make any language feature unreadable and unusable. Like the `for` loops up there. (You can see them, right?)

Now to be fair, some language constructs are easier to abuse than others. If you're a C programmer, however, I'd peer far more closely at about 50% of the uses of `#define` long before I'd go on a crusade against `goto`!

So, for those who've bothered to read this far, there are several key points to note.
- Dijkstra's paper railing against `goto` statements was written for a programming environment where `goto` was a lot more potentially damaging than it is in most modern languages that aren't an assembler.
- Dismissing `goto` out of hand because of this is about as rational as saying "I tried to have fun once but didn't like it so now I'm against it".
- There are legitimate uses of the modern, anaemic `goto` statements in code that cannot be adequately replaced by other constructs (sketched below).
- There are also illegitimate uses of the modern control statements, like the "godo" abomination where an always-false `do` loop is broken out of using `break` in place of a `goto`. These are often worse than judicious use of `goto`.
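As an example of a legitimate use, here's a sketch of the single-exit cleanup pattern that accounts for most `goto` in modern C (the function and its names are invented for illustration; the pattern itself is what you'll find all over code like libavcodec):

```c
#include <stdio.h>
#include <stdlib.h>

/* One error-handling exit that unwinds whatever was initialised;
   every failure path jumps forward to it. */
int process_file(const char *path)
{
    int ret = -1;
    char *buf = NULL;
    FILE *f = fopen(path, "rb");
    if (!f)
        goto fail;
    buf = malloc(4096);
    if (!buf)
        goto fail;
    if (fread(buf, 1, 4096, f) == 0)
        goto fail;
    /* ... the real work would happen here ... */
    ret = 0;
fail:
    free(buf);      /* free(NULL) is a no-op */
    if (f)
        fclose(f);
    return ret;
}
```

The "godo" version of the same function wraps the body in `do { ... } while (0)` and replaces each `goto fail` with a `break`: all the jumps are still there, just less honest about it.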