I'll do my best to explain it here on simple terms, but be warned that this topic takes my students a couple of months to finally grasp. You can find more information on the Chapter 2 of the Data Structures and Algorithms in Java book.

There is no mechanical procedure that can be used to get the BigOh.

As a "cookbook", to obtain the BigOh from a piece of code you first need to realize that you are creating a math formula to count how many steps of computations get executed given an input of some size.

The purpose is simple: to compare algorithms from a theoretical point of view, without the need to execute the code. The lesser the number of steps, the faster the algorithm.

For example, let's say you have this piece of code:

```
int sum(int* data, int N) {
int result = 0; // 1
for (int i = 0; i < N; i++) { // 2
result += data[i]; // 3
}
return result; // 4
}
```

This function returns the sum of all the elements of the array, and we want to create a formula to count the computational complexity of that function:

```
Number_Of_Steps = f(N)
```

So we have `f(N)`

, a function to count the number of computational steps. The input of the function is the size of the structure to process. It means that this function is called such as:

```
Number_Of_Steps = f(data.length)
```

The parameter `N`

takes the `data.length`

value. Now we need the actual definition of the function `f()`

. This is done from the source code, in which each interesting line is numbered from 1 to 4.

There are many ways to calculate the BigOh. From this point forward we are going to assume that every sentence that doesn't depend on the size of the input data takes a constant `C`

number computational steps.

We are going to add the individual number of steps of the function, and neither the local variable declaration nor the return statement depends on the size of the `data`

array.

That means that lines 1 and 4 takes C amount of steps each, and the function is somewhat like this:

```
f(N) = C + ??? + C
```

The next part is to define the value of the `for`

statement. Remember that we are counting the number of computational steps, meaning that the body of the `for`

statement gets executed `N`

times. That's the same as adding `C`

, `N`

times:

```
f(N) = C + (C + C + ... + C) + C = C + N * C + C
```

There is no mechanical rule to count how many times the body of the `for`

gets executed, you need to count it by looking at what does the code do. To simplify the calculations, we are ignoring the variable initialization, condition and increment parts of the `for`

statement.

To get the actual BigOh we need the Asymptotic analysis of the function. This is roughly done like this:

- Take away all the constants
`C`

.
- From
`f()`

get the polynomium in its `standard form`

.
- Divide the terms of the polynomium and sort them by the rate of growth.
- Keep the one that grows bigger when
`N`

approaches `infinity`

.

Our `f()`

has two terms:

```
f(N) = 2 * C * N ^ 0 + 1 * C * N ^ 1
```

Taking away all the `C`

constants and redundant parts:

```
f(N) = 1 + N ^ 1
```

Since the last term is the one which grows bigger when `f()`

approaches infinity (think on limits) this is the BigOh argument, and the `sum()`

function has a BigOh of:

```
O(N)
```

There are a few tricks to solve some tricky ones: use summations whenever you can.

As an example, this code can be easily solved using summations:

```
for (i = 0; i < 2*n; i += 2) { // 1
for (j=n; j > i; j--) { // 2
foo(); // 3
}
}
```

The first thing you needed to be asked is the order of execution of `foo()`

. While the usual is to be `O(1)`

, you need to ask your professors about it. `O(1)`

means (almost, mostly) constant `C`

, independent of the size `N`

.

The `for`

statement on the sentence number one is tricky. While the index ends at `2 * N`

, the increment is done by two. That means that the first `for`

gets executed only `N`

steps, and we need to divide the count by two.

```
f(N) = Summation(i from 1 to 2 * N / 2)( ... ) =
= Summation(i from 1 to N)( ... )
```

The sentence number *two* is even trickier since it depends on the value of `i`

. Take a look: the index i takes the values: 0, 2, 4, 6, 8, ..., 2 * N, and the second `for`

get executed: N times the first one, N - 2 the second, N - 4 the third... up to the N / 2 stage, on which the second `for`

never gets executed.

On formula, that means:

```
f(N) = Summation(i from 1 to N)( Summation(j = ???)( ) )
```

Again, we are counting **the number of steps**. And by definition, every summation should always start at one, and end at a number bigger-or-equal than one.

```
f(N) = Summation(i from 1 to N)( Summation(j = 1 to (N - (i - 1) * 2)( C ) )
```

(We are assuming that `foo()`

is `O(1)`

and takes `C`

steps.)

We have a problem here: when `i`

takes the value `N / 2 + 1`

upwards, the inner Summation ends at a negative number! That's impossible and wrong. We need to split the summation in two, being the pivotal point the moment `i`

takes `N / 2 + 1`

.

```
f(N) = Summation(i from 1 to N / 2)( Summation(j = 1 to (N - (i - 1) * 2)) * ( C ) ) + Summation(i from 1 to N / 2) * ( C )
```

Since the pivotal moment `i > N / 2`

, the inner `for`

won't get executed, and we are assuming a constant C execution complexity on its body.

Now the summations can be simplified using some identity rules:

- Summation(w from 1 to N)( C ) = N * C
- Summation(w from 1 to N)( A (+/-) B ) = Summation(w from 1 to N)( A ) (+/-) Summation(w from 1 to N)( B )
- Summation(w from 1 to N)( w * C ) = C * Summation(w from 1 to N)( w ) (C is a constant, independent of
`w`

)
- Summation(w from 1 to N)( w ) = (N * (N + 1)) / 2

Applying some algebra:

```
f(N) = Summation(i from 1 to N / 2)( (N - (i - 1) * 2) * ( C ) ) + (N / 2)( C )
f(N) = C * Summation(i from 1 to N / 2)( (N - (i - 1) * 2)) + (N / 2)( C )
f(N) = C * (Summation(i from 1 to N / 2)( N ) - Summation(i from 1 to N / 2)( (i - 1) * 2)) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2)( i - 1 )) + (N / 2)( C )
=> Summation(i from 1 to N / 2)( i - 1 ) = Summation(i from 1 to N / 2 - 1)( i )
f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2 - 1)( i )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N / 2 - 1) * (N / 2 - 1 + 1) / 2) ) + (N / 2)( C )
=> (N / 2 - 1) * (N / 2 - 1 + 1) / 2 =
(N / 2 - 1) * (N / 2) / 2 =
((N ^ 2 / 4) - (N / 2)) / 2 =
(N ^ 2 / 8) - (N / 4)
f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N ^ 2 / 8) - (N / 4) )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - ( (N ^ 2 / 4) - (N / 2) )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - (N ^ 2 / 4) + (N / 2)) + (N / 2)( C )
f(N) = C * ( N ^ 2 / 4 ) + C * (N / 2) + C * (N / 2)
f(N) = C * ( N ^ 2 / 4 ) + 2 * C * (N / 2)
f(N) = C * ( N ^ 2 / 4 ) + C * N
f(N) = C * 1/4 * N ^ 2 + C * N
```

And the BigOh is:

```
O(N²)
```

I usually go with something like the implementation given in Josh Bloch's *fabulous* Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

```
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + field1.GetHashCode();
hash = hash * 23 + field2.GetHashCode();
hash = hash * 23 + field3.GetHashCode();
return hash;
}
}
```

As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant *is* prime though. I don't know quite how important that is.)

This is better than the common practice of `XOR`

ing hashcodes for two main reasons. Suppose we have a type with two `int`

fields:

```
XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
```

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and `XOR`

instead of `ADD`

as a combining operation. It looks *something* like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte, instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.

```
// Note: Not quite FNV!
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ field1.GetHashCode();
hash = (hash * 16777619) ^ field2.GetHashCode();
hash = (hash * 16777619) ^ field3.GetHashCode();
return hash;
}
}
```

Note that one thing to be aware of is that ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding it to a collection that depends on the hash code.

As per the documentation:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

- You can compute the hash code from fields that are not mutable; or
- You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.

The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled - The Art of Hashing

## Best Solution

Here's the explanation directly from the source ( almost )

Search 101!at min 22:03

Worth watching!

Basically and according to Douglas Merrill former CTO of Google it is like this:

1) You write a ( misspelled ) word in google

2) You don't find what you wanted ( don't click on any results )

3) You realize you misspelled the word so you rewrite the word in the search box.

4) You find what you want ( you click in the first links )

This pattern multiplied millions of times, shows what are the most common misspells and what are the most "common" corrections.

This way Google can almost instantaneously, offer spell correction in every language.

Also this means if overnight everyone start to spell night as "nigth" google would suggest that word instead.

EDIT@ThomasRutter: Douglas describe it as "statistical machine learning".

They know who correct the query, because they know which query comes from which user ( using cookies )

If the users perform a query, and only 10% of the users click on a result and 90% goes back and type another query ( with the corrected word ) and this time that 90% clicks on a result, then they know they have found a correction.

They can also know if those are "related" queries of two different, because they have information of all the links they show.

Furthermore, they are now including the context into the spell check, so they can even suggest different word depending on the context.

See this demo of google wave ( @ 44m 06s ) that shows how the context is taken into account to automatically correct the spelling.

Here it is explained how that natural language processing works.

And finally here is an awesome demo of what can be done adding automatic machine translation ( @ 1h 12m 47s ) to the mix.

_{ I've added anchors of minute and seconds to the videos to skip directly to the content, if they don't work, try reloading the page or scrolling by hand to the mark. }