How to write a hash function in C

chashhash-functionhashtable

Hash Tables are said to be the fastest/best way of Storing/Retrieving data.

My understanding of a hash table, hashing is as follows (Please correct me if I am wrong or Please add If there is anything more):

  • A Hash Table is nothing but an array (single or multi-dimensional) to store values.
  • Hashing is the process to find the index/location in the array to insert/retrieve the data. You take a data item(s) and pass it as a key(s) to a hash function and you would get the index/location where to insert/retrieve the data.

I have a question:

Is the hash function used to store/retrieve the data DIFFERENT from a
cryptographic hash function used in security applications for authentication
like MD5, HMAC, SHA-1 etc…?

In what way(s) are they different?

  • How to write a hash function in C?
  • Is there some standard or guidelines to it?
  • How do we ensure that the output of a hash function i.e, index is not out of range?

It would be great if you could mention some good links to understand these better.

Best Answer

A cryptographic hash emphasizes making it difficult for anybody to intentionally create a collision. For a hash table, the emphasis is normally on producing a reasonable spread of results quickly. As such, the two are usually quite different (in particular, a cryptographic hash is normally a lot slower).

For a typical hash function, the result is limited only by the type -- e.g. if it returns a size_t, it's perfectly fine for it to return any possible size_t. It's up to you to reduce that output range to the size of your table (e.g. using the remainder of dividing by the size of your table, which should often be a prime number).

As an example, a fairly typical normal hash function might look something like:

// warning: untested code.
size_t hash(char const *input) { 

    const int ret_size = 32;
    size_t ret = 0x555555;
    const int per_char = 7;

    while (*input) { 
        ret ^= *input++;
        ret = ((ret << per_char) | (ret >> (ret_size - per_char));
   }
   return ret;
}

The basic idea here is to have every bit of the input string affect the result, and to (as quickly as possible) have every bit of the result affected by at least part of the input. Note that I'm not particularly recommending this as a great hash function -- only trying to illustrate some of the basics of what you're trying to accomplish.