C# – Calculate a checksum for a string


I have a string of arbitrary length (let's say 5 to 2000 characters) for which I would like to calculate a checksum.

Requirements

  • The same checksum must be returned each time a calculation is done for a string
  • The checksum must be unique (no collisions)
  • I cannot store previous IDs to check for collisions

Which algorithm should I use?

Update:

  • Is there an approach that is reasonably unique, i.e. one where the likelihood of a collision is very small?
  • The checksum should be alphanumeric
  • The strings are Unicode
  • The strings are actually texts to be translated, and the checksum is stored with each translation so that a translated text can be matched back to its original text.
  • The length of the checksum is not critical, although shorter is better

Update 2:

Let's say that I have the following string: "Welcome to this website. Navigate using the flashy but useless menu above".

The string is used in a view in a way similar to gettext on Linux, i.e. the user just writes (in a Razor view):

@T("Welcome to this website. Navigate using the flashy but useless menu above")

Now I need a way to identify that string so that I can fetch its translation from a data source (there are several implementations of the data source). Using the entire string as a key seems a bit inefficient, so I'm looking for a way to generate a shorter key from it.

Best Solution

That's not possible.

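If you can't store previous values, it's not possible to create a unique checksum that is smaller than the information in the string. A checksum of a fixed size can only take a fixed number of distinct values, while there are far more possible strings, so some different strings must end up with the same checksum.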
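To see this in practice, here is a small illustrative sketch (the `CollisionDemo.FindCollision` name is hypothetical, not part of the original answer). It keeps only the first few bytes of .NET's MD5 hash, i.e. a deliberately tiny checksum, and searches for two different strings that share it. With 2 bytes there are only 65,536 possible checksums, so by the pigeonhole principle a collision is guaranteed within 65,537 distinct inputs, and it usually shows up much sooner.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class CollisionDemo
{
    // Illustration only: truncate the MD5 digest to `bytes` bytes and look for
    // two distinct strings with the same truncated value. Keep `bytes` small
    // (1 or 2); larger values make the search impractically long.
    public static (string First, string Second) FindCollision(int bytes)
    {
        if (bytes < 1 || bytes > 16)
            throw new ArgumentOutOfRangeException(nameof(bytes));

        // Maps each truncated checksum to the first string that produced it
        var seen = new Dictionary<string, string>();
        using (MD5 md5 = MD5.Create())
        {
            for (long i = 0; ; i++)
            {
                string candidate = "text" + i; // distinct input every iteration
                byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(candidate));
                string truncated = BitConverter.ToString(digest, 0, bytes);

                if (seen.TryGetValue(truncated, out var earlier))
                    return (earlier, candidate); // two strings, same checksum

                seen[truncated] = candidate;
            }
        }
    }
}
```

For example, `CollisionDemo.FindCollision(2)` returns a pair of different strings whose 2-byte checksums are identical. A full 16-byte MD5 has the same property in principle, only with a vastly larger space of values.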

Update:

The term "reasonably unique" doesn't make sense: either it's unique or it's not.

To get a reasonably low risk of hash collisions, you can use a reasonably large hash code.
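How large is "reasonably large"? A standard estimate is the birthday bound. It assumes the hash behaves like a uniformly random function, which is an idealization: it says nothing about deliberately crafted collisions. For $n$ distinct strings and a $b$-bit hash, the probability that any two of them collide is approximately:

```latex
P(\text{collision}) \approx 1 - e^{-n(n-1)/2^{b+1}} \le \frac{n(n-1)}{2^{b+1}} < \frac{n^2}{2^{b+1}}

% Example: n = 10^6 strings, b = 128 bits
\frac{(10^6)^2}{2^{129}} \approx \frac{10^{12}}{6.8 \times 10^{38}} \approx 1.5 \times 10^{-27}
```

Each bit removed from the hash doubles the bound, which is why shortening the key trades directly against collision risk.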

The MD5 algorithm, for example, produces a 16-byte (128-bit) hash code. Convert the string to a byte array using an encoding that preserves all characters, such as UTF-8, compute the hash code using the MD5 class, and then convert the hash code byte array into a string using the BitConverter class:

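using System;
using System.Security.Cryptography;
using System.Text;

string theString = "asdf";

string hash;
// MD5 implements IDisposable, so release it with a using block
using (MD5 md5 = MD5.Create()) {
  // UTF-8 preserves every Unicode character in the input
  byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(theString));
  // BitConverter yields "91-2E-C8-..."; drop the dashes
  hash = BitConverter.ToString(digest).Replace("-", String.Empty);
}

Console.WriteLine(hash);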

Output:

912EC803B2CE49E4A541068D495AB570
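Since the question asks for an alphanumeric key that is as short as possible, the same 16-byte digest can also be written in base 36 (digits plus lowercase letters) instead of hexadecimal. That takes 25 characters instead of 32, because 36^25 > 2^128. The sketch below is not from the original answer: the `TranslationKey.Compute` name is hypothetical, and it assumes `System.Numerics.BigInteger` is available (it is in modern .NET). Note that it reads the digest bytes as a little-endian number, so the result is not simply the hex output above converted to base 36; it is still deterministic, which is all a lookup key needs.

```csharp
using System;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

static class TranslationKey
{
    const string Alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Hypothetical helper: MD5 over the UTF-8 bytes of the text,
    // rendered as exactly 25 base-36 characters.
    public static string Compute(string text)
    {
        byte[] digest;
        using (MD5 md5 = MD5.Create())
        {
            digest = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
        }

        // BigInteger reads bytes as little-endian two's complement; appending
        // a zero byte makes the 128-bit value always non-negative.
        byte[] unsigned = new byte[digest.Length + 1];
        Array.Copy(digest, unsigned, digest.Length);
        BigInteger value = new BigInteger(unsigned);

        // Fill from the right with base-36 digits; 25 digits always suffice
        // (36^25 > 2^128), and leading zeros keep every key the same length.
        char[] chars = new char[25];
        for (int i = chars.Length - 1; i >= 0; i--)
        {
            value = BigInteger.DivRem(value, 36, out BigInteger remainder);
            chars[i] = Alphabet[(int)remainder];
        }
        return new string(chars);
    }
}
```

Calling `TranslationKey.Compute("Welcome to this website. Navigate using the flashy but useless menu above")` always returns the same 25-character key, which could be stored next to each translation. Going shorter than this is only possible by truncating the hash, which increases the chance of collisions.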