I have a string of arbitrary length (let's say 5 to 2000 characters) for which I would like to calculate a checksum.
- The same checksum must be returned each time a calculation is done for a string
- The checksum must be unique (no collisions)
- I cannot store previous IDs to check for collisions
Which algorithm should I use?
- Is there an approach that is reasonably unique, i.e. one where the likelihood of a collision is very small?
- The checksum should be alphanumeric
- The strings are Unicode
- The strings are actually texts that should be translated and the checksum is stored with each translation (so a translated text can be matched back to the original text).
- The length of the checksum is not important to me (but the shorter, the better)
Let's say that I have the following string:
"Welcome to this website. Navigate using the flashy but useless menu above".
The string is used in a view in a way similar to gettext on Linux, i.e. the user just writes (in a Razor view):
@T("Welcome to this website. Navigate using the flashy but useless menu above")
Now I need a way to identify that string so that I can fetch it from a data source (there are several implementations of the data source). Using the entire string as a key seems a bit inefficient, so I'm looking for a way to generate a key from it.
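To make the idea concrete, here is a minimal sketch of the kind of key generation I have in mind, not a settled solution. It is written in Python purely for illustration; the function name `text_key`, the choice of SHA-256, and the 16-character truncation are all my own assumptions. As far as I understand, a hash like this cannot guarantee uniqueness, it can only make collisions very unlikely.

```python
import hashlib


def text_key(text: str, length: int = 16) -> str:
    """Return a short, deterministic, alphanumeric key for a Unicode string.

    Hypothetical helper for illustration: SHA-256 and the default length
    of 16 hex characters are assumptions, not requirements.
    """
    # Encode to UTF-8 so any Unicode text maps to a well-defined byte sequence.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # The hex digest only contains 0-9 and a-f, so it is alphanumeric.
    # Truncating shortens the key at the cost of a higher collision risk.
    return digest[:length]


if __name__ == "__main__":
    original = ("Welcome to this website. Navigate using the flashy "
                "but useless menu above")
    # The same input always yields the same key, so nothing has to be stored
    # to recompute it later.
    print(text_key(original))
```

The key would then be stored with each translation and recomputed from the original text whenever the view calls `T(...)`.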