R – Probabilistic hashing: is there such a thing?


Say you want to implement a click tracker that counts a click on a link from any given IP address only once. The number of links and clients is very large, and you don't want to keep a table of every single IP-click pair. The check would also run live against every click, so you don't want to do a lookup against a big table each time.

Is there such a thing as "probabilistic hashing" or "lossy hashing" that tells you whether an IP is probably in a set, where you accept a certain error rate in exchange for saving resources?

Best Solution

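You could probably (ab?)use a Bloom filter for something like this. It is a compact bit array that answers "is this item in the set?" with either "definitely not" or "probably yes": false positives occur at a rate you can tune by sizing the filter, but false negatives never do. For a click tracker, that means a small fraction of first-time clicks would be mistaken for repeats and go uncounted, while no click is ever counted twice.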
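
To make the Bloom filter idea concrete, here is a minimal sketch in Python using only the standard library. The names `BloomFilter` and `count_click`, the `ip|link` key format, and the choice of SHA-256 with double hashing are all assumptions made for illustration, not something from the question; a production system would more likely use an existing, tested library.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; sizing uses the standard textbook formulas."""

    def __init__(self, expected_items, false_positive_rate):
        # m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one digest via double hashing (an assumption;
        # any set of reasonably independent hash functions would do).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never 0
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, key):
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))


def count_click(seen, ip, link):
    """Return True if this click should be counted (hypothetical helper)."""
    key = f"{ip}|{link}"
    if key in seen:
        return False  # probably a repeat click, so skip it
    seen.add(key)
    return True


# Sized for about 1,000,000 IP-link pairs at a 1% false-positive rate.
seen = BloomFilter(expected_items=1_000_000, false_positive_rate=0.01)
first = count_click(seen, "203.0.113.7", "/landing")
repeat = count_click(seen, "203.0.113.7", "/landing")
```

With these parameters the filter needs roughly 9.6 bits per expected item, so about 1.2 MB for a million pairs, regardless of how long the keys are. A plain Bloom filter like this one cannot delete entries, so if the "once" is meant to be per day or per campaign, one option is to start a fresh filter for each period.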
