R – Choose random array element satisfying certain property

algorithmarraysprobabilitystatistics

Suppose I have a list, called elements, each of which does or does not satisfy some boolean property p. I want to choose one of the elements that satisfies p by random with uniform distribution. I do not know ahead of time how many items satisfy this property p.

Will the following code do this?:

pickRandElement(elements, p)
     randElement = null
     count = 0
     foreach element in elements
          if (p(element))
               count = count + 1
               if (randInt(count) == 0)
                    randElement = element

     return randElement

(randInt(n) returns a random int k with 0 <= k < n.)

Best Solution

It works mathematically. Can be proven by induction.

Clearly works for n = 1 element satisfying p.

For n+1 elements, we will choose the element n+1 with probability 1/(n+1), so its probability is OK. But how does that effect the end probability of choosing one of the prior n elements?

Each of the prior n had a chance of being selected with probability 1/n, until we found element n+1. Now, after finding n+1, there is a 1/(n+1) chance that element n+1 is chosen, so there is a n/(n+1) chance that the previously chosen element remains chosen. Which means its final probability of being the chosen after n+1 finds is 1/n * (n/n+1) = 1/n+1 -- which is the probability we want for all n+1 elements for uniform distribution.

If it works for n = 1, and it works for n+1 given n, then it works for all n.