R – Notifying consumer when producer is done


I'm reading a lot of data from LDAP, and each record needs to be compared to the corresponding record in the database. To minimize the number of SQL queries, I want to batch multiple LDAP records into a single query.

All this is pretty simple: one thread produces LDAP results, and another thread consumes those results and runs the SQL query.

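import Queue  # Python 2 module name; it is "queue" in Python 3

ldap_results = Queue.Queue(10)

def producer():
  # search_ldap() is a placeholder for whatever yields the LDAP records
  for result in search_ldap():
    ldap_results.put(result)

def consumer():
  buffer = []
  buffer_size = 5
  while True:
    record = ldap_results.get()
    buffer.append(record)
    if len(buffer) >= buffer_size:
      run_sql_query(buffer)  # placeholder for the batched SQL query
      buffer = []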

The problem is: if LDAP only returns, say, 3 results and buffer_size is 5, the consumer ends up blocking forever on get() and the 3 buffered records are never flushed. I realize I could put some special token into the queue, like None or "EOF", but that seems like bad design: "iterate until you're done, oh, unless you see this special value, that means you're done, too".

I came up with two alternative ideas. The first is to have a shared eof variable, but I don't know how to properly synchronize it.

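eof = False  # shared flag; access to it is not synchronized

def producer():
  global eof
  while data:
    ldap_results.put(data)
    # ... fetch the next record into data ...
  eof = True

def consumer():
  while not eof:
    record = ldap_results.get()
    # ... buffer the record and run the query as before ...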
The second is to give the producer a ProduceChunks(chunk_size) method that handles the batching of results itself. I don't like that, because it assumes the producer knows how best to buffer results when I think that is really the consumer's responsibility.
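For illustration, producer-side chunking might look roughly like the generator below. This is a sketch under assumptions: produce_chunks is a hypothetical stand-in for the ProduceChunks method mentioned above, and records can be any iterable of LDAP results.

```python
from itertools import islice

def produce_chunks(records, chunk_size):
    """Yield lists of up to chunk_size records; the last list may be shorter."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # source exhausted, so no partial batch is left behind
        yield chunk
```

Because the generator knows when its source is exhausted, the short final batch comes out naturally, but the batch size is now chosen on the producer side.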

Does anyone have any guidance?

Best Solution

I would follow the "Make it Run, Make it Right, Make it Fast, Make it Simple" pattern.

Can you implement this correctly without a special "EOF" token? If not, then just use the EOF token and do not sweat it. Yes, the termination condition is more complex, but now it is "Right."
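A minimal sketch of the EOF-token approach, written for Python 3 (where the module is named queue). The names run_with_sentinel and _DONE are made up for this example, and the batches list stands in for the SQL queries. Using a unique object() as the token, rather than None or "EOF", means it can never be confused with a real record.

```python
import queue
import threading

_DONE = object()  # unique sentinel; cannot collide with any real record

def run_with_sentinel(records, buffer_size=5):
    """Batch records; the producer signals completion by queueing _DONE."""
    results = queue.Queue(10)
    batches = []  # stands in for the SQL queries that would be issued

    def producer():
        for record in records:
            results.put(record)
        results.put(_DONE)  # always the last item on the queue

    def consumer():
        buffer = []
        while True:
            record = results.get()
            if record is _DONE:
                break
            buffer.append(record)
            if len(buffer) >= buffer_size:
                batches.append(buffer)
                buffer = []
        if buffer:
            batches.append(buffer)  # flush the final partial batch

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return batches
```

With 3 records and a buffer size of 5, the consumer sees the sentinel after the third record, leaves the loop, and flushes the partial batch instead of blocking forever. With several consumers, the producer would need to queue one sentinel per consumer.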
