C++ – Circular lock-free buffer


I'm in the process of designing a system which connects to one or more stream of data feeds and do some analysis on the data than trigger events based on the result. In a typical multi-threaded producer/consumer setup, I will have multiple producer threads putting data into a queue, and multiple consumer threads reading the data, and the consumers are only interested in the latest data point plus n number of points. The producer threads will have to block if slow consumer can not keep up, and of course consumer threads will block when there are no unprocessed updates. Using a typical concurrent queue with reader/writer lock will work nicely but the rate of data coming in could be huge, so i wanted to reduce my locking overhead especially writer locks for the producers. I think a circular lock-free buffer is what I needed.

Now two questions:

  1. Is circular lock-free buffer the answer?

  2. If so, before i roll my own, do you know any public implementation that will fit my need?

Any pointers in implementing a circular lock-free buffer are always welcome.

BTW, doing this in C++ on Linux.

Some additional info:

The response time is critical for my system. Ideally the consumer threads will want to see any updates coming in as soon as possible because an extra 1 millisecond delay could make the system worthless, or worth a lot less.

The design idea I'm leaning toward is a semi-lock-free circular buffer where the producer thread put data in the buffer as fast as it can, let's call the head of the buffer A, without blocking unless the buffer is full, when A meets the end of buffer Z. Consumer threads will each hold two pointers to the circular buffer, P and Pn, where P is the thread's local buffer head, and Pn is nth item after P. Each consumer thread will advance its P and Pn once it finish processing current P and the end of buffer pointer Z is advanced with the slowest Pn. When P catch up to A, which means no more new update to process, the consumer spins and do busy wait for A to advance again. If consumer thread spin for too long, it can be put to sleep and wait for a condition variable, but i'm okay with consumer taking up CPU cycle waiting for update because that does not increase my latency (I'll have more CPU cores than threads). Imagine you have a circular track, and the producer is running in front of a bunch of consumers, the key is to tune the system so that the producer is usually runing just a few step ahead of the consumers, and most of these operation can be done using lock-free techniques. I understand getting the details of the implementation right is not easy…okay, very hard, that's why I want to learn from others' mistakes before making a few of my own.

Best Solution

I've made a particular study of lock-free data structures in the last couple of years. I've read most of the papers in the field (there's only about fourty or so - although only about ten or fifteen are any real use :-)

AFAIK, a lock-free circular buffer has not been invented. The problem will be dealing with the complex condition where a reader overtakes a writer or vis-versa.

If you have not spent at least six months studying lock-free data structures, do not attempt to write one yourself. You will get it wrong and it may not be obvious to you that errors exist, until your code fails, after deployment, on new platforms.

I believe however there is a solution to your requirement.

You should pair a lock-free queue with a lock-free free-list.

The free-list will give you pre-allocation and so obviate the (fiscally expensive) requirement for a lock-free allocator; when the free-list is empty, you replicate the behaviour of a circular buffer by instantly dequeuing an element from the queue and using that instead.

(Of course, in a lock-based circular buffer, once the lock is obtained, obtaining an element is very quick - basically just a pointer dereference - but you won't get that in any lock-free algorithm; they often have to go well out of their way to do things; the overhead of failing a free-list pop followed by a dequeue is on a par with the amount of work any lock-free algorithm will need to be doing).

Michael and Scott developed a really good lock-free queue back in 1996. A link below will give you enough details to track down the PDF of their paper; Michael and Scott, FIFO

A lock-free free-list is the simplest lock-free algorithm and in fact I don't think I've seen an actual paper for it.

Related Question