R – Fast Euclidean division in C

bit-manipulationc++micro-optimization

I am interested in getting the remainder of the Euclidean division, that is, for a pair of integers (i, n), find r such as:

i = k * n + r, 0 <= r < |k|

the simple solution is:

int euc(int i, int n)
{
    int r;

    r = i % n;
    if ( r < 0) {
        r += n;
    }
    return r;
}

But since I need to execute this tens of million of times (it is used inside an iterator for multidimensional arrays), I would like to avoid the branching if possible. Requirements:

  • Branching but faster is also desirable.
  • A solution which works only for positive n is acceptable (but it has to work for negative i).
  • n is not known in advance, and can be any value > 0 and < MAX_INT

Edit

It is actually quite easy to get the result wrong, so here is an example of the expected results:

  • euc(0, 3) = 0
  • euc(1, 3) = 1
  • euc(2, 3) = 2
  • euc(3, 3) = 0
  • euc(-1, 3) = 2
  • euc(-2, 3) = 1
  • euc(-3, 3) = 0

Some people also worry that it does not make sense to optimize this. I need this for an multi-dimensional iterator where out of bounds items are replaced by items in a 'virtual array' which repeats the original array. So if my array x is [1, 2, 3, 4], the virtual array is […., 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], and for example, x[-2] is x1, etc…

For a nd array of dimension d, I need d Euclidean division for every point. If I need to do a correlation between a n^d array with a m^d kernel, I need n^d * m^d * d euclidean divisions. For a 3d image of 100x100x100 points and a kernel of 5*5*5 points, that's already ~ 400 million Euclidean divisions.

Best Solution

Edit: No multiplication or branches woot.

int euc(int i, int n)
{
    int r;

    r = i % n;
    r += n & (-(r < 0));

    return r;
}

Here is the generated code. According to the MSVC++ instrumenting profiler (my testing) and the OP's testing, they perform nearly the same.

; Original post
00401000  cdq              
00401001  idiv        eax,ecx 
00401003  mov         eax,edx 
00401005  test        eax,eax 
00401007  jge         euc+0Bh (40100Bh) 
00401009  add         eax,ecx 
0040100B  ret              

; Mine
00401020  cdq              
00401021  idiv        eax,ecx 
00401023  xor         eax,eax 
00401025  test        edx,edx 
00401027  setl        al   
0040102A  neg         eax  
0040102C  and         eax,ecx 
0040102E  add         eax,edx 
00401030  ret