Okay, so you want to calculate `a^b mod m`

. First we'll take a naive approach and then see how we can refine it.

First, reduce `a mod m`

. That means, find a number `a1`

so that `0 <= a1 < m`

and `a = a1 mod m`

. Then repeatedly in a loop multiply by `a1`

and reduce again `mod m`

. Thus, in pseudocode:

```
a1 = a reduced mod m
p = 1
for(int i = 1; i <= b; i++) {
p *= a1
p = p reduced mod m
}
```

By doing this, we avoid numbers larger than `m^2`

. This is the key. The reason we avoid numbers larger than `m^2`

is because at every step `0 <= p < m`

and `0 <= a1 < m`

.

As an example, let's compute `5^55 mod 221`

. First, `5`

is already reduced `mod 221`

.

`1 * 5 = 5 mod 221`

`5 * 5 = 25 mod 221`

`25 * 5 = 125 mod 221`

`125 * 5 = 183 mod 221`

`183 * 5 = 31 mod 221`

`31 * 5 = 155 mod 221`

`155 * 5 = 112 mod 221`

`112 * 5 = 118 mod 221`

`118 * 5 = 148 mod 221`

`148 * 5 = 77 mod 221`

`77 * 5 = 164 mod 221`

`164 * 5 = 157 mod 221`

`157 * 5 = 122 mod 221`

`122 * 5 = 168 mod 221`

`168 * 5 = 177 mod 221`

`177 * 5 = 1 mod 221`

`1 * 5 = 5 mod 221`

`5 * 5 = 25 mod 221`

`25 * 5 = 125 mod 221`

`125 * 5 = 183 mod 221`

`183 * 5 = 31 mod 221`

`31 * 5 = 155 mod 221`

`155 * 5 = 112 mod 221`

`112 * 5 = 118 mod 221`

`118 * 5 = 148 mod 221`

`148 * 5 = 77 mod 221`

`77 * 5 = 164 mod 221`

`164 * 5 = 157 mod 221`

`157 * 5 = 122 mod 221`

`122 * 5 = 168 mod 221`

`168 * 5 = 177 mod 221`

`177 * 5 = 1 mod 221`

`1 * 5 = 5 mod 221`

`5 * 5 = 25 mod 221`

`25 * 5 = 125 mod 221`

`125 * 5 = 183 mod 221`

`183 * 5 = 31 mod 221`

`31 * 5 = 155 mod 221`

`155 * 5 = 112 mod 221`

`112 * 5 = 118 mod 221`

`118 * 5 = 148 mod 221`

`148 * 5 = 77 mod 221`

`77 * 5 = 164 mod 221`

`164 * 5 = 157 mod 221`

`157 * 5 = 122 mod 221`

`122 * 5 = 168 mod 221`

`168 * 5 = 177 mod 221`

`177 * 5 = 1 mod 221`

`1 * 5 = 5 mod 221`

`5 * 5 = 25 mod 221`

`25 * 5 = 125 mod 221`

`125 * 5 = 183 mod 221`

`183 * 5 = 31 mod 221`

`31 * 5 = 155 mod 221`

`155 * 5 = 112 mod 221`

Therefore, `5^55 = 112 mod 221`

.

Now, we can improve this by using exponentiation by squaring; this is the famous trick wherein we reduce exponentiation to requiring only `log b`

multiplications instead of `b`

. Note that with the algorithm that I described above, the exponentiation by squaring improvement, you end up with the right-to-left binary method.

```
a1 = a reduced mod m
p = 1
while (b > 0) {
if (b is odd) {
p *= a1
p = p reduced mod m
}
b /= 2
a1 = (a1 * a1) reduced mod m
}
```

Thus, since 55 = 110111 in binary

`1 * (5^1 mod 221) = 5 mod 221`

`5 * (5^2 mod 221) = 125 mod 221`

`125 * (5^4 mod 221) = 112 mod 221`

`112 * (5^16 mod 221) = 112 mod 221`

`112 * (5^32 mod 221) = 112 mod 221`

Therefore the answer is `5^55 = 112 mod 221`

. The reason this works is because

```
55 = 1 + 2 + 4 + 16 + 32
```

so that

```
5^55 = 5^(1 + 2 + 4 + 16 + 32) mod 221
= 5^1 * 5^2 * 5^4 * 5^16 * 5^32 mod 221
= 5 * 25 * 183 * 1 * 1 mod 221
= 22875 mod 221
= 112 mod 221
```

In the step where we calculate `5^1 mod 221`

, `5^2 mod 221`

, etc. we note that `5^(2^k)`

= `5^(2^(k-1)) * 5^(2^(k-1))`

because `2^k = 2^(k-1) + 2^(k-1)`

so that we can first compute `5^1`

and reduce `mod 221`

, then square this and reduce `mod 221`

to obtain `5^2 mod 221`

, etc.

The above algorithm formalizes this idea.

## Best Solution

The modulus/modulo operation is usually understood as the integer equivalent of the remainder operation - a side effect or counterpart to division.

Except for some degenerate cases (where the divisor is a power of the operating base - i.e. a power of 2 for most number formats) this is just as expensive as integer division!

So the question is really, why is integer division so expensive?

I don't have the time or expertise to analyze this mathematically, so I'm going to appeal to grade school maths:

Consider the number of lines of working out in the notebook (not including the inputs) required for:

So in simple terms, this should give you a feel for why division and hence modulo is slower: computers still have to do long division in the same stepwise fashion tha you did in grade school.

If this makes no sense to you; you may have been brought up on school math a little more modern than mine (30+ years ago).

The Order/Big O notation used above as O(something) expresses the complexity of a computation in terms of the size of its inputs, and expresses a fact about its execution time. http://en.m.wikipedia.org/wiki/Big_O_notation

O(1) executes in constant (but possibly large) time. O(N) takes as much time as the size of its data-so if the data is 32 bits it takes 32 times the O(1) time of the step to calculate one of its N steps, and O(N^2) takes N times N (N squared) the time of its N steps (or possibly N times MN for some constant M). Etc.

In the above working I have used O(N) rather than O(N^2) for addition since the 32 or 64 bits of the first input are calculated in parallel by the CPU. In a hypothetical 1 bit machine a 32 bit addition operation would be O(32^2) and change. The same order reduction applies to the other operations too.