Compiler optimizations for constant divisors

Optimizing for speed

Today's compilers do a decent job in optimizing high level code to gain speed in execution time. As compilers are getting more and more complex over time, so does their emitted code. This tutorial will focus on a specific arithmetic optimization done by an optimizing compiler to avoid costly instructions such as div resp. idiv by transforming the calculations to fixed point arithmetic.
Actually, I just wanted to write some random stuff to test the new DruTeX plugin ;-)

To give you an idea of how such an optimized block of code looks like, here's a block of code I came across some time ago while analyzing some binary with IDA:

mov ebx, some_value
mov eax, 0CCCCCCCDh
mul ebx
mov eax, edx
shr eax, 3

where some_value represents an arbitrary integer value. At first sight the purpose of this code doesn't seem to be that obvious, does it?. Before you think about it, recap how the mul instruction works: mul ebx computes EAX*EBX and stores the 64-bit result in EDX:EAX, i.e. higher 32 bit in EDX and lower 32 bit in EAX.

Magic constants and fixed point arithmetic

First of all, let me clarify that the code above just represents an optimized version of a regular (integer) division. We will see in a few seconds how to make sense of this code.

So what is fixed point arithmetic anyway?
Fixed point arithmetic allocates a fixed set of bits to the integral as well as the fractional part of a rational number.
In this situation the compiler exploits the fact, that the x86 CPU is capable of handling 64 bit numbers by the register pair EDX:EAX, that is we reserve 32 bits for the integral and another 32 bits for the fractional part of our rational number.
So if we want to represent a fractional number in terms of CPU registers, we could say: EDX is our integral part whereas EAX is our fractional part. And that's exactly whats going on here.
This means that the high 32bit (EDX) represent a number between 0 and $ 2^{32} $-1, and EAX represents a number between 0 and 1.
Let's say you want to represent the number $ \frac{1}{3} = 0,333333333 $.
The registers would look like this:
EDX=0 and EAX=55555555h. Now let's see how we get to that magic constant.

We know, that we have 32 bits to represent a value between 0 and 1. The situation where all bits are set would represent a one, so we scale the fractional part by $ 2^{32} $, which means that if we compute $ \frac{1}{3}*2^{32} $, we get our magic constant: 55555555h.
The important fact is, that $ 2^{32} $ represents a one, so in order to represent $ \frac{1}{3} $ we have to divide $ 2^{32} $ by 3.
If we now calculate X*55555555h we get the result in EDX:EAX! And now you see the magic:
EDX contains the integral part of the formula: $ \frac{x}{3} $. We just divided x by 3 without using the div instruction!
Notice that this code is generated by an optimizing compiler because the div instruction is usually much slower than the mul instruction. Of course this can only be done if the compiler knows the divisor at compile time because the magic constant and the surrounding code is generated from it.

Example revisited

Now let's look at the code above again:
It's easy to see that CCCCCCCDh represents $ \frac{4}{5}=0.8 $.
Let some_value be denoted by x for now: we compute x*CCCCCCCD which corresponds to $ x*\frac{4}{5} $ and store the result in EDX:EAX. Then the code grabs the integral part from EDX, puts it in EAX and performs a right shift by 3, which is equivalent to dividing by 8. So the complete calculation is:
$ x*\frac{4}{5}*\frac{1}{8} = \frac{x}{10} $.
So in the end we just divided by 10, but with ridiculous - err ludicrous speed!
Before we reach the end of this little tutorial, let me give you one last example:

mov ebx, some_value
mov eax, 24924925h
mul ebx
mov eax, ebx
sub eax, edx
shr eax, 1
add eax, edx
shr eax, 2

Let some_value be denoted by x again:
Frist we have to find out what the magic constant is, so we calculate:
$ \frac{613566757}{2^{32}} = \frac{1}{7} $. By translating the above asm statements it's easy to see that we get to the following equations:

$ \frac{\frac{x-\frac{x}{7}}{2}+\frac{x}{7}}{4} = \frac{\frac{\frac{6x}{7}}{2}+\frac{x}{7}}{4} = \frac{\frac{4x}{7}}{4} = \frac{x}{7} $.
The additional operations are emitted by the compiler to minimize rounding errors. This becomes significant for very large numbers. If you want to understand what i mean, have a look at division by 5 code produced by an optimizing compiler.

Common constants

Here's a table with some randomly chosen constants and the corresponding fractions:

Magic constant Resulting fraction
55555555h $ \frac{1}{3} $
66666667h $ \frac{2}{5} $
CCCCCCCDh $ \frac{4}{5} $
24924925h $ \frac{1}{7} $
38E38E39h $ \frac{2}{9} $

The complexity of the assembly code varies depending on the divisor, e.g. the code for $ \frac{1}{7} $ is more complex than the one for $ \frac{1}{3} $.

Conclusion

As you saw an optimizing compiler can make a reverse engineers life a little bit harder, but also more interesting if you know what's going on under the hood.