Outfox The Fox

Sometimes I encounter code like this:

  u >>= 2; // Fastest way to divide by 4.

I usually burst out singing: Those who try to tangle with my daring do, end up at the angle that herring do. (Danny Kaye) [Outfox the Fox] Never try to outfox the compiler. Take a look at what gcc generates to divide an integer by 13 in x64 and PowerPC assembler (approximately):

x64Idiv PowerIdiv

It is not uncommon for a divide instruction to take more than four times as long as a multiplication. Shift and register loading typically are up to fifty times faster. Above (speed) optimization is a simplified (i.e., unsigned) example. It does not take into account that floating point divides can be executed in parallel with integer code, it does not take into account the instructions in the CPU’s pipeline, it does not take into account surrounding code, it does not take into account MMX/SSE instructions, etc. For the curious, there is a excellent blog post on the why and how of multiply/shift-back. Remember, the rule is: you will never outfox the GNU-fox or any other optimizing fox. But if you have to outfox the fox look for inspiration first.