The problem was that the register allocator ran too many loops over a function while trying to allocate registers for the temporaries. In theory it should never need to loop over it more that three times, but in reality it may need to do it more times. I have set a limit of 20, that should be sufficient.
Consider a parse tree like this:
ASSIGN / \ / \3 MEM PLUS / \ 1/ \2 TREE TREE
Three temporary registers, 1, 2 and 3 are allocated to evaluate this tree.
Fix 1) Assume that PLUS generates a 2-op instruction (add to left) and after Sethi-Ullman the left subtree would be evaluated first, then an implicit move 1 -> 3 would be generated before the right subtree is evaluated, causing all interference edges to be against 3. Then if the register allocator finds out that 3 must be spilled, the instruction generator would only find out that a new "3" were needed and hence no improvement. The fix is to spill subtree 1 instead in this case.
Fix 2) If the subtree below 1 is only a memory reference, the spill selection code might spill 1 which has a very short live-range and has to be replaced by a similar instruction, given no improvement of colorability. The fix was to avoid these short-range instructions if possible.
Both these scenarios only occurred if there were very few registers available, in the i386 case when dealing with long longs. Because there are only 6 registers and I treat them as 3 concatenated registers, if there is also a function call involved (claiming three regs) only 1 1/2 register is left. That's why newfs/dumplfs failed to compile.
The COLORMAP is one of the more central parts of the register allocator. Its purpose is to tell whether a temporary is trivially colorable or not.
A "traditional" graph-coloring register allocator can only handle one register class. Its colorability is checked by "degree < K", which means that it should have fewer than K registers live at the same time. K is the total number of registers available.
When I expanded the allocator to handle multiple register classes, one of the things were to check the colorability by finding out the "worst-case" affecting the temporary, when taking in account all neighbors classes.
Here is some examples of the x86 implementation:
6 16/32-bit registers (CLASSA) 8 8-bit regs (CLASSB) 8 floating regs and (CLASSD) 15 64-bit regs (!). (CLASSC) Those are handled as pairs of 32-bit regs.
So, for CLASSA the check looks like this:
case CLASSA: num = r[CLASSB] > 4 ? 4 : r[CLASSB]; num += 2*r[CLASSC]; num += r[CLASSA]; return num < 6;
It means that there can be max 5 CLASSA neighbors, 2 CLASSC neighbors or an infinite number of CLASSB regs if all neighbors are of CLASSB. Then they are just added to see the total number.
The CLASSB stuff is because even if all 8-bit regs are used edi/esi will still be available.
So, if we are guaranteed to have a free register, return true, otherwise false.
The colormap could be an auto-generated array of size NCLASS, but I haven't written the code to generate it yet. The information is available in the ROVERLAP macro in macdefs, which tells which registers overlap with each other.