Back to wiki

RCS History

/ TOP / internals / register_allocator.mdwn

Revision 1.2


The problem was that the register allocator ran too many loops over a function while trying to allocate registers for the temporaries.  In theory it should never need to loop over it more that three times, but in reality it may need to do it more times.  I have set a limit of 20, that should be sufficient.

Consider a parse tree like this:

       ASSIGN
       /   \
      /     \3
    MEM    PLUS
          /  \
        1/    \2
       TREE   TREE


Three temporary registers, 1, 2 and 3 are allocated to evaluate this tree.

Fix 1)  Assume that PLUS generates a 2-op instruction (add to left) and after Sethi-Ullman the left subtree would be evaluated first, then an implicit move 1 -> 3 would be generated before the right subtree is evaluated, causing all interference edges to be against 3.  Then if the register allocator finds
out that 3 must be spilled, the instruction generator would only find out that a new "3" were needed and hence no improvement.  The fix is to spill subtree 1 instead in this case.

Fix 2)  If the subtree below 1 is only a memory reference, the spill selection code might spill 1 which has a very short live-range and has to be replaced by a similar instruction, given no improvement of colorability.  The fix was to avoid these short-range instructions if possible.

Both these scenarios only occurred if there were very few registers available, in the i386 case when dealing with long longs.  Because there are only 6 registers and I treat them as 3 concatenated registers, if there is also a function call involved (claiming three regs) only 1 1/2 register is left.  That's why newfs/dumplfs failed to compile.

# COLORMAP

The COLORMAP is one of the more central parts of the register allocator.
Its purpose is to tell whether a temporary is trivially colorable or not.

A "traditional" graph-coloring register allocator can only handle one
register class.  Its colorability is checked by "degree < K", which means that it
should have fewer than K registers live at the same time.  K is the
total number of registers available.

When I expanded the allocator to handle multiple register classes, one of
the things were to check the colorability by finding out the "worst-case"
affecting the temporary, when taking in account all neighbors classes.

Here is some examples of the x86 implementation:

There are:

       6 16/32-bit registers   (CLASSA)
       8 8-bit regs            (CLASSB)
       8 floating regs and     (CLASSD)
       15 64-bit regs (!).     (CLASSC)
               Those are handled as pairs of 32-bit regs.

So, for CLASSA the check looks like this:

       case CLASSA:
               num = r[CLASSB] > 4 ? 4 : r[CLASSB];
               num += 2*r[CLASSC];
               num += r[CLASSA];
               return num < 6;

It means that there can be max 5 CLASSA neighbors, 2 CLASSC neighbors or
an infinite number of CLASSB regs if all neighbors are of CLASSB.
Then they are just added to see the total number.

The CLASSB stuff is because even if all 8-bit regs are used edi/esi will
still be available.

So, if we are guaranteed to have a free register, return true, otherwise
false.

The colormap could be an auto-generated array of size NCLASS, but I
haven't written the code to generate it yet.  The information is
available in the ROVERLAP macro in macdefs, which tells which registers
overlap with each other.

Powered by rcshistory.cgi 0.3