Hi,
Personally, I was using WinAXEpad (on a Windows netbook) to get an "up to date" compiler that still loads quickly and easily, for a "quick and dirty" test. I usually test with an 08M2, which does occasionally produce a slightly smaller code size than the 14 and 20 pin M2s.
Checking again, I see that the real difference in the number of bytes produced is actually the difference between "from 1 to 1500" (typically 12 bytes) and "from 1500 to 1" (14 bytes). Whether the step is -1 or +1 (or not specified, with +1 implied) appears to make no difference to the size of the code produced. But using the "incorrect" step sign relative to the "from" and "to" values will, of course, produce an "unintended" result.
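For anyone wanting to repeat the comparison, a minimal sketch of the two loops might look like the following. The choice of w0 as the loop counter and the empty loop bodies are assumptions for illustration, not the exact test program; a word variable is used only because 1500 does not fit in a byte.

```
#picaxe 08m2

; Sketch only: w0 and the empty loop bodies are assumed for illustration.
; To compare sizes, comment out one loop at a time and run a syntax check.

; Counting up: "from 1 to 1500", step not specified so +1 is implied
for w0 = 1 to 1500
next w0

; Counting down: "from 1500 to 1", with the matching negative step
for w0 = 1500 to 1 step -1
next w0
```

The syntax check reports the program size, so checking each loop on its own should show whether the "up" and "down" forms differ on a given chip type and compiler version.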
However, I made the observation only as a possible explanation for the timing discrepancies between "up" and "down" loops claimed by the OP (new in #9). Unfortunately I'm not in a position to easily test a real PICAXE chip at the moment, and of course the simulator is of no use for timing tests.
Cheers, Alan.