One could write an interpreter for PICAXE tokenised code in PICAXE Basic, but it would be awfully slow. It would be possible to write one that used a modified version of the tokenised code, designed for faster execution.
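As a rough illustration of what an execution-friendly format might look like, here is a minimal C sketch of a fetch-decode-execute loop over byte-aligned tokens. The opcodes (OP_LET, OP_ADD, OP_GOTO, OP_DJNZ, OP_END), their operand layout and the run() function are hypothetical, invented for this example; they are not the real PICAXE token encoding. The point is only that each token costs one whole-byte fetch and one switch dispatch, at the price of using more code space than a bit-packed format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-aligned token set, invented for illustration only.
   This is NOT the PICAXE token encoding. */
enum { OP_END = 0, OP_LET = 1, OP_ADD = 2, OP_GOTO = 3, OP_DJNZ = 4 };

/* Run a token stream against a small variable file (think b0, b1, ...).
   Every opcode and operand occupies a whole byte, so no bit unpacking
   is needed anywhere in the loop. */
static void run(const uint8_t *code, uint8_t *vars)
{
    size_t pc = 0;
    for (;;) {
        uint8_t op = code[pc++];              /* fetch: one byte read */
        switch (op) {                         /* decode: one dispatch */
        case OP_LET:                          /* LET var, constant */
            vars[code[pc]] = code[pc + 1];
            pc += 2;
            break;
        case OP_ADD:                          /* var = var + constant */
            vars[code[pc]] += code[pc + 1];
            pc += 2;
            break;
        case OP_GOTO:                         /* jump to byte offset */
            pc = code[pc];
            break;
        case OP_DJNZ:                         /* decrement var, jump if not zero */
            if (--vars[code[pc]] != 0)
                pc = code[pc + 1];
            else
                pc += 2;
            break;
        default:                              /* OP_END or unknown token */
            return;
        }
    }
}
```

For example, the byte sequence {OP_LET,0,0, OP_LET,1,5, OP_ADD,0,3, OP_DJNZ,1,6, OP_END} sets b0 = 0 and b1 = 5, then adds 3 to b0 five times, leaving b0 = 15 and b1 = 0.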
If you mean the way the ZX80, the Ataris and so on did it, storing the source (or a compacted version of it) and interpreting that, then it's possible but involved. It would probably take a 28X1 to deliver the capabilities of an 08.
As to which is the fastest Basic interpreter, so many factors are involved that comparison is hard. The main ones are the tokenised code format, the native microcontroller instruction set and the CPU hardware.
PICAXEs use bit-aligned, variable-length tokens, which maximises code space (generally, although some design decisions undermined that in places; things are back on track with the X1). The cost is that it takes longer to fetch, decode and execute each token. A different tokenised format would improve execution speed but reduce maximum program size, so one cannot compare like with like. I believe that tokens sized in multiples of 4 bits, together with careful design, would allow programs of nearly the same size while improving execution speed.
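To make the fetch cost concrete, here is a minimal C sketch contrasting the two approaches under an assumed, simplified encoding (not the actual PICAXE format). The hypothetical read_bits() pulls an arbitrary-width field from any bit position one bit at a time, roughly as a small core without multi-bit shifts would have to. The hypothetical read_nibble() only ever needs a byte fetch plus either a four-bit shift (SWAPF on a PIC) or a mask.

```c
#include <stdint.h>

/* Hypothetical bit-aligned reader: fetch `width` bits (MSB first)
   starting at bit position *pos, extracting one bit per iteration.
   The loop cost grows with the field width. */
static uint16_t read_bits(const uint8_t *code, uint32_t *pos, unsigned width)
{
    uint16_t value = 0;
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = *pos + i;
        unsigned bit = (code[p >> 3] >> (7 - (p & 7))) & 1u;
        value = (uint16_t)((value << 1) | bit);
    }
    *pos += width;
    return value;
}

/* Hypothetical nibble-aligned reader: tokens come in multiples of
   4 bits, so each step is one byte fetch plus a swap or a mask. */
static uint8_t read_nibble(const uint8_t *code, uint32_t *nib_pos)
{
    uint8_t byte = code[*nib_pos >> 1];
    uint8_t nib = (*nib_pos & 1u) ? (uint8_t)(byte & 0x0F)
                                  : (uint8_t)(byte >> 4);
    (*nib_pos)++;
    return nib;
}
```

With code = {0xA5, 0x3C}, read_bits() returns 5 for the first 3 bits and 5 again for the next 5 bits, while successive read_nibble() calls return 0xA, 0x5, 0x3 and 0xC. The nibble reader does a fixed amount of work per call regardless of where the field sits.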
For a PICmicro executing the PICAXE tokenised code unchanged, I believe Rev-Ed are already quite close to the maximum achievable speed. Without the constraint of firmware size, I believe execution speed could be improved further, and there are tricks that can optimise particular operations, especially RETURN time.
If the PICAXE used hardware whose ALU included a barrel shifter, that would give a very noticeable speed improvement. The X2s, being based on 18Fs, will bring speed improvements through a better native instruction set; something based on the dsPIC30F would really zing along.
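A simple way to see why a barrel shifter matters: on cores such as the PIC16 (RLF/RRF) and PIC18 (RLCF/RRCF), each shift instruction moves a register by only one bit, so shifting by n bits takes at least n instructions, whereas a barrel shifter handles any distance in one operation. The C sketch below models that with a hypothetical step counter; the function names are invented, and the counts are abstract steps, not real instruction cycles.

```c
#include <stdint.h>

/* Models a core without a barrel shifter: each single-bit shift
   costs one step, so shifting by n costs n steps. */
static uint16_t shl_one_bit_at_a_time(uint16_t x, unsigned n, unsigned *steps)
{
    *steps = 0;
    while (n--) {
        x = (uint16_t)(x << 1);
        (*steps)++;
    }
    return x;
}

/* Models a core with a barrel shifter: any shift distance costs
   a single step. Distances of 16 or more clear a 16-bit value. */
static uint16_t shl_barrel(uint16_t x, unsigned n, unsigned *steps)
{
    *steps = 1;
    if (n >= 16)
        return 0;
    return (uint16_t)(x << n);
}
```

Both functions give the same result (shifting 0x0003 left by 4 yields 0x0030), but the first reports 4 steps and the second 1. In a token decoder that shifts fields into place constantly, that difference adds up.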
Optimising the firmware may save a few cycles here or there, but the big gains come from simply running the chip faster. The X2 can run ten times faster than a PICAXE at 4MHz. Another way to look at it: at 4MHz, a PICAXE command takes only 25 times longer than a native instruction would to do the same job.