To measure critical timing, I use PulsOut on a spare pin and watch it on an oscilloscope. You also need to measure the time taken to execute the PulsOut itself, so that it can be subtracted from the displayed time.
If you use this method, be aware that the tokenised code does not fit neatly into byte boundaries, so a command often 'wraps' from one byte to the next. This causes small variations in execution speed, depending on where the command is stored.
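
As a rough sketch of the method, assuming PICAXE-style BASIC syntax: the pin C.1, the symbol name MARKER and the `b0 = b0 + 1` line are placeholders of mine, so substitute your own spare pin and the code you actually want to time. The first pair of pulses has nothing between it, so the spacing seen on the scope is the PulsOut overhead. The second pair brackets the code under test.

```basic
; Hypothetical example: MARKER and pin C.1 are assumptions, use any spare output pin
symbol MARKER = C.1

main:
    ; Calibration pair: nothing between the two pulses,
    ; so the spacing on the scope is the PulsOut overhead
    pulsout MARKER, 1
    pulsout MARKER, 1

    pause 10            ; gap so the two pairs are easy to tell apart

    ; Measurement pair: bracket the code you want to time
    pulsout MARKER, 1
    b0 = b0 + 1         ; placeholder for the code under test
    pulsout MARKER, 1

    pause 100           ; idle time before repeating, gives the scope a stable trigger
    goto main
```

Subtract the spacing of the first pair from the spacing of the second pair to get the time for the code under test. Because of the byte-boundary effect described above, expect the result to shift slightly if you add or remove code earlier in the program.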