I ran into the same issue and solved it with the pulsout command, as described. (Talk about reinventing the wheel!)
Another option is to use a 28X or 40X, which can be overclocked. If you can live with the few restrictions that overclocking introduces, running at 16 MHz instead of 4 MHz reduces a "pause 1" to 0.25 ms. Faster still, a "high 0" (or the same command on another pin) takes around 0.06 ms, according to my measurements.
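To make the clock arithmetic above concrete, here is a rough Python sketch. It assumes that timing commands scale inversely with the clock relative to the 4 MHz default, so one "pause" unit is 1 ms at 4 MHz and proportionally shorter at higher clocks. The function name pause_duration_ms is just for illustration; it is not part of PICAXE BASIC.

```python
# Rough model (an assumption): timing commands scale inversely with
# the clock, relative to the 4 MHz default where one pause unit = 1 ms.
DEFAULT_CLOCK_MHZ = 4.0


def pause_duration_ms(units: int, clock_mhz: float) -> float:
    """Approximate real duration of 'pause <units>' at a given clock speed."""
    # Doubling the clock halves the duration of each pause unit.
    return units * (DEFAULT_CLOCK_MHZ / clock_mhz)


if __name__ == "__main__":
    for clock in (4.0, 8.0, 16.0):
        print(f"pause 1 at {clock:g} MHz ~ {pause_duration_ms(1, clock):.3f} ms")
```

This only models the nominal pause length. It ignores the interpreter's per-command overhead (the roughly 0.06 ms measured for a "high 0"), so real timings will run somewhat longer than these figures.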
Wolfgang