discovered with include-what-you-use
When used across large buffers, an iterative reverse scheme is less efficient. The cache cost of a single reverse is small enough to just eat here.