Code that needs this function tends to need a fairly simple implementation available inline, so that the optimiser can work with it more successfully. We tended to generate out-of-line function calls to it instead, which slowed down inner loops.