Rather than use 4-16 separate operations, use 2 operations plus some byte reordering as necessary. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>