Similar to the fast, specialized mb_strcut implementation for UTF-8
in 1f0cf133db, this new implementation of mb_strcut for UTF-16 strings
just examines a few bytes before each cut point.
Even for short strings, the new implementation is around 2x faster.
For strings around 10,000 bytes in length, it comes out about 100-500x
faster in my microbenchmarks.
The new implementation behaves identically to the old one on valid
UTF-16 strings; a fuzzer was used to help verify this.