[ruby/prism] Avoid breaking code units offset on binary encoding

25a4cf6794

Co-authored-by: Kevin Newton <kddnewton@users.noreply.github.com>
Vinicius Stock 2024-10-08 10:47:08 -04:00 committed by git
parent 615a087216
commit e50754fcfa
2 changed files with 20 additions and 1 deletions
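
A minimal sketch of the behavior this change targets, assuming a source string tagged as binary (ASCII-8BIT). The method below is a hypothetical standalone version of `code_units_offset` that takes `source` as an argument instead of reading it from the surrounding class; the sample string and variable names are illustrative and not taken from the commit.

```ruby
# Hypothetical standalone stand-in for the patched method: `source` is passed
# explicitly here, whereas the method in the diff reads it from its own object.
def code_units_offset(source, byte_offset, encoding)
  byteslice = (source.byteslice(0, byte_offset) or raise)
    .encode(encoding, invalid: :replace, undef: :replace)

  if encoding == Encoding::UTF_16LE || encoding == Encoding::UTF_16BE
    byteslice.bytesize / 2 # every UTF-16 code unit is two bytes
  else
    byteslice.length
  end
end

# Illustrative input: the bytes of "café", retagged as binary. Bytes above
# 0x7F have no defined conversion from ASCII-8BIT to a Unicode encoding.
binary_source = "caf\xC3\xA9".b

# The strict conversion (no replacement options) raises on those bytes.
strict_error =
  begin
    binary_source.encode(Encoding::UTF_16LE)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

# With invalid: :replace and undef: :replace, each unconvertible byte becomes
# a replacement character, so an offset can still be computed.
lenient_offset = code_units_offset(binary_source, 5, Encoding::UTF_16LE)

puts "strict: #{strict_error.class}"
puts "lenient offset: #{lenient_offset}"
```

The offset computed this way counts one code unit per replaced byte, so for binary sources it is an approximation and not an exact character position.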


@@ -90,7 +90,7 @@ module Prism
     # concept of code units that differs from the number of characters in other
     # encodings, it is not captured here.
     def code_units_offset(byte_offset, encoding)
-      byteslice = (source.byteslice(0, byte_offset) or raise).encode(encoding)
+      byteslice = (source.byteslice(0, byte_offset) or raise).encode(encoding, invalid: :replace, undef: :replace)
       if encoding == Encoding::UTF_16LE || encoding == Encoding::UTF_16BE
         byteslice.bytesize / 2