[ruby/prism] Avoid breaking code units offset on binary encoding

25a4cf6794 Co-authored-by: Kevin Newton <kddnewton@users.noreply.github.com>
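
For context, here is a minimal standalone sketch of the failure this change guards against. It is not Prism's code: the `code_units_offset` helper below is a hypothetical stand-in that takes the source string as an explicit argument, the `lenient:` keyword exists only to contrast the old and new behavior, and the `else` branch is an assumption about the part of the method not shown in the hunk. When the source is tagged as binary (ASCII-8BIT) and contains bytes above 0x7F, a plain `String#encode` raises; passing `invalid: :replace, undef: :replace` substitutes a replacement character instead, so an offset can still be computed.

```ruby
# Hypothetical stand-in for the method touched by this commit. It takes the
# source explicitly so it can run without Prism loaded.
def code_units_offset(source, byte_offset, encoding, lenient: true)
  slice = source.byteslice(0, byte_offset) || raise(ArgumentError, "bad offset")

  # lenient: true mirrors the new call; lenient: false mirrors the old one.
  options = lenient ? { invalid: :replace, undef: :replace } : {}
  encoded = slice.encode(encoding, **options)

  if encoding == Encoding::UTF_16LE || encoding == Encoding::UTF_16BE
    encoded.bytesize / 2 # each UTF-16 code unit is two bytes
  else
    encoded.length # assumption: other encodings count characters
  end
end

# A source string marked as binary that contains non-ASCII bytes.
binary_source = "caf\xC3\xA9 = 1".b

begin
  code_units_offset(binary_source, 5, Encoding::UTF_16LE, lenient: false)
rescue EncodingError => error
  puts "strict encode failed: #{error.class}"
end

# With replacement enabled the call returns an offset instead of raising.
puts code_units_offset(binary_source, 5, Encoding::UTF_16LE)
```

The offset returned for binary input is computed over replacement characters, so it is best read as a best-effort value rather than an exact mapping back to the original text.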
2025-09-15 08:33:58 +02:00 · 2024-10-08 10:47:08 -04:00 · 2024-10-08 10:47:08 -04:00 · e50754fcfa
commit e50754fcfa
parent 615a087216
2 changed files with 20 additions and 1 deletions
--- a/lib/prism/parse_result.rb
+++ b/lib/prism/parse_result.rb
@@ -90,7 +90,7 @@ module Prism
    # concept of code units that differs from the number of characters in other
    # encodings, it is not captured here.
    def code_units_offset(byte_offset, encoding)
-      byteslice = (source.byteslice(0, byte_offset) or raise).encode(encoding)
+      byteslice = (source.byteslice(0, byte_offset) or raise).encode(encoding, invalid: :replace, undef: :replace)

      if encoding == Encoding::UTF_16LE || encoding == Encoding::UTF_16BE
        byteslice.bytesize / 2