ruby/doc/case_mapping.rdoc
2025-07-04 11:42:29 -04:00

106 lines
3 KiB
Text
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

= Case Mapping
Some string-oriented methods use case mapping.
In String:
- String#capitalize
- String#capitalize!
- String#casecmp
- String#casecmp?
- String#downcase
- String#downcase!
- String#swapcase
- String#swapcase!
- String#upcase
- String#upcase!
In Symbol:
- Symbol#capitalize
- Symbol#casecmp
- Symbol#casecmp?
- Symbol#downcase
- Symbol#swapcase
- Symbol#upcase
== Default Case Mapping
By default, all of these methods use full Unicode case mapping,
which is suitable for most languages.
See {Section 3.13 (Default Case Algorithms) of the Unicode standard}[https://www.unicode.org/versions/latest/ch03.pdf].
Non-ASCII case mapping and folding are supported for UTF-8,
UTF-16BE/LE, UTF-32BE/LE, and ISO-8859-1~16 Strings/Symbols.
Context-dependent case mapping as described in
{Table 3-17 (Context Specification for Casing) of the Unicode standard}[https://www.unicode.org/versions/latest/ch03.pdf]
is currently not supported.
In most cases, the case conversion of a string has the same number of characters as before.
There are exceptions (see also +:fold+ below):
s = "\u00DF" # => "ß"
s.upcase # => "SS"
s = "\u0149" # => "ʼn"
s.upcase # => "ʼN"
Case mapping may also depend on locale (see also +:turkic+ below):
s = "\u0049" # => "I"
s.downcase # => "i" # Dot above.
s.downcase(:turkic) # => "ı" # No dot above.
Case changes may not be reversible:
s = 'Hello World!' # => "Hello World!"
s.downcase # => "hello world!"
s.downcase.upcase # => "HELLO WORLD!" # Different from original s.
Case changing methods may not maintain Unicode normalization.
See String#unicode_normalize.
== Case Mappings
Except for +casecmp+ and +casecmp?+,
each of the case-mapping methods listed above
accepts an optional argument, <tt>mapping</tt>.
The argument is one of:
- +:ascii+: ASCII-only mapping.
Uppercase letters ('A'..'Z') are mapped to lowercase letters ('a'..'z);
other characters are not changed
s = "Foo \u00D8 \u00F8 Bar" # => "Foo Ø ø Bar"
s.upcase # => "FOO Ø Ø BAR"
s.downcase # => "foo ø ø bar"
s.upcase(:ascii) # => "FOO Ø ø BAR"
s.downcase(:ascii) # => "foo Ø ø bar"
- +:turkic+: Full Unicode case mapping.
For the Turkic languages
that distinguish dotted and dotless I, for example Turkish and Azeri.
s = 'Türkiye' # => "Türkiye"
s.upcase # => "TÜRKIYE"
s.upcase(:turkic) # => "TÜRKİYE" # Dot above.
s = 'TÜRKIYE' # => "TÜRKIYE"
s.downcase # => "türkiye"
s.downcase(:turkic) # => "türkıye" # No dot above.
- +:fold+ (available only for String#downcase, String#downcase!,
and Symbol#downcase).
Unicode case folding,
which is more far-reaching than Unicode case mapping.
s = "\u00DF" # => "ß"
s.downcase # => "ß"
s.downcase(:fold) # => "ss"
s.upcase # => "SS"
s = "\uFB04" # => "ffl"
s.downcase # => "ffl"
s.upcase # => "FFL"
s.downcase(:fold) # => "ffl"