8341923: java.util.Locale class specification improvements

Reviewed-by: liach, naoto
Justin Lu 2024-12-04 20:57:26 +00:00
parent a72cab8c47
commit ee0f88c901

@ -45,7 +45,7 @@ import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.text.DateFormat;
import java.text.NumberFormat;
import java.text.MessageFormat;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
@ -70,131 +70,151 @@ import sun.util.locale.provider.LocaleServiceProviderPool;
import sun.util.locale.provider.TimeZoneNameUtility;
/**
* A {@code Locale} object represents a specific geographical, political,
* or cultural region. An operation that requires a {@code Locale} to perform
* its task is called <em>locale-sensitive</em> and uses the {@code Locale}
* to tailor information for the user. For example, displaying a number
* is a locale-sensitive operation&mdash; the number should be formatted
* according to the customs and conventions of the user's native country,
* region, or culture.
* A {@code Locale} represents a specific geographical, political,
* or cultural region. An API that requires a {@code Locale} to perform
* its task is <dfn>{@index "locale-sensitive"}</dfn> and uses the {@code Locale}
* to tailor information for the user. These <em>locale-sensitive</em> APIs
* are principally in the <i>java.text</i> and <i>java.util</i> packages.
* For example, displaying a number is a <em>locale-sensitive</em> operation&mdash;
* the number should be formatted according to the customs and conventions of the
* user's native country, region, or culture.
*
* <p>The {@code Locale} class implements IETF BCP 47 which is composed of
* <a href="https://tools.ietf.org/html/rfc4647">RFC 4647 "Matching of Language
* Tags"</a> and <a href="https://tools.ietf.org/html/rfc5646">RFC 5646 "Tags
* for Identifying Languages"</a> with support for the LDML (UTS#35, "Unicode
* Locale Data Markup Language") BCP 47-compatible extensions for locale data
* exchange.
* exchange. Each {@code Locale} is associated with locale data which is provided
* by the Java runtime environment or any deployed {@link
* java.util.spi.LocaleServiceProvider LocaleServiceProvider} implementations.
* The locale data provided by the Java runtime environment may vary by release.
*
* <p> A {@code Locale} object logically consists of the fields
* described below.
* <h2 id="loc_comp">Locale Composition</h2>
* <p> A {@code Locale} is composed of the bolded fields described below; note that a
* {@code Locale} need not have all such fields. For example, {@link
* Locale#ENGLISH Locale.ENGLISH} is only comprised of the <em>language</em> field.
* In contrast, a {@code Locale} such as the one returned by {@code
* Locale.forLanguageTag("en-Latn-US-POSIX-u-nu-latn")} would be comprised of all
* the fields below. This particular {@code Locale} would represent English in
* the United States using the Latin script and numerics for use in POSIX
* environments.
* <p>
* {@code Locale} implements IETF BCP 47 and any deviations should be observed
* by the comments prefixed by <em>"BCP 47 deviation:"</em>.
* <a href="https://tools.ietf.org/html/rfc5646">RFC 5646</a>
* combines subtags from various ISO (639, 3166, 15924) standards which are also
* included in the composition of {@code Locale}.
* Additionally, the full list of valid codes for each field can be found in the
* <a href="https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry">
* IANA Language Subtag Registry</a> (e.g. search for "Type: region").
*
* <dl>
* <dt><a id="def_language"><b>language</b></a></dt>
*
* <dd>ISO 639 alpha-2 or alpha-3 language code, or registered
* language subtags up to 8 alpha letters (for future enhancements).
* <dd> ISO 639 alpha-2/alpha-3 language code or a registered
* language subtag up to 8 alpha letters (for future enhancements).
* When a language has both an alpha-2 code and an alpha-3 code, the
* alpha-2 code must be used. You can find a full list of valid
* language codes in the IANA Language Subtag Registry (search for
* "Type: language"). The language field is case insensitive, but
* alpha-2 code must be used.</dd>
*
* <dd> <em>Case convention:</em> {@code language} is case insensitive, but
* {@code Locale} always canonicalizes to lower case.</dd>
*
* <dd>Well-formed language values have the form
* <code>[a-zA-Z]{2,8}</code>. Note that this is not the full
* BCP47 language production, since it excludes extlang. They are
* not needed since modern three-letter language codes replace
* them.</dd>
* <dd> <em>Syntax:</em> Well-formed {@code language} values have the form {@code [a-zA-Z]{2,8}}.</dd>
* <dd> <em> BCP 47 deviation:</em> this is not the full BCP 47 language production, since it excludes
* <a href="https://datatracker.ietf.org/doc/html/rfc5646#section-2.2.2">extlang</a>
* (as modern three-letter language codes are preferred).</dd>
*
* <dd>Example: "en" (English), "ja" (Japanese), "kok" (Konkani)</dd>
* <dd> <em>Example:</em> "en" (English), "ja" (Japanese), "kok" (Konkani)</dd>
*
* <dt><a id="def_script"><b>script</b></a></dt>
*
* <dd>ISO 15924 alpha-4 script code. You can find a full list of
* valid script codes in the IANA Language Subtag Registry (search
* for "Type: script"). The script field is case insensitive, but
* <dd> ISO 15924 alpha-4 script code.</dd>
*
* <dd> <em>Case convention:</em> {@code script} is case insensitive, but
* {@code Locale} always canonicalizes to title case (the first
* letter is upper case and the rest of the letters are lower
* case).</dd>
*
* <dd>Well-formed script values have the form
* <code>[a-zA-Z]{4}</code></dd>
* <dd> <em>Syntax:</em> Well-formed {@code script} values have the form {@code
* [a-zA-Z]{4}}</dd>
*
* <dd>Example: "Latn" (Latin), "Cyrl" (Cyrillic)</dd>
* <dd> <em>Example:</em> "Latn" (Latin), "Cyrl" (Cyrillic)</dd>
*
* <dt><a id="def_region"><b>country (region)</b></a></dt>
*
* <dd>ISO 3166 alpha-2 country code or UN M.49 numeric-3 area code.
* You can find a full list of valid country and region codes in the
* IANA Language Subtag Registry (search for "Type: region"). The
* country (region) field is case insensitive, but
* <dd> ISO 3166 alpha-2 country code or UN M.49 numeric-3 area code.</dd>
*
* <dd> <em>Case convention:</em> {@code country (region)} is case insensitive, but
* {@code Locale} always canonicalizes to upper case.</dd>
*
* <dd>Well-formed country/region values have
* the form <code>[a-zA-Z]{2} | [0-9]{3}</code></dd>
* <dd> <em>Syntax:</em> Well-formed {@code country (region)} values have the form {@code
* [a-zA-Z]{2} | [0-9]{3}}</dd>
*
* <dd>Example: "US" (United States), "FR" (France), "029"
* <dd> <em>Example:</em> "US" (United States), "FR" (France), "029"
* (Caribbean)</dd>
*
* <dt><a id="def_variant"><b>variant</b></a></dt>
*
* <dd> Any arbitrary value used to indicate a variation of a
* {@code Locale}. Where there are two or more variant values
* each indicating its own semantics, these values should be ordered
* by importance, with most important first, separated by
* underscore('_'). The variant field is case sensitive.</dd>
*
* <dd>Note: IETF BCP 47 places syntactic restrictions on variant
* subtags. Also BCP 47 subtags are strictly used to indicate
* {@code Locale}. When multiple variants exist, they should be separated by
* {@code ('_'|'-')}. Variants of higher importance should precede the others.</dd>
* <dd> <em>BCP 47 deviation:</em> BCP 47 subtags are strictly used to indicate
* additional variations that define a language or its dialects that
* are not covered by any combinations of language, script and
* region subtags. You can find a full list of valid variant codes
* in the IANA Language Subtag Registry (search for "Type: variant").
*
* <p>However, the variant field in {@code Locale} has
* region subtags. However, the variant field in {@code Locale} has
* historically been used for any kind of variation, not just
* language variations. For example, some supported variants
* available in Java SE Runtime Environments indicate alternative
* cultural behaviors such as calendar type or number script. In
* BCP 47 this kind of information, which does not identify the
* BCP 47, this kind of information which does not identify the
* language, is supported by extension subtags or private use
* subtags.</dd>
*
* <dd>Well-formed variant values have the form <code>SUBTAG
* (('_'|'-') SUBTAG)*</code> where <code>SUBTAG =
* [0-9][0-9a-zA-Z]{3} | [0-9a-zA-Z]{5,8}</code>. (Note: BCP 47 only
* uses hyphen ('-') as a delimiter, this is more lenient).</dd>
* <dd> <em>Case convention:</em> {@code variant} is case sensitive. BCP 47
* deviation: BCP 47 treats the variant field as case insensitive.</dd>
*
* <dd>Example: "polyton" (Polytonic Greek), "POSIX"</dd>
* <dd> <em>Syntax:</em> Well-formed {@code variant} values have the form {@code
* SUBTAG (('_'|'-') SUBTAG)*} where {@code SUBTAG =
* [0-9][0-9a-zA-Z]{3} | [0-9a-zA-Z]{5,8}}.</dd>
* <dd> <em>BCP 47 deviation:</em> BCP 47 only
* uses hyphen ('-') as a delimiter, {@code Locale} is more lenient.</dd>
*
* <dd> <em>Example:</em> "polyton" (Polytonic Greek), "POSIX"</dd>
*
* <dt><a id="def_extensions"><b>extensions</b></a></dt>
*
* <dd> A map from single character keys to string values, indicating
* extensions apart from language identification. The extensions in
* {@code Locale} implement the semantics and syntax of BCP 47
* extension subtags and private use subtags. The extensions are
* case insensitive, but {@code Locale} canonicalizes all
* extension keys and values to lower case. Note that extensions
* cannot have empty values.</dd>
* extensions apart from language identification.</dd>
* <dd> <em> BCP 47 deviation:</em> The {@code
* extensions} in {@code Locale} implement the semantics and syntax of BCP 47
* extension subtags <em>and</em> private use subtags. The {@code extensions}
* field cannot have empty values. </dd>
*
* <dd>Well-formed keys are single characters from the set
* <dd> <em>Case convention:</em> {@code extensions} are
* case insensitive, but {@code Locale} canonicalizes all
* extension keys and values to lower case.</dd>
*
* <dd> <em>Syntax:</em> Well-formed keys are single characters from the set
* {@code [0-9a-zA-Z]}. Well-formed values have the form
* {@code SUBTAG ('-' SUBTAG)*} where for the key 'x'
* <code>SUBTAG = [0-9a-zA-Z]{1,8}</code> and for other keys
* <code>SUBTAG = [0-9a-zA-Z]{2,8}</code> (that is, 'x' allows
* {@code SUBTAG = [0-9a-zA-Z]{1,8}} and for other keys
* {@code SUBTAG = [0-9a-zA-Z]{2,8}} (that is, 'x' allows
* single-character subtags).</dd>
*
* <dd>Example: key="u"/value="ca-japanese" (Japanese Calendar),
* <dd> <em>Example:</em> key="u"/value="ca-japanese" (Japanese Calendar),
* key="x"/value="java-1-7"</dd>
* </dl>
*
* <b>Note:</b> Although BCP 47 requires field values to be registered
* <b>BCP 47 deviation:</b> Although BCP 47 requires field values to be registered
* in the IANA Language Subtag Registry, the {@code Locale} class
* does not provide any validation features. The {@code Builder}
* does not validate this requirement. For example, the variant code <em>"foobar"</em>
* is well-formed since it is composed of 5 to 8 alphanumerics, but is not defined in
* the IANA Language Subtag Registry. The {@link Builder}
* only checks if an individual field satisfies the syntactic
* requirement (is well-formed), but does not validate the value
* itself. See {@link Builder} for details.
* itself. Conversely, {@link #of(String, String, String) Locale::of} and its
* overloads do not make any syntactic checks on the input.
*
* <h2><a id="def_locale_extension">Unicode locale/language extension</a></h2>
* <h3><a id="def_locale_extension">Unicode BCP 47 U Extension</a></h3>
*
* <p>UTS#35, "Unicode Locale Data Markup Language" defines optional
* attributes and keywords to override or refine the default behavior
@ -213,7 +233,7 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* String representing this information, for example, "nu-thai". The
* {@code Locale} class also provides {@link
* #getUnicodeLocaleAttributes}, {@link #getUnicodeLocaleKeys}, and
* {@link #getUnicodeLocaleType} which allow you to access Unicode
* {@link #getUnicodeLocaleType(String)} which provides access to the Unicode
* locale attributes and key/type pairs directly. When represented as
* a string, the Unicode Locale Extension lists attributes
* alphabetically, followed by key/type sequences with keys listed
@ -221,11 +241,11 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* fixed when the type is defined)
*
* <p>A well-formed locale key has the form
* <code>[0-9a-zA-Z]{2}</code>. A well-formed locale type has the
* form <code>"" | [0-9a-zA-Z]{3,8} ('-' [0-9a-zA-Z]{3,8})*</code> (it
* {@code [0-9a-zA-Z]{2}}. A well-formed locale type has the
* form {@code "" | [0-9a-zA-Z]{3,8} ('-' [0-9a-zA-Z]{3,8})*} (it
* can be empty, or a series of subtags 3-8 alphanums in length). A
* well-formed locale attribute has the form
* <code>[0-9a-zA-Z]{3,8}</code> (it is a single subtag with the same
* {@code [0-9a-zA-Z]{3,8}} (it is a single subtag with the same
* form as a locale type subtag).
*
* <p>The Unicode locale extension specifies optional behavior in
@ -234,36 +254,11 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* implementations in a Java Runtime Environment might not support any
* particular Unicode locale attributes or key/type pairs.
*
* <h3><a id="ObtainingLocale">Obtaining a Locale</a></h3>
* <h2><a id="default_locale">Default Locale</a></h2>
*
* <p>There are several ways to obtain a {@code Locale}
* object.
*
* <h4>Builder</h4>
*
* <p>Using {@link Builder} you can construct a {@code Locale} object
* that conforms to BCP 47 syntax.
*
* <h4>Factory Methods</h4>
*
* <p>The method {@link #forLanguageTag} obtains a {@code Locale}
* object for a well-formed BCP 47 language tag. The method
* {@link #of(String, String, String)} and its overloads obtain a
* {@code Locale} object from given {@code language}, {@code country},
* and/or {@code variant} defined above.
*
* <h4>Locale Constants</h4>
*
* <p>The {@code Locale} class provides a number of convenient constants
* that you can use to obtain {@code Locale} objects for commonly used
* locales. For example, {@code Locale.US} is the {@code Locale} object
* for the United States.
*
* <h3><a id="default_locale">Default Locale</a></h3>
*
* <p>The default Locale is provided for any locale-sensitive methods if no
* <p>The default Locale is provided for any <em>locale-sensitive</em> methods if no
* {@code Locale} is explicitly specified as an argument, such as
* {@link DateFormat#getInstance()}. The default Locale is determined at startup
* {@link NumberFormat#getInstance()}. The default Locale is determined at startup
* of the Java runtime and established in the following three phases:
* <ol>
* <li>The locale-related system properties listed below are established from the
@ -316,6 +311,7 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* system properties are not altered. It is not recommended that applications read
* these system properties and parse or interpret them as their values may be out of date.
*
* <h3>Locale Category</h3>
* <p>There are finer-grained default Locales specific for each {@link Locale.Category}.
* These category specific default Locales can be queried by {@link #getDefault(Category)},
* and set by {@link #setDefault(Category, Locale)}. Construction of these category
@ -327,19 +323,87 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* category. In the absence of category specific system properties, the "category-less"
* system properties are used, such as {@code user.language} in the previous example.
*
* <h3><a id="LocaleMatching">Locale Matching</a></h3>
* <h2><a id="ObtainingLocale">Obtaining a Locale</a></h2>
*
* <p>If an application or a system is internationalized and provides localized
* <p>There are several ways to obtain a {@code Locale} object.
* It is advised against using the deprecated {@code Locale} constructors.
*
* <dl>
* <dt><b>Locale Constants</b></dt>
* <dd>A number of convenient constants are provided that return {@code Locale}
* objects for commonly used locales. For example, {@link #US Locale.US} is the
* {@code Locale} object for the United States.</dd>
* <dt><b>Factory Methods</b></dt>
* <dd>{@link #of(String, String, String) Locale::of} and its overloads obtain a
* {@code Locale} object from the given {@code language}, {@code country},
* and/or {@code variant}. {@link #forLanguageTag(String)} obtains a {@code Locale}
* object for a well-formed BCP 47 language tag.</dd>
* <dt><b>Builder</b></dt>
* <dd>{@link Builder} is used to construct a {@code Locale} object that conforms
* to BCP 47 syntax. Use a builder to enforce syntactic restrictions on the input.</dd>
* </dl>
* <p>The following invocations produce Locale objects that are all equivalent:
* {@snippet lang=java :
* Locale.US;
* Locale.of("en", "US");
* Locale.forLanguageTag("en-US");
* new Locale.Builder().setLanguage("en").setRegion("US").build();
* }
*
* <h2>Usage Examples</h2>
*
* <p>Once a {@code Locale} is {@linkplain ##ObtainingLocale obtained},
* it can be queried for information about itself. For example, use {@link
* #getCountry} to get the country (or region) code and {@link #getLanguage} to
* get the language. {@link #getDisplayCountry} can be used to get the
* name of the country suitable for displaying to the user. Similarly,
* use {@link #getDisplayLanguage()} to get the name of
* the language suitable for displaying to the user. The {@code getDisplayXXX}
* methods are themselves <em>locale-sensitive</em> and have two variants; one with an explicit
* locale parameter, and one without. The latter uses the default {@link
* Locale.Category#DISPLAY DISPLAY} locale, so the following are equivalent :
* {@snippet lang=java :
* Locale.getDefault().getDisplayCountry();
* Locale.getDefault().getDisplayCountry(Locale.getDefault(Locale.Category.DISPLAY));
* }
*
* <p>The Java Platform provides a number of classes that perform locale-sensitive
* operations. For example, the {@code NumberFormat} class formats
* numbers, currency, and percentages in a <em>locale-sensitive</em> manner. Classes such
* as {@code NumberFormat} have several factory methods for creating a default object
* of that type. These methods generally have two variants; one with an explicit
* locale parameter, and one without. The latter uses the default {@link
* Locale.Category#FORMAT FORMAT} locale, so the following are equivalent :
* {@snippet lang=java :
* NumberFormat.getCurrencyInstance();
* NumberFormat.getCurrencyInstance(Locale.getDefault(Locale.Category.FORMAT));
* }
*
* <p>
* The following example demonstrates <em>locale-sensitive</em> currency and
* date related operations under different locales :
* {@snippet lang = java:
* var number = 1000;
* NumberFormat.getCurrencyInstance(Locale.US).format(number); // returns "$1,000.00"
* NumberFormat.getCurrencyInstance(Locale.JAPAN).format(number); // returns "\u00A51,000"
* var date = LocalDate.of(2024, 1, 1);
* DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).localizedBy(Locale.US).format(date); // returns "January 1, 2024"
* DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).localizedBy(Locale.JAPAN).format(date); // returns "2024\u5e741\u67081\u65e5"
* }
*
* <h2><a id="LocaleMatching">Locale Matching</a></h2>
*
* <p>If an application is internationalized and provides localized
* resources for multiple locales, it sometimes needs to find one or more
* locales (or language tags) which meet each user's specific preferences. Note
* that a term "language tag" is used interchangeably with "locale" in this
* locale matching documentation.
* that the term "<dfn>{@index "language tag"}</dfn>" is used interchangeably
* with "locale" in the following locale matching documentation.
*
* <p>In order to do matching a user's preferred locales to a set of language
* <p>In order to match a user's preferred locales to a set of language
* tags, <a href="https://tools.ietf.org/html/rfc4647">RFC 4647 Matching of
* Language Tags</a> defines two mechanisms: filtering and lookup.
* <em>Filtering</em> is used to get all matching locales, whereas
* <em>lookup</em> is to choose the best matching locale.
* <em>lookup</em> is to select the best matching locale.
* Matching is done case-insensitively. These matching mechanisms are described
* in the following sections.
*
@ -348,7 +412,8 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* language ranges: basic and extended. See
* {@link Locale.LanguageRange Locale.LanguageRange} for details.
*
* <h4>Filtering</h4>
*
* <h3>Filtering</h3>
*
* <p>The filtering operation returns all matching language tags. It is defined
* in RFC 4647 as follows:
@ -366,7 +431,7 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* {@link Locale.FilteringMode} is a parameter to specify how filtering should
* be done.
*
* <h4>Lookup</h4>
* <h3>Lookup</h3>
*
* <p>The lookup operation returns the best matching language tags. It is
* defined in RFC 4647 as follows:
@ -398,76 +463,50 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* an {@link Iterator} over a {@link Collection} of language tags is treated as
* the best matching one.
*
* <h3>Use of Locale</h3>
* <h3>Serialization</h3>
*
* <p>Once you've obtained a {@code Locale} you can query it for information
* about itself. Use {@code getCountry} to get the country (or region)
* code and {@code getLanguage} to get the language code.
* You can use {@code getDisplayCountry} to get the
* name of the country suitable for displaying to the user. Similarly,
* you can use {@code getDisplayLanguage} to get the name of
* the language suitable for displaying to the user. Interestingly,
* the {@code getDisplayXXX} methods are themselves locale-sensitive
* and have two versions: one that uses the default
* {@link Locale.Category#DISPLAY DISPLAY} locale and one
* that uses the locale specified as an argument.
* <p>During serialization, writeObject writes all fields to the output
* stream, including extensions.
*
* <p>The Java Platform provides a number of classes that perform locale-sensitive
* operations. For example, the {@code NumberFormat} class formats
* numbers, currency, and percentages in a locale-sensitive manner. Classes
* such as {@code NumberFormat} have several convenience methods
* for creating a default object of that type. For example, the
* {@code NumberFormat} class provides these three convenience methods
* for creating a default {@code NumberFormat} object:
* {@snippet lang=java :
* NumberFormat.getInstance();
* NumberFormat.getCurrencyInstance();
* NumberFormat.getPercentInstance();
* }
* Each of these methods has two variants; one with an explicit locale
* and one without; the latter uses the default
* {@link Locale.Category#FORMAT FORMAT} locale:
* {@snippet lang=java :
* NumberFormat.getInstance(myLocale);
* NumberFormat.getCurrencyInstance(myLocale);
* NumberFormat.getPercentInstance(myLocale);
* }
* A {@code Locale} is the mechanism for identifying the kind of object
* ({@code NumberFormat}) that you would like to get. The locale is
* <STRONG>just</STRONG> a mechanism for identifying objects,
* <STRONG>not</STRONG> a container for the objects themselves.
* <p>During deserialization, readResolve adds extensions as described
* in {@linkplain ##special_cases_constructor Special Cases}, only
* for the two cases th_TH_TH and ja_JP_JP.
*
* <h3>Compatibility</h3>
*
* <p>In order to maintain compatibility, Locale's
* constructors retain their behavior prior to the Java Runtime
* Environment version 1.7. The same is largely true for the
* {@code toString} method. Thus Locale objects can continue to
* be used as they were. In particular, clients who parse the output
* of toString into language, country, and variant fields can continue
* to do so (although this is strongly discouraged), although the
* variant field will have additional information in it if script or
* extensions are present.
* @implNote
* <h2>Compatibility</h2>
* <p> The following commentary is provided for apps that want to ensure
* interoperability with older releases of {@code Locale} provided by the
* reference implementation.
* <h3><a id="locale_behavior">Locale Behavior</a></h3>
* In order to maintain compatibility, Locale's (deprecated) constructors,
* {@link #of(String, String, String)}, and its overloads retain their behavior prior to the Java Runtime
* Environment version 1.7. That is, a length constraint is not imposed on any of
* the input parameters. Similarly, the same preservation of past behavior is largely true
* for the {@link #toString()} method.
* Apps that previously parsed the output of {@link #toString()} into language,
* country, and variant fields can continue to do so (although this is strongly
* discouraged). A caveat is that the variant field will have additional
* information in it if script or extensions are present.
*
* <p>In addition, BCP 47 imposes syntax restrictions that are not
* imposed by Locale's constructors. This means that conversions
* between some Locales and BCP 47 language tags cannot be made without
* losing information. Thus {@code toLanguageTag} cannot
* losing information. Thus {@link #toLanguageTag} cannot
* represent the state of locales whose language, country, or variant
* do not conform to BCP 47.
*
* <p>Because of these issues, it is recommended that clients migrate
* <p>Because of these issues, it is recommended that apps migrate
* away from constructing non-conforming locales and use the
* {@code forLanguageTag} and {@code Locale.Builder} APIs instead.
* Clients desiring a string representation of the complete locale can
* then always rely on {@code toLanguageTag} for this purpose.
* {@link #forLanguageTag(String)} and {@link Locale.Builder} APIs instead.
* Apps desiring a string representation of the complete locale can
* then always rely on {@link #toLanguageTag} for this purpose.
*
* <h4><a id="special_cases_constructor">Special cases</a></h4>
* <h3><a id="special_cases_constructor">Special cases</a></h3>
*
* <p>For compatibility reasons, two
* non-conforming locales are treated as special cases. These are
* <b>{@code ja_JP_JP}</b> and <b>{@code th_TH_TH}</b>. These are ill-formed
* in BCP 47 since the variants are too short. To ease migration to BCP 47,
* in BCP 47 since the {@linkplain ##def_variant variants} are too short. To ease migration to BCP 47,
* these are treated specially during construction. These two cases (and only
* these) cause a constructor to generate an extension, all other values behave
* exactly as they did prior to Java 7.
@ -487,25 +526,16 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* constructor is called with the arguments "th", "TH", "TH", the
* extension "u-nu-thai" is automatically added.
*
* <h4>Serialization</h4>
* <h3><a id="legacy_language_codes">Legacy language codes</a></h3>
*
* <p>During serialization, writeObject writes all fields to the output
* stream, including extensions.
*
* <p>During deserialization, readResolve adds extensions as described
* in {@linkplain ##special_cases_constructor Special Cases}, only
* for the two cases th_TH_TH and ja_JP_JP.
*
* <h4><a id="legacy_language_codes">Legacy language codes</a></h4>
*
* <p>Locale's constructor has always converted three language codes to
* <p>Locale's constructors have always converted three language codes to
* their earlier, obsoleted forms: {@code he} maps to {@code iw},
* {@code yi} maps to {@code ji}, and {@code id} maps to
* {@code in}. Since Java SE 17, this is no longer the case. Each
* language maps to its new form; {@code iw} maps to {@code he}, {@code ji}
* maps to {@code yi}, and {@code in} maps to {@code id}.
*
* <p>For the backward compatible behavior, the system property
* <p>For backwards compatible behavior, the system property
* {@systemProperty java.locale.useOldISOCodes} reverts the behavior
* back to that of before Java SE 17. If the system property is set to
* {@code true}, those three current language codes are mapped to their
@ -524,22 +554,12 @@ import sun.util.locale.provider.TimeZoneNameUtility;
* lookup mechanism also implements this mapping, so that resources
* can be named using either convention, see {@link ResourceBundle.Control}.
*
* <h4>Three-letter language/country(region) codes</h4>
*
* <p>The Locale constructors have always specified that the language
* and the country param be two characters in length, although in
* practice they have accepted any length. The specification has now
* been relaxed to allow language codes of two to eight characters and
* country (region) codes of two to three characters, and in
* particular, three-letter language codes and three-digit region
* codes as specified in the IANA Language Subtag Registry. For
* compatibility, the implementation still does not impose a length
* constraint.
*
* @spec https://www.rfc-editor.org/info/rfc4647
* RFC 4647: Matching of Language Tags
* @spec https://www.rfc-editor.org/info/rfc5646
* RFC 5646: Tags for Identifying Languages
* @spec https://unicode.org/reports/tr35/
* Unicode Locale Data Markup Language
* @see Builder
* @see ResourceBundle
* @see java.text.Format
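Reviewer sketch (not part of the commit): the revised class description states that fields are canonicalized by case, that `Locale.Builder` enforces syntactic well-formedness without validating against the IANA registry, and that the listed ways of obtaining a `Locale` yield equal objects. The following is a minimal illustration of those statements. The class name `LocaleCompositionDemo` and its helper methods are hypothetical names chosen for this example; only the `java.util.Locale` calls come from the JDK.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class LocaleCompositionDemo {

    /** Parses a BCP 47 tag and joins the resulting fields with '|'. */
    static String describe(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return String.join("|",
                locale.getLanguage(),                      // canonicalized to lower case
                locale.getScript(),                        // canonicalized to title case
                locale.getCountry(),                       // canonicalized to upper case
                locale.getVariant(),                       // case sensitive, left unchanged
                String.valueOf(locale.getExtension('u'))); // "null" if no 'u' extension
    }

    /** Reports whether Locale.Builder accepts the variant as well-formed. */
    static boolean builderAccepts(String variant) {
        try {
            new Locale.Builder().setLanguage("en").setVariant(variant).build();
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    /** Checks that a constant, a language tag, and a builder give equal locales. */
    static boolean equivalentWaysAgree() {
        Locale fromTag = Locale.forLanguageTag("en-US");
        Locale fromBuilder = new Locale.Builder().setLanguage("en").setRegion("US").build();
        return Locale.US.equals(fromTag) && Locale.US.equals(fromBuilder);
    }

    public static void main(String[] args) {
        System.out.println(describe("EN-latn-us-POSIX-u-nu-latn")); // en|Latn|US|POSIX|nu-latn
        System.out.println(builderAccepts("foobar")); // true: well-formed, though unregistered
        System.out.println(builderAccepts("ab"));     // false: too short for a variant subtag
        System.out.println(equivalentWaysAgree());    // true
    }
}
```

`Locale.of` is left out of the sketch only so that it also compiles on releases older than Java 19; the snippet in the Javadoc above covers that factory.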
@ -1229,28 +1249,28 @@ public final class Locale implements Cloneable, Serializable {
}
/**
* {@return an array of installed locales}
* {@return an array of available locales}
*
* The returned array represents the union of locales supported
* by the Java runtime environment and by installed
* by the Java runtime environment and by deployed
* {@link java.util.spi.LocaleServiceProvider LocaleServiceProvider}
* implementations. At a minimum, the returned array must contain a
* {@code Locale} instance equal to {@link Locale#ROOT Locale.ROOT} and
* a {@code Locale} instance equal to {@link Locale#US Locale.US}.
* {@code Locale} instance equal to {@link #ROOT Locale.ROOT} and
* a {@code Locale} instance equal to {@link #US Locale.US}.
*/
public static Locale[] getAvailableLocales() {
return LocaleServiceProviderPool.getAllAvailableLocales();
}
/**
* {@return a stream of installed locales}
* {@return a stream of available locales}
*
* The returned stream represents the union of locales supported
* by the Java runtime environment and by installed
* by the Java runtime environment and by deployed
* {@link java.util.spi.LocaleServiceProvider LocaleServiceProvider}
* implementations. At a minimum, the returned stream must contain a
* {@code Locale} instance equal to {@link Locale#ROOT Locale.ROOT} and
* a {@code Locale} instance equal to {@link Locale#US Locale.US}.
* {@code Locale} instance equal to {@link #ROOT Locale.ROOT} and
* a {@code Locale} instance equal to {@link #US Locale.US}.
*
* @implNote Unlike {@code getAvailableLocales()}, this method does
* not create a defensive copy of the Locale array.
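Reviewer sketch (not part of the commit): the specification above guarantees that the available locales include at least `Locale.ROOT` and `Locale.US`. A small check of that guarantee could look like the following, where `AvailableLocalesDemo` and `isAvailable` are hypothetical names used only for illustration. The array-returning method is used so that the sketch does not depend on the newer stream-returning variant.

```java
import java.util.Arrays;
import java.util.Locale;

final class AvailableLocalesDemo {

    /** Returns true if the runtime reports the candidate as an available locale. */
    static boolean isAvailable(Locale candidate) {
        // getAvailableLocales() returns a fresh array, so wrapping it in a list is safe.
        return Arrays.asList(Locale.getAvailableLocales()).contains(candidate);
    }

    public static void main(String[] args) {
        System.out.println("ROOT available: " + isAvailable(Locale.ROOT));
        System.out.println("US available:   " + isAvailable(Locale.US));
        // Any other locale depends on the runtime's locale data and deployed providers.
        System.out.println("Total locales:  " + Locale.getAvailableLocales().length);
    }
}
```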
@ -1532,8 +1552,8 @@ public final class Locale implements Cloneable, Serializable {
* Java 6 and prior.
*
* <p>If both the language and country fields are missing, this function will return
* the empty string, even if the variant, script, or extensions field is present (you
* can't have a locale with just a variant, the variant must accompany a well-formed
* the empty string, even if the variant, script, or extensions field is present
* (a locale with just a variant is not allowed, the variant must accompany a well-formed
* language or country code).
*
* <p>If script or extensions are present and variant is missing, no underscore is
@ -1613,7 +1633,7 @@ public final class Locale implements Cloneable, Serializable {
* (delimited by '-' or '_') is emitted as a subtag. Otherwise:
* <ul>
*
* <li>if all sub-segments match <code>[0-9a-zA-Z]{1,8}</code>
* <li>if all sub-segments match {@code [0-9a-zA-Z]{1,8}}
* (for example "WIN" or "Oracle_JDK_Standard_Edition"), the first
* ill-formed sub-segment and all following will be appended to
* the private use subtag. The first appended subtag will be
@ -1622,7 +1642,7 @@ public final class Locale implements Cloneable, Serializable {
* "Oracle-x-lvariant-JDK-Standard-Edition".
*
* <li>if any sub-segment does not match
* <code>[0-9a-zA-Z]{1,8}</code>, the variant will be truncated
* {@code [0-9a-zA-Z]{1,8}}, the variant will be truncated
* and the problematic sub-segment and all following sub-segments
* will be omitted. If the remainder is non-empty, it will be
* emitted as a private use subtag as above (even if the remainder
@ -1774,7 +1794,7 @@ public final class Locale implements Cloneable, Serializable {
*
* <p>If the specified language tag contains any ill-formed subtags,
* the first such subtag and all following subtags are ignored. Compare
* to {@link Locale.Builder#setLanguageTag} which throws an exception
* to {@link Locale.Builder#setLanguageTag(String)} which throws an exception
* in this case.
*
* <p>The following <b>conversions</b> are performed:<ul>
@ -1994,7 +2014,6 @@ public final class Locale implements Cloneable, Serializable {
* getDisplayLanguage() will return "anglais".
* If the name returned cannot be localized for the default
* {@link Locale.Category#DISPLAY DISPLAY} locale,
* (say, we don't have a Japanese name for Croatian),
* this function falls back on the English name, and uses the ISO code as a last-resort
* value. If the locale doesn't specify a language, this function returns the empty string.
*
@ -2012,7 +2031,6 @@ public final class Locale implements Cloneable, Serializable {
* is en_US, getDisplayLanguage() will return "French"; if the locale is en_US and
* inLocale is fr_FR, getDisplayLanguage() will return "anglais".
* If the name returned cannot be localized according to inLocale,
* (say, we don't have a Japanese name for Croatian),
* this function falls back on the English name, and finally
* on the ISO code as a last-resort value. If the locale doesn't specify a language,
* this function returns the empty string.
@ -2067,7 +2085,6 @@ public final class Locale implements Cloneable, Serializable {
* getDisplayCountry() will return "Etats-Unis".
* If the name returned cannot be localized for the default
* {@link Locale.Category#DISPLAY DISPLAY} locale,
* (say, we don't have a Japanese name for Croatia),
* this function falls back on the English name, and uses the ISO code as a last-resort
* value. If the locale doesn't specify a country, this function returns the empty string.
*
@ -2084,8 +2101,7 @@ public final class Locale implements Cloneable, Serializable {
* For example, if the locale is fr_FR and inLocale
* is en_US, getDisplayCountry() will return "France"; if the locale is en_US and
* inLocale is fr_FR, getDisplayCountry() will return "Etats-Unis".
* If the name returned cannot be localized according to inLocale.
* (say, we don't have a Japanese name for Croatia),
* If the name returned cannot be localized according to inLocale,
* this function falls back on the English name, and finally
* on the ISO code as a last-resort value. If the locale doesn't specify a country,
* this function returns the empty string.
@ -2697,7 +2713,7 @@ public final class Locale implements Cloneable, Serializable {
* alphanumerics. The method {@code setVariant} throws
* {@code IllformedLocaleException} for a variant that does not satisfy
* this restriction. If it is necessary to support such a variant, use
* {@link Locale#of(String, String, String)}. However, keep in mind that a {@code Locale}
* {@link #of(String, String, String)}. However, keep in mind that a {@code Locale}
* object obtained this way might lose the variant information when
* transformed to a BCP 47 language tag.
*
@ -2710,7 +2726,7 @@ public final class Locale implements Cloneable, Serializable {
* <p>Builders can be reused; {@code clear()} resets all
* fields to their default values.
*
* @see Locale#forLanguageTag
* @see Locale#forLanguageTag(String)
* @see Locale#of(String, String, String)
* @since 1.7
*/
@ -2760,7 +2776,7 @@ public final class Locale implements Cloneable, Serializable {
* language tag. Discards the existing state. Null and the
* empty string cause the builder to be reset, like {@link
* #clear}. Legacy tags (see {@link
* Locale#forLanguageTag}) are converted to their canonical
* Locale#forLanguageTag(String)}) are converted to their canonical
* form before being processed. Otherwise, the language tag
* must be well-formed (see {@link Locale}) or an exception is
* thrown (unlike {@code Locale.forLanguageTag}, which
@ -2862,7 +2878,7 @@ public final class Locale implements Cloneable, Serializable {
* the {@code Locale} class does not impose any syntactic
* restriction on variant, and the variant value in
* {@code Locale} is case sensitive. To set such a variant,
* use {@link Locale#of(String, String, String)}.
* use {@link #of(String, String, String)}.
*
* @param variant the variant
* @return This builder.
@ -2884,12 +2900,12 @@ public final class Locale implements Cloneable, Serializable {
* must be {@linkplain Locale##def_extensions well-formed} or an exception
* is thrown.
*
* <p><b>Note:</b> The key {@link Locale#UNICODE_LOCALE_EXTENSION
* <p><b>Note:</b> The key {@link #UNICODE_LOCALE_EXTENSION
* UNICODE_LOCALE_EXTENSION} ('u') is used for the Unicode locale extension.
* Setting a value for this key replaces any existing Unicode locale key/type
* pairs with those defined in the extension.
*
* <p><b>Note:</b> The key {@link Locale#PRIVATE_USE_EXTENSION
* <p><b>Note:</b> The key {@link #PRIVATE_USE_EXTENSION
* PRIVATE_USE_EXTENSION} ('x') is used for the private use code. To be
* well-formed, the value for this key needs only to have subtags of one to
* eight alphanumeric characters, not two to eight as in the general case.
@ -2918,7 +2934,7 @@ public final class Locale implements Cloneable, Serializable {
*
* <p>Keys and types are converted to lower case.
*
* <p><b>Note</b>:Setting the 'u' extension via {@link #setExtension}
* <p><b>Note</b>:Setting the 'u' extension via {@link #setExtension(char, String)}
* replaces all Unicode locale keywords with those defined in the
* extension.
*
@ -3010,9 +3026,9 @@ public final class Locale implements Cloneable, Serializable {
* Returns an instance of {@code Locale} obtained from the fields set
* on this builder.
*
* <p>This applies the conversions listed in {@link Locale#forLanguageTag}
* <p>This applies the conversions listed in {@link #forLanguageTag(String)}
* when constructing a Locale. (Legacy tags are handled in
* {@link #setLanguageTag}.)
* {@link #setLanguageTag(String)}.)
*
* @return A Locale.
*/
@ -3193,10 +3209,10 @@ public final class Locale implements Cloneable, Serializable {
*
* @spec https://www.rfc-editor.org/info/rfc4234 RFC 4234: Augmented BNF for Syntax Specifications: ABNF
* @spec https://www.rfc-editor.org/info/rfc4647 RFC 4647: Matching of Language Tags
* @see #filter
* @see #filterTags
* @see #lookup
* @see #lookupTag
* @see #filter(List, Collection, FilteringMode)
* @see #filterTags(List, Collection, FilteringMode)
* @see #lookup(List, Collection)
* @see #lookupTag(List, Collection)
*
* @since 1.8
*/
@ -3413,7 +3429,7 @@ public final class Locale implements Cloneable, Serializable {
* found in the given {@code ranges} is ill-formed
* @spec https://www.rfc-editor.org/info/rfc2616 RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1
* @see #parse(String)
* @see #mapEquivalents
* @see #mapEquivalents(List, Map)
*/
public static List<LanguageRange> parse(String ranges,
                                        Map<String, List<String>> map) {