diff --git a/jdk/src/share/classes/java/nio/charset/Charset.java b/jdk/src/share/classes/java/nio/charset/Charset.java
index 13723432712..4c166d519a2 100644
--- a/jdk/src/share/classes/java/nio/charset/Charset.java
+++ b/jdk/src/share/classes/java/nio/charset/Charset.java
@@ -212,36 +212,47 @@ import sun.security.action.GetPropertyAction;
  *
  * Terminology
  *
- * The name of this class is taken from the terms used in RFC 2278. In that
- * document a charset is defined as the combination of a coded character
- * set and a character-encoding scheme.
+ * The name of this class is taken from the terms used in
+ * RFC 2278.
+ * In that document a charset is defined as the combination of
+ * one or more coded character sets and a character-encoding scheme.
+ * (This definition is confusing; some other software systems define
+ * charset as a synonym for coded character set.)
  *
  * A coded character set is a mapping between a set of abstract
  * characters and a set of integers. US-ASCII, ISO 8859-1,
- * JIS X 0201, and full Unicode, which is the same as
- * ISO 10646-1, are examples of coded character sets.
+ * JIS X 0201, and Unicode are examples of coded character sets.
  *
- * A character-encoding scheme is a mapping between a coded
- * character set and a set of octet (eight-bit byte) sequences. UTF-8, UCS-2,
- * UTF-16, ISO 2022, and EUC are examples of character-encoding schemes.
- * Encoding schemes are often associated with a particular coded character set;
- * UTF-8, for example, is used only to encode Unicode. Some schemes, however,
- * are associated with multiple character sets; EUC, for example, can be used
- * to encode characters in a variety of Asian character sets.
+ * Some standards have defined a character set to be simply a
+ * set of abstract characters without an associated assigned numbering.
+ * An alphabet is an example of such a character set. However, the subtle
+ * distinction between character set and coded character set
+ * is rarely used in practice; the former has become a short form for the
+ * latter, including in the Java API specification.
+ *
+ * A character-encoding scheme is a mapping between one or more
+ * coded character sets and a set of octet (eight-bit byte) sequences.
+ * UTF-8, UTF-16, ISO 2022, and EUC are examples of
+ * character-encoding schemes. Encoding schemes are often associated with
+ * a particular coded character set; UTF-8, for example, is used only to
+ * encode Unicode. Some schemes, however, are associated with multiple
+ * coded character sets; EUC, for example, can be used to encode
+ * characters in a variety of Asian coded character sets.
  *
  * When a coded character set is used exclusively with a single
- * character-encoding scheme then the corresponding charset is usually named
- * for the character set; otherwise a charset is usually named for the encoding
- * scheme and, possibly, the locale of the character sets that it supports.
- * Hence US-ASCII is the name of the charset for US-ASCII while
+ * character-encoding scheme then the corresponding charset is usually
+ * named for the coded character set; otherwise a charset is usually named
+ * for the encoding scheme and, possibly, the locale of the coded
+ * character sets that it supports. Hence US-ASCII is both the
+ * name of a coded character set and of the charset that encodes it, while
  * EUC-JP is the name of the charset that encodes the
  * JIS X 0201, JIS X 0208, and JIS X 0212
- * character sets.
+ * coded character sets for the Japanese language.
  *
  * The native character encoding of the Java programming language is
- * UTF-16. A charset in the Java platform therefore defines a mapping between
- * sequences of sixteen-bit UTF-16 code units and sequences of bytes.
+ * UTF-16. A charset in the Java platform therefore defines a mapping
+ * between sequences of sixteen-bit UTF-16 code units (that is, sequences
+ * of chars) and sequences of bytes.
  *
  *
  * @author Mark Reinhold
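As an illustration of the last paragraph in this hunk (a sketch written for review, not code from the patch), the program below encodes the same Unicode text with two encoding schemes for Unicode (UTF-8, UTF-16BE) and with ISO-8859-1, a charset for a different coded character set. It relies only on `String.getBytes(Charset)` and `StandardCharsets`; the class and helper names (`CharsetMappingDemo`, `encode`, `hex`) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class CharsetMappingDemo {

    // Encode a Java String (a sequence of UTF-16 chars) into bytes. String.getBytes
    // replaces unmappable characters with the charset's default replacement bytes.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Render bytes as space-separated two-digit hex values, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringJoiner joiner = new StringJoiner(" ");
        for (byte b : bytes) {
            joiner.add(String.format("%02X", b & 0xFF));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // U+00E9 LATIN SMALL LETTER E WITH ACUTE occupies one char.
        String eAcute = "\u00E9";
        // U+1F600 lies outside the BMP, so it is stored as a surrogate pair (two chars).
        String grinning = new String(Character.toChars(0x1F600));

        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, StandardCharsets.ISO_8859_1
        };
        for (String s : new String[] { eAcute, grinning }) {
            System.out.printf("U+%04X: %d char(s)%n", s.codePointAt(0), s.length());
            for (Charset cs : charsets) {
                System.out.printf("  %-10s %s%n", cs.name(), hex(encode(s, cs)));
            }
        }
    }
}
```

For U+00E9 this prints `C3 A9` under UTF-8, `00 E9` under UTF-16BE, and `E9` under ISO-8859-1. For U+1F600 the two chars become `F0 9F 98 80` in UTF-8 and `D8 3D DE 00` in UTF-16BE. ISO-8859-1 has no code point for it, so `getBytes` substitutes the replacement bytes.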
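The naming and repertoire points in this hunk can also be checked with the public API. The sketch below is illustrative and not taken from the patch. It assumes only documented `Charset` behavior: names are case-insensitive, `name()` returns the canonical name, and `equals` compares canonical names. It also uses `CharsetEncoder.canEncode` to show that US-ASCII's coded character set lacks U+00E9. Because EUC-JP is not among the charsets every Java implementation must support, the code checks `Charset.isSupported` before using it. The class and helper names (`CharsetRepertoireDemo`, `canEncode`) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetRepertoireDemo {

    // True if the charset can encode every character of the text without replacement.
    static boolean canEncode(Charset charset, String text) {
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Charset names are case-insensitive; name() returns the canonical name.
        Charset ascii = Charset.forName("us-ascii");
        System.out.println(ascii.name());                             // US-ASCII
        System.out.println(ascii.equals(StandardCharsets.US_ASCII));  // true

        // "caf\u00E9": U+00E9 is outside US-ASCII but inside ISO-8859-1 and Unicode.
        String text = "caf\u00E9";
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.printf("%-10s canEncode = %b%n", cs.name(), canEncode(cs, text));
        }

        // EUC-JP is optional on a given Java implementation, so test for it first.
        // Its alias set is implementation-specific.
        if (Charset.isSupported("EUC-JP")) {
            Charset eucJp = Charset.forName("EUC-JP");
            System.out.println(eucJp.name() + " aliases: " + eucJp.aliases());
        }
    }
}
```

The loop reports `false` for US-ASCII and `true` for ISO-8859-1 and UTF-8. The EUC-JP line appears only on runtimes that ship that charset.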