Java Programming/Unicode

Most Java program text consists of ASCII characters, but any Unicode character can be used in comments and in character and string literals, and any Unicode letter or digit can be used as part of an identifier name. For example, π (the Greek small letter pi, U+03C0) is a valid Java identifier and can also appear in a string literal.
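As a minimal sketch of both uses, the class below declares a constant named π and also places the same character inside a string literal. The class and method names are hypothetical, and the source file is assumed to be saved and compiled as UTF-8.

```java
public class PiLiteralDemo {
    // The identifier is the single character U+03C0 (Greek small letter pi).
    static final double π = Math.PI;

    // The string literal contains the same character, followed by the value.
    static String label() {
        return "π = " + π;
    }

    public static void main(String[] args) {
        System.out.println(label());
    }
}
```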

Unicode escape sequences
Unicode characters can also be expressed through Unicode Escape Sequences. Unicode escape sequences may appear anywhere in a Java source file (including inside identifiers, comments, and string literals).

Unicode escape sequences consist of
 * 1) a backslash '\' (ASCII character 92, hex 0x5c),
 * 2) a 'u' (ASCII 117, hex 0x75)
 * 3) optionally one or more additional 'u' characters, and
 * 4) four hexadecimal digits (the characters '0' through '9' or 'a' through 'f' or 'A' through 'F').
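The rules above can be checked with a short sketch. The class and method names are hypothetical; each method compares an escape written in one permitted form against the character it stands for.

```java
public class EscapeFormDemo {
    // Basic form: backslash, one u, four hexadecimal digits (0061 is 'a').
    static boolean basicForm() {
        return '\u0061' == 'a';
    }

    // Additional u characters after the first are permitted.
    static boolean extraUs() {
        return '\uuu0061' == 'a';
    }

    // Hexadecimal digits may be lower case or upper case.
    static boolean hexCaseIgnored() {
        return '\u00e9' == '\u00E9';
    }

    public static void main(String[] args) {
        System.out.println(basicForm() && extraUs() && hexCaseIgnored());
    }
}
```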

Each such sequence represents one UTF-16 code unit. For example, 'a' is equivalent to '\u0061'. A single escape cannot express a character beyond U+FFFF; such a character must be written as two escapes that form a surrogate pair.
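To illustrate the surrogate-pair case, the following sketch writes U+1F600 (chosen here only as an example of a character beyond U+FFFF) as two escapes. The class and member names are hypothetical.

```java
public class SurrogatePairDemo {
    // U+1F600 is written as high surrogate D83D followed by low surrogate DE00.
    static final String GRINNING_FACE = "\uD83D\uDE00";

    // Number of UTF-16 code units (char values) in the string.
    static int codeUnits() {
        return GRINNING_FACE.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints() {
        return GRINNING_FACE.codePointCount(0, GRINNING_FACE.length());
    }

    // The code point reassembled from the two surrogates.
    static int codePoint() {
        return GRINNING_FACE.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(codeUnits() + " code units, " + codePoints() + " code point");
    }
}
```

The string holds two char values but only one character, which is why length() and codePointCount() disagree.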

Every character in a program may be written as a Unicode escape sequence, but such programs are neither compact nor very readable, except to the Java compiler.

A full list of the characters can be found in the Unicode Consortium's code charts.

π may also be represented in Java as the Unicode escape sequence \u03C0. Thus, a declaration and assignment such as double \u03C0 = Math.PI; is valid, but not very readable.
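A minimal sketch of such a declaration follows. The class name and the circleArea method are hypothetical; the point is only that the escaped identifier compiles and behaves like any other name.

```java
public class EscapedIdentifierDemo {
    // The identifier is written as an escape; the compiler reads it as the letter pi.
    static final double \u03C0 = Math.PI;

    // The escaped name can be used wherever the identifier is expected.
    static double circleArea(double radius) {
        return \u03C0 * radius * radius;
    }

    public static void main(String[] args) {
        System.out.println(circleArea(1.0));
    }
}
```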

Unicode escape sequences can likewise be used in other Java syntax, such as character literals, string literals, and even keywords.
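As a sketch of escapes in other positions, the hypothetical class below uses one in a character literal, one in a string literal, and one inside a keyword. The last is legal because escapes are translated before the compiler recognizes tokens, though it is not something to do in real code.

```java
public class EscapeEverywhereDemo {
    // In a character literal: 0041 is 'A'.
    static final char LETTER_A = '\u0041';

    // In a string literal: 03C0 is the Greek small letter pi.
    static final String PI_NAME = "\u03C0";

    // In a keyword: the escape for 'i' followed by "nt" spells the keyword int.
    static final \u0069nt ANSWER = 42;

    public static void main(String[] args) {
        System.out.println(LETTER_A + " " + PI_NAME + " " + ANSWER);
    }
}
```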

Note that a Unicode escape sequence functions just like any other character in the source code. E.g., \u0022 (double quote, ") needs to be escaped in a string literal just like a literal ".
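One way to see this is the sketch below, which builds a one-character string containing a double quote in two ways. The class and field names are hypothetical. In the second form the backslash and the quote are both written as Unicode escapes; after translation the compiler sees an ordinary backslash followed by a double quote.

```java
public class EscapedQuoteDemo {
    // A plain string escape: backslash followed by a double quote.
    static final String PLAIN = "\"";

    // The same two source characters, each written as a Unicode escape:
    // 005c is the backslash and 0022 is the double quote.
    static final String ESCAPED = "\u005c\u0022";

    static boolean sameString() {
        return PLAIN.equals(ESCAPED);
    }

    public static void main(String[] args) {
        System.out.println(sameString());
    }
}
```

Writing only the escape for the double quote between two quotation marks would not compile, because the compiler would see three consecutive double quotes.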

International language support
The language distinguishes between bytes and characters. Characters were originally stored internally as UCS-2; as of J2SE 5.0, the language uses UTF-16 and supports characters beyond U+FFFF through surrogate pairs. Java program source may therefore contain any Unicode character.
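The distinction between bytes and characters can be sketched as follows. The class and method names are hypothetical, and the choice of UTF-8 and UTF-16BE as target encodings is only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class BytesVersusCharsDemo {
    // One character: the Greek small letter pi (03C0).
    static final String PI = "\u03C0";

    // Number of char values in the string.
    static int charCount() {
        return PI.length();
    }

    // Number of bytes once the string is encoded as UTF-8.
    static int utf8ByteCount() {
        return PI.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes once the string is encoded as big-endian UTF-16 (no byte order mark).
    static int utf16ByteCount() {
        return PI.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(charCount() + " char, " + utf8ByteCount() + " UTF-8 bytes");
    }
}
```

The byte count depends on the encoding chosen when the string leaves the program; the character count does not.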

It is thus perfectly valid for Java code to contain Chinese characters in class and variable names as well as in string literals.
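A minimal sketch of such code is shown here. All names are hypothetical: the nested class 问候 ("greeting") has a field 文本 ("text") holding the string 你好，世界 ("hello, world"). An ASCII-named outer class is used only so that the file name stays ASCII, and the file is assumed to be saved and compiled as UTF-8.

```java
public class ChineseNamesDemo {
    // A class whose name consists of Chinese characters.
    static class 问候 {
        // A field with a Chinese name, initialized from a Chinese string literal.
        String 文本 = "你好，世界";
    }

    // Returns the text held by a new instance of the nested class.
    static String greeting() {
        return new 问候().文本;
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```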