Unicode is a computing industry standard designed to consistently and uniquely encode characters used in written languages throughout the world. The Unicode standard uses hexadecimal to express a character. For example, the value 0x0041 represents the Latin character A. The Unicode standard was initially designed using 16 bits to encode characters because the primary machines were 16-bit PCs.

When the specification for the Java language was created, the Unicode standard was accepted and the char primitive was defined as a 16-bit data type, with characters in the hexadecimal range from 0x0000 to 0xFFFF.

Because 16-bit encoding supports only 2^16 (65,536) characters, which is insufficient to define all characters in use throughout the world, the Unicode standard was extended to 0x10FFFF, which supports over one million characters. The definition of a character in the Java programming language could not be changed from 16 bits to 32 bits without causing millions of Java applications to no longer run properly. To correct the definition, a scheme was developed to handle characters that could not be encoded in 16 bits.

Characters with values outside the 16-bit range, from 0x10000 to 0x10FFFF, are called supplementary characters and are defined as a pair of char values.

A character is a minimal unit of text that has semantic value.

A character set is a collection of characters that might be used by multiple languages.
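
The difference between a character that fits in one 16-bit char and a supplementary character that needs a pair of char values can be sketched in a few lines of Java. This is only an illustration: the class name SupplementaryDemo, the helper encode, and the choice of 0x1F600 as the sample supplementary value are assumptions made for this example. The calls themselves (Character.toChars, Character.isSupplementaryCodePoint, String.codePointCount) are standard java.lang methods.

```java
final class SupplementaryDemo {

    // Hypothetical helper: returns the char values that encode one Unicode value.
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        // 0x0041 (Latin A) is inside the 16-bit range, so one char is enough.
        char[] latinA = encode(0x0041);
        System.out.println(latinA.length);                          // 1

        // 0x1F600 (chosen for illustration) lies between 0x10000 and 0x10FFFF.
        int supplementary = 0x1F600;
        char[] pair = encode(supplementary);
        System.out.println(Character.isSupplementaryCodePoint(supplementary)); // true
        System.out.println(pair.length);                            // 2
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE00

        // A String counts char values, not characters, unless asked otherwise.
        String s = new String(pair);
        System.out.println(s.length());                             // 2
        System.out.println(s.codePointCount(0, s.length()));        // 1
    }
}
```

The last two lines show the practical consequence: length() reports the number of 16-bit char values, so code that must count or iterate over actual characters would likely use the code point methods instead.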