Unicode

The objective of this book is to serve as a reference for Unicode encodings and anything related to the Unicode specification.

This book is necessary because the Unicode reference articles collected here were removed from Wikipedia and Wikisource, yet the standard is used throughout information technology and a reference for it is still needed.

Introduction
Unicode is an industry standard whose goal is to provide the means by which text of all forms and languages can be encoded for use by computers through a single character set. Originally, text characters were represented in computers using byte-wide data: each printable character (and many non-printing, or "control", characters) was stored in a single byte, which allowed for at most 256 characters in total. However, globalization has created a need for computers to accommodate many different alphabets (and other writing systems) from around the world in an interchangeable way.
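The byte-wide limitation described above can be illustrated with a short Python sketch. It uses the standard `latin-1` codec (ISO 8859-1) as one example of a single-byte character set; the helper name `fits_in_one_byte` is made up for this illustration and is not part of any library.

```python
def fits_in_one_byte(ch: str, codec: str = "latin-1") -> bool:
    """Return True if `ch` can be encoded as exactly one byte in `codec`."""
    try:
        return len(ch.encode(codec)) == 1
    except UnicodeEncodeError:
        # The character has no slot among the codec's 256 byte values.
        return False


print(fits_in_one_byte("A"))       # True: basic Latin letter
print(fits_in_one_byte("\u00e9"))  # True: e with acute accent (U+00E9)
print(fits_in_one_byte("\u03a9"))  # False: Greek capital omega (U+03A9)
print(fits_in_one_byte("\u65e5"))  # False: CJK ideograph (U+65E5)
```

Any single-byte codec shows the same pattern: whatever falls outside its 256 slots cannot be represented at all, which is the interchange problem Unicode set out to solve.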

The old encodings in use included ASCII and EBCDIC, but it was apparent that they were not capable of handling all the different characters and alphabets from around the world. The solution to this problem was to create a set of "wide" 16-bit characters that would theoretically be able to accommodate most international language characters. This new charset was first known as the Universal Character Set (UCS), and later standardized as Unicode. However, after the first versions of the Unicode standard it became clear that 65,536 (2^16) characters would still not be enough to represent every character from all scripts in existence, so the standard was amended to add sixteen supplementary planes of 65,536 code points each, bringing the total number of representable code points to 1,114,112 (17 planes of 65,536). To date, only a small fraction of that space (under 15% as of Unicode 16.0) has been assigned to characters.
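The arithmetic behind the size of the code space can be checked with a minimal Python sketch. The constant names and the helper `plane_of` are chosen here for illustration only.

```python
PLANE_SIZE = 2 ** 16       # 65,536 code points per plane
PLANE_COUNT = 1 + 16       # the original 16-bit plane plus 16 supplementary planes
CODE_SPACE = PLANE_SIZE * PLANE_COUNT


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains `code_point`."""
    if not 0 <= code_point < CODE_SPACE:
        raise ValueError(f"outside the Unicode code space: {code_point:#x}")
    return code_point >> 16  # each plane spans exactly 2**16 code points


print(CODE_SPACE)            # 1114112
print(hex(CODE_SPACE - 1))   # 0x10ffff, the highest code point
print(plane_of(ord("A")))    # 0
print(plane_of(0x1F600))     # 1
```

Plane 0 is the original 16-bit range; every code point at or above U+10000 lies in one of the supplementary planes.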

Table of Contents

 * /Character reference/
 * /Encodings/
 * /Implementations/
 * /Versions/
 * /List of useful symbols/

Links

 * Characters Ordered by Unicode

Coder avec Unicode ("Coding with Unicode", in French)

Unicode 17.0

 * Some additional ideographs (total 8 characters) will be added to Tangut. (U+187F8-U+187FF)
 * Additional ideographs (total 20 characters) will be added to Tangut Supplement. (U+18D09-U+18D1C)
 * Kana Katakana to Hiragana Small Ki Ku Sa Si Su Se and So (total 55 characters) will be added to Small Kana Extension. (U+1B130-U+1B153-U+1B154-U+1B16D)
 * Stein-Zimmermann symbols, digit slash symbols, and other symbols (total 23 characters) will be added to Musical Symbols. (U+1D127-U+1D128, U+1D1EB-U+1D1F6, U+1D1F7-U+1D1FF)
 * (total 6 characters) will be added to Symbols for Legacy Computing Supplement. (U+1CCFA-U+1CCFC, U+1CEBA-U+1CEBF)
 * Historical asteroid symbols (total 4 characters) will be added to Alchemical Symbols. (U+1F777-U+1F77A)
 * Chemical symbols (total 9 characters) will be added to Supplemental Arrows-C. (U+1F8D0-U+1F8D8)
 * White and Black Chess Ferz and Alfil (total 4 characters) will be added to Chess Symbols. (U+1FA54-U+1FA57)
 * An alarm bell symbol (total 1 character) will be added to Symbols for Legacy Computing. (U+1FBFA)
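The character totals in the list above are inclusive counts over the code point ranges given. As a rough sketch, a range written in the `U+XXXX-U+YYYY` form can be counted in Python as follows; the helper name `count_code_points` is hypothetical.

```python
def count_code_points(range_text: str) -> int:
    """Count the code points in an inclusive range such as 'U+187F8-U+187FF'."""
    first_text, last_text = range_text.split("-")
    # Drop the 'U+' prefix and read the remaining digits as hexadecimal.
    first = int(first_text.strip().upper().replace("U+", ""), 16)
    last = int(last_text.strip().upper().replace("U+", ""), 16)
    if last < first:
        raise ValueError(f"range is reversed: {range_text}")
    return last - first + 1  # both endpoints are included


print(count_code_points("U+187F8-U+187FF"))  # 8
print(count_code_points("U+18D09-U+18D1C"))  # 20
print(count_code_points("U+1F777-U+1F77A"))  # 4
print(count_code_points("U+1F8D0-U+1F8D8"))  # 9
```

A range with more than two endpoints, or one that overlaps already-assigned code points, needs to be split up and checked by hand before it is counted this way.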