Unicode is a character set frequently used on the web; it supports well over a hundred thousand characters from many languages (English, Greek, Chinese, Arabic, and most other scripts in current use). A very common encoding scheme for Unicode, called UTF-8, uses a variable number of bits to represent different characters, with more commonly used characters using fewer bits. A valid UTF-8 character occupies 1, 2, 3, or 4 bytes, and each length has its own fixed template of bits (where x represents an arbitrary bit).
The ith character in the Unicode character set is encoded by the ith legal UTF-8 representation, which is obtained by converting i into binary and filling in the x (and y) bits of the templates.
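The bit-filling step can be sketched in Python. This sketch assumes the standard UTF-8 byte templates (0xxxxxxx; 110xxxxx 10xxxxxx; 1110xxxx 10xxxxxx 10xxxxxx; 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx), which may differ in notation from the templates of this exercise, and it follows the real-world rule that i is written into the shortest template that can hold it. The function name encode_utf8 is hypothetical and chosen only for illustration.

```python
def encode_utf8(i: int) -> bytes:
    """Encode code point i by filling the x bits of the shortest fitting template.

    Assumes the standard UTF-8 templates; surrogates (U+D800..U+DFFF) and
    values above U+10FFFF are rejected, as in real UTF-8.
    """
    if i < 0 or i > 0x10FFFF or 0xD800 <= i <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {i!r}")
    if i < 0x80:
        # 0xxxxxxx: 7 free bits
        return bytes([i])
    if i < 0x800:
        # 110xxxxx 10xxxxxx: 11 free bits
        return bytes([0b11000000 | (i >> 6),
                      0b10000000 | (i & 0b111111)])
    if i < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: 16 free bits
        return bytes([0b11100000 | (i >> 12),
                      0b10000000 | ((i >> 6) & 0b111111),
                      0b10000000 | (i & 0b111111)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 free bits
    return bytes([0b11110000 | (i >> 18),
                  0b10000000 | ((i >> 12) & 0b111111),
                  0b10000000 | ((i >> 6) & 0b111111),
                  0b10000000 | (i & 0b111111)])


if __name__ == "__main__":
    # Compare against Python's built-in encoder for a few sample characters.
    for ch in ["A", "é", "中", "😀"]:
        print(ch, encode_utf8(ord(ch)).hex(" "), ch.encode("utf-8").hex(" "))
```

Checking the result against Python's built-in str.encode("utf-8") is a quick way to confirm that the template filling matches the real encoding for any given character.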
How many characters can be encoded using UTF-8?
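One way to approach the count is by the sum rule: the four template lengths are disjoint cases, and within each case every free bit can be chosen independently. The sketch below assumes the standard templates with 7, 11, 16, and 21 free bits for the 1-, 2-, 3-, and 4-byte forms; if the exercise's templates restrict some bits (for example through the y bits), the first count is only an upper bound. The second function gives the count under the rules of real UTF-8, which forbids overlong encodings, excludes the surrogate range, and stops at U+10FFFF. All names in the code are hypothetical.

```python
# Free (x) bits per template length, assuming the standard UTF-8 templates.
FREE_BITS_BY_LENGTH = {1: 7, 2: 11, 3: 16, 4: 21}


def count_template_fillings() -> int:
    """Count every way to fill the free bits, one disjoint case per length."""
    return sum(2 ** bits for bits in FREE_BITS_BY_LENGTH.values())


def count_real_utf8_scalars() -> int:
    """Count code points encodable by real UTF-8.

    Code points run from 0 to 0x10FFFF (0x110000 values), minus the
    0x800 surrogates in U+D800..U+DFFF; each has exactly one encoding.
    """
    return 0x110000 - 0x800


if __name__ == "__main__":
    print(count_template_fillings())   # 2**7 + 2**11 + 2**16 + 2**21
    print(count_real_utf8_scalars())
```

Under these assumptions the raw template count is 2^7 + 2^11 + 2^16 + 2^21 = 2,164,864, while real UTF-8 admits 1,112,064 distinct characters.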