Last updated 2004-06-28 by Roedy Green
©1996-2004 Canadian Mind Products
There are even codes for apple = '\uf000' (a private-use code point, so the glyph you see depends on the font), British pound sign £ = '\u00A3', degree ° = '\u00b0', checkmark = '\u2713', dharma wheel = '\u2638', division = '\u00f7', euro € = '\u20AC', female = '\u2640', heart = '\u2665', infinity = '\u221E', integral = '\u222B', male = '\u2642', pi = '\u03C0', PI = '\u03A0', sun = '\u2600', telephone = '\u260E' and trademark ™ = '\u2122'.
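There are also arrows, which live in the block \u2190 through \u21FF: \u2190 \u2191 \u2192 \u2193 \u2194 \u2195 \u21A2 \u21AC \u21AD \u21B0 \u21B6 \u21C5 \u21CE \u21DC, along with assorted punctuation and letterlike symbols such as \u2013 (en dash), \u2017 (double low line), \u2101 (addressed to the subject) and \u2108 (scruple).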
In addition there are all kinds of interesting blocks of special characters such as: Alphabetic Presentation Forms, APL, Arrows, Bengali, Block Elements, Box Drawing, Braille Patterns, Byzantine Musical Symbols, Combining Diacritical Marks, Combining Half Marks, Combining Marks for Symbols, Control Pictures -- icons for control chars, Currency Symbols, Dingbats, Enclosed Alphanumerics, General Punctuation, Geometric Shapes, Halfwidth and Fullwidth Forms, High Surrogates, Ideographic Description Characters, IPA Extensions, Letterlike Symbols, Low Surrogates, Mathematical Alphanumeric Symbols (beyond the 16-bit range, so they need 32-bit Unicode or a surrogate pair), Mathematical Operators, Mathematical Symbols, Miscellaneous Symbols (astrology, chess, playing cards), Miscellaneous Technical (del, grad, integral), Musical Symbols, Number Forms (e.g. Roman numerals), OCR (Optical Character Recognition -- the OCR-A and MICR characters used in magnetic ink cheque encoding), Old Italic, Runic, Small Form Variants, Spacing Modifier Letters, Specials, Superscripts and Subscripts, Tags (invisible language-tag characters), Unified Canadian Aboriginal Syllabics and Variation Selectors.
Download Word For Windows document giving the full Unicode character set with Java, HTML and Postscript encodings. It is not as pretty as the PDF, but it is easier to search.
Nic Fulton of Reuters has written a Java test applet that can display all 64 thousand (65,536) 16-bit Unicode characters, including the Chinese/Korean Han. How many of them actually display on your screen depends on the font-handling ability of your browser and operating system, and on which fonts you have installed. In Java programs, Unicode characters that you cannot easily type are represented in the form '\uffff', that is, \u followed by exactly four hex digits. Ordinary characters like 'A' are actually 16-bit Unicode too.
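
To make the escape notation concrete, here is a minimal sketch. The class name UnicodeEscapeDemo and the helper toEscape are made up for this illustration; they are not part of any library. It shows that an escaped char and a typed char are the same 16-bit value, and how to turn a char back into its four-hex-digit form.

```java
public class UnicodeEscapeDemo {

    /** Formats a char as a backslash-u escape with four lowercase hex digits (hypothetical helper). */
    static String toEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        char euro = '\u20AC';   // euro sign, written as an escape
        char a = '\u0041';      // exactly the same char as a typed 'A'

        System.out.println(a == 'A');          // true: ordinary chars are 16-bit Unicode too
        System.out.println(toEscape(euro));    // the escape form of the euro sign
        System.out.println(toEscape('A'));     // the escape form of 'A'
        System.out.println("Price: 5" + euro); // what you see depends on console encoding and font
    }
}
```

The compiler translates these escapes very early, before it splits the source into tokens, so they are recognised everywhere in a source file, including inside string literals and comments.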
How do you create and edit the various flavours of Unicode documents? You can create them in some specific encoding, then convert them. To write a little utility to do that, read up on encoding and ask the File I/O Amanuensis for sample code. You can use lowly Notepad in Windows NT/W2K/XP, but not in earlier Windows versions, to edit existing documents. It is even clever enough to deal with byte order (endian) marks. To start a new document, you would have to acquire an almost empty Unicode document to use as a starting point. Recent versions of MS Word in Windows NT/W2K/XP also work.
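
As a rough idea of what such a little conversion utility might look like, here is a sketch, assuming Java 7 or later and a file small enough to hold in memory. The class name Transcoder, the method convert and the command-line argument order are all invented for this example; this is not the File I/O Amanuensis output.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Transcoder {

    /** Decodes the bytes using the source encoding, then re-encodes them in the target encoding. */
    static byte[] convert(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // bytes -> 16-bit Unicode chars
        return text.getBytes(to);              // chars -> bytes in the new encoding
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: java Transcoder inFile inEncoding outFile outEncoding");
            return;
        }
        // Reads the whole file at once; a streaming Reader/Writer pair would suit big files better.
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        byte[] out = convert(in, Charset.forName(args[1]), Charset.forName(args[3]));
        Files.write(Paths.get(args[2]), out);
    }
}
```

You might run it as `java Transcoder old.txt ISO-8859-1 new.txt UTF-8`. Bytes that are not valid in the source encoding are silently replaced during decoding, so check the result before discarding the original.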
Unicode Endian Markers

| Byte-order mark | Description |
|---|---|
| EF BB BF | UTF-8 |
| FF FE | UTF-16 aka UCS-2, little-endian |
| FE FF | UTF-16 aka UCS-2, big-endian |
| FF FE 00 00 | UTF-32 aka UCS-4, little-endian |
| 00 00 FE FF | UTF-32 aka UCS-4, big-endian |
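
A program can sniff these markers for itself. The following is only a sketch: the class BomSniffer and its methods are hypothetical names, and it assumes the UTF-32 little-endian mark is FF FE 00 00 (the UTF-16 little-endian mark followed by two zero bytes), which is why the four-byte marks are tested before the two-byte ones.

```java
public class BomSniffer {

    /** Guesses an encoding name from a leading byte-order mark, or returns null if there is none. */
    static String detect(byte[] b) {
        // Test the 4-byte marks first: FF FE 00 00 also begins with the UTF-16LE mark FF FE.
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        return null; // no mark: the encoding must be learned some other way
    }

    /** True if the array begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false; // mask: Java bytes are signed
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}; // mark, then 'A' in UTF-16LE
        System.out.println(detect(sample));
    }
}
```

One ambiguity remains: a UTF-16 little-endian document whose first character is U+0000 looks the same as a UTF-32 little-endian mark. That is rare in ordinary text files, so this sketch ignores it.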