Java Glossary : UTF

Last updated 2004-06-28 by Roedy Green ©1996-2004 Canadian Mind Products


UTF
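UTF, or more properly UTF-8, is not intended to be human-readable. It is a compact binary-encoded form of Unicode that uses a mixture of 8, 16 and 24-bit codes. Java's writeUTF stores a String as a 16-bit big-endian count of the encoded bytes, followed by those bytes; for a pure 7-bit ASCII string that is one byte per character. The string is not null terminated. "ABC" is written as 0x0003414243. Web files don't have the length count. Non-ASCII-7 chars use multibyte encodings in which every byte has the high bit on. UTF is an external format. UTF strings are interconverted to ordinary Strings during I/O by DataInputStream.readUTF and DataOutputStream.writeUTF. Unicode 2 and later also define characters beyond 16 bits, up to 0x10FFFF, and UTF-8 has been extended to handle them with 32-bit (4-byte) codes. Note that readUTF and writeUTF use Java's modified UTF-8, which writes such characters as a pair of 3-byte surrogates and writes the null character as the two bytes 0xC0 0x80.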
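The length-prefixed layout can be checked with a few lines of Java. This is only a minimal sketch: the class name WriteUtfDemo and the helpers utfBytes, readBack and toHex are made up for illustration, while DataOutputStream.writeUTF and DataInputStream.readUTF are the standard library calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of any library.
class WriteUtfDemo {
    /** Returns the bytes DataOutputStream.writeUTF produces for s. */
    static byte[] utfBytes(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s); // 2-byte big-endian length, then the encoded bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Converts writeUTF output back into an ordinary String. */
    static String readBack(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats bytes as lowercase hex digits, two per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = utfBytes("ABC");
        System.out.println(toHex(encoded));    // prints 0003414243
        System.out.println(readBack(encoded)); // prints ABC
    }
}
```

The first two bytes, 0003, are the length count; 41 42 43 are the ASCII codes for A, B and C.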
Unicode              UTF                           bytes required to represent the character
00000000 0xxxxxxx    0xxxxxxx                      1
00000yyy yyxxxxxx    110yyyyy 10xxxxxx             2
zzzzyyyy yyxxxxxx    1110zzzz 10yyyyyy 10xxxxxx    3
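To see the three bit patterns on real characters, one can ask Java's own UTF-8 encoder for the bytes and print them in binary. This is a rough sketch; the class Utf8TableDemo and the method utf8Bits are hypothetical names chosen for the example, and the sample characters are arbitrary.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class Utf8TableDemo {
    /** Returns the UTF-8 bytes of s as space-separated 8-bit binary groups. */
    static String utf8Bits(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String bits = Integer.toBinaryString(b & 0xff);
            // left-pad with zeros to a full 8 digits
            sb.append("00000000".substring(bits.length())).append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Bits("A"));      // prints 01000001
        System.out.println(utf8Bits("\u00e9")); // prints 11000011 10101001
        System.out.println(utf8Bits("\u20ac")); // prints 11100010 10000010 10101100
    }
}
```

The leading bits 0, 110 and 1110 of the first byte match the 1, 2 and 3 byte rows of the table, and every following byte starts with 10.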
UTF-8 is encoded like this:

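// encode a Unicode code point into UTF-8 (C)
#include <stdio.h>

// c must be wider than 16 bits to reach the 4-byte case,
// so it is an unsigned long rather than a char.
void putUtf8( unsigned long c )
   {
   if ( c < 0x80 )
      {
      // 7-bit ASCII: 0xxxxxxx
      putchar ( (int) c );
      }
   else if ( c < 0x800 )
      {
      // 2 bytes: 110yyyyy 10xxxxxx
      putchar ( (int) ( 0xC0 | ( c >> 6 ) ) );
      putchar ( (int) ( 0x80 | ( c & 0x3F ) ) );
      }
   else if ( c < 0x10000 )
      {
      // 3 bytes: 1110zzzz 10yyyyyy 10xxxxxx
      putchar ( (int) ( 0xE0 | ( c >> 12 ) ) );
      putchar ( (int) ( 0x80 | ( ( c >> 6 ) & 0x3F ) ) );
      putchar ( (int) ( 0x80 | ( c & 0x3F ) ) );
      }
   else if ( c < 0x110000 )
      {
      // 4 bytes; Unicode ends at 0x10FFFF
      putchar ( (int) ( 0xF0 | ( c >> 18 ) ) );
      putchar ( (int) ( 0x80 | ( ( c >> 12 ) & 0x3F ) ) );
      putchar ( (int) ( 0x80 | ( ( c >> 6 ) & 0x3F ) ) );
      putchar ( (int) ( 0x80 | ( c & 0x3F ) ) );
      }
   }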
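In Java itself a char is only 16 bits, so a port of the routine above would most likely take an int code point instead. The following is a sketch under that assumption, not a replacement for the library encoder: the class Utf8Encoder and its encode method are hypothetical, and the result is compared against String.getBytes as a sanity check. It does not reject lone surrogates (0xD800 to 0xDFFF), which strict UTF-8 forbids.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class Utf8Encoder {
    /** Encodes one Unicode code point (0 .. 0x10FFFF) as standard UTF-8. */
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + cp);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(4);
        if (cp < 0x80) {
            out.write(cp);                         // 0xxxxxxx
        } else if (cp < 0x800) {
            out.write(0xC0 | (cp >> 6));           // 110yyyyy
            out.write(0x80 | (cp & 0x3F));         // 10xxxxxx
        } else if (cp < 0x10000) {
            out.write(0xE0 | (cp >> 12));          // 1110zzzz
            out.write(0x80 | ((cp >> 6) & 0x3F));  // 10yyyyyy
            out.write(0x80 | (cp & 0x3F));         // 10xxxxxx
        } else {
            out.write(0xF0 | (cp >> 18));          // 4-byte lead byte
            out.write(0x80 | ((cp >> 12) & 0x3F));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] samples = { 'A', 0xE9, 0x20AC, 0x1F600 };
        for (int cp : samples) {
            // the library's own encoding of the same code point
            byte[] expected = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.println(Integer.toHexString(cp) + " matches library: "
                    + Arrays.equals(encode(cp), expected));
        }
    }
}
```

Note that this produces standard UTF-8, so for code points above 0xFFFF the output differs from what writeUTF would store.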

UTF-7 is encoded like this, I kid you not:




Please send errors, omissions and suggestions
to improve this page to Roedy Green.
You can get a fresh copy of this page from: http://mindprod.com/jgloss/utf.html
or possibly from your local J: drive mirror: J:\mindprod\jgloss\utf.html