Java Glossary : classifying characters

Last updated 2004-06-28 by Roedy Green ©1996-2004 Canadian Mind Products

classifying characters
The Character class has a number of static methods for classifying characters: getType, isWhitespace, isIdentifierIgnorable, isLetter, isDigit, isUpperCase, isLowerCase etc.
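
As a rough illustration of how these methods behave, here is a minimal sketch. The class name CharacterClassDemo, the describe method and its labels are made up for this example; only the Character methods themselves come from the JDK.

```java
final class CharacterClassDemo {
    /** Returns a short label for c, trying java.lang.Character's classifiers in turn. */
    static String describe(char c) {
        if (Character.isWhitespace(c)) return "whitespace";
        if (Character.isDigit(c)) return "digit";
        if (Character.isUpperCase(c)) return "upper-case letter";
        if (Character.isLowerCase(c)) return "lower-case letter";
        if (Character.isLetter(c)) return "other letter";
        if (Character.isIdentifierIgnorable(c)) return "identifier-ignorable";
        // Fall back to the general Unicode category number.
        return "other, getType=" + Character.getType(c);
    }

    public static void main(String[] args) {
        // Includes two non-ASCII samples: e-acute and ARABIC-INDIC DIGIT THREE.
        char[] samples = { 'a', 'Z', '7', ' ', '\u00e9', '\u0663', '+' };
        for (char c : samples) {
            System.out.println("U+" + Integer.toHexString(c) + " : " + describe(c));
        }
    }
}
```

Note that '\u0663' counts as a digit and '\u00e9' as a lower-case letter, which is the sort of Unicode awareness the simple ASCII range tests do not have.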

These methods are quite complex internally since they deal with the full Unicode character set. If you are dealing only with ASCII characters you can use simpler logic such as:

if ( '0' <= c && c <= '9' )...
if ( 'a' <= c && c <= 'z' )...
if ( 'A' <= c && c <= 'Z' )...
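
The range tests above could be wrapped in small helper methods along these lines. This is only a sketch; AsciiClassifier and its method names are invented for illustration and are not part of the JDK.

```java
final class AsciiClassifier {
    /** True only for the ten ASCII digits '0'..'9'. */
    static boolean isAsciiDigit(char c) {
        return '0' <= c && c <= '9';
    }

    /** True only for the ASCII letters 'a'..'z'. */
    static boolean isAsciiLower(char c) {
        return 'a' <= c && c <= 'z';
    }

    /** True only for the ASCII letters 'A'..'Z'. */
    static boolean isAsciiUpper(char c) {
        return 'A' <= c && c <= 'Z';
    }

    /** True for any ASCII letter, either case. */
    static boolean isAsciiLetter(char c) {
        return isAsciiLower(c) || isAsciiUpper(c);
    }
}
```

Unlike Character.isDigit and Character.isLetter, these reject accented letters and non-Latin digits, so they are only suitable when the input is known to be ASCII.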

An easy way to detect a lower-case vowel would be:

if ( "aeiou".indexOf ( c ) >= 0 )...
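
The one-line test above only matches lower-case vowels. A slightly fuller sketch that also accepts upper case might look like this; VowelCheck, isVowel and countVowels are hypothetical names chosen for the example, and 'y' is deliberately not treated as a vowel.

```java
final class VowelCheck {
    // Every character that should count as a vowel, in both cases.
    private static final String VOWELS = "aeiouAEIOU";

    /** True if c is one of the ten characters in VOWELS. */
    static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    /** Counts the vowels in s by testing each character in turn. */
    static int countVowels(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isVowel(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }
}
```

The same pattern works for any small, fixed set of characters: put them in a String constant and use indexOf as the membership test.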

A switch statement with a case for each character lets the compiler figure out how to categorise efficiently, but the categories must be fixed at compile time.
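
A switch-based classifier might be sketched as follows. The class SwitchClassifier, the category constants and the choice of characters in each category are all assumptions made for the example.

```java
final class SwitchClassifier {
    // Hypothetical category codes for this example.
    static final int OTHER = 0;
    static final int OPERATOR = 1;
    static final int BRACKET = 2;
    static final int SEPARATOR = 3;

    /** Maps c to one of the category codes above using case labels fixed at compile time. */
    static int classify(char c) {
        switch (c) {
            case '+': case '-': case '*': case '/':
                return OPERATOR;
            case '(': case ')': case '[': case ']': case '{': case '}':
                return BRACKET;
            case ',': case ';':
                return SEPARATOR;
            default:
                return OTHER;
        }
    }
}
```

Because the case labels are compile-time constants, changing which characters belong to a category means editing the source and recompiling.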

The traditional classifying method uses a translate table of byte classifications indexed by character. Since a Java char can take 65,536 values, that consumes 64K per table. It is fast, but gobbles RAM. If you need only a yes/no answer, you could use a BitSet to shrink that to 8K. Consider using a HashMap keyed by Character to look up a sparse set of characters. Another technique is to binary search a sorted table of special characters. You might look inside the Sun character-classifying methods in Character and Collator to learn a few clever tricks.
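
Here is a minimal sketch of three of the table techniques mentioned above: a BitSet with one bit per char value, a binary search over a sorted char array, and a HashMap for a sparse set. The class TableClassifier, its method names and the particular characters chosen are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class TableClassifier {
    // One bit for each of the 65,536 possible char values, i.e. 8K of bits.
    private static final BitSet PUNCTUATION = new BitSet(Character.MAX_VALUE + 1);

    // Must stay sorted in ascending order for Arrays.binarySearch to work.
    private static final char[] SPECIALS = { '"', '&', '\'', '<', '>' };

    // Sparse lookup: only the few characters of interest have entries.
    private static final Map<Character, String> NAMES = new HashMap<Character, String>();

    static {
        String chars = ".,;:!?";
        for (int i = 0; i < chars.length(); i++) {
            PUNCTUATION.set(chars.charAt(i));
        }
        NAMES.put('&', "ampersand");
        NAMES.put('<', "less-than");
        NAMES.put('>', "greater-than");
    }

    /** Bit-table lookup: true if the bit for c is set. */
    static boolean isPunctuation(char c) {
        return PUNCTUATION.get(c);
    }

    /** Binary search of the sorted SPECIALS table. */
    static boolean isSpecial(char c) {
        return Arrays.binarySearch(SPECIALS, c) >= 0;
    }

    /** Hash lookup: returns the name for c, or null if c has no entry. */
    static String nameOf(char c) {
        return NAMES.get(c);
    }
}
```

The BitSet answers only yes/no questions, so a classification with several categories would need either one BitSet per category or the byte table described above.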

You can get a fresh copy of this page from: http://mindprod.com/jgloss/classifying.html