Java Glossary : sort

CMP home Java glossary home Menu no menu Last updated 2004-06-28 by Roedy Green ©1996-2004 Canadian Mind Products

Java definitions: 0-9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

You are here : home : Java Glossary : S words : sort.

sort
Putting items in ascending or descending order by key fields. The English language meaning of sort is simply to categorise, e.g. to put all your bills into folders by category, e.g. all the electric bills together, all the phone bills together. An internal sort is done strictly in RAM. An external sort uses hard disk temporary files. A sort is stable if it naturally preserves existing order when you have duplicate keys. If you want existing order preserved and you are using an unstable sort, you must add an extra sequence number key to your records and use that as a secondary tie-breaking key.

Here are some sorts: NIST, CSUSB.

In the JDK 1.2 there are built in sorts such as Arrays.sort and Collections.sort . See java.util.Comparator and java.util.Comparable for different sorting orders. For String sorting in various international orders see java.text.Collator, java.text.RuleBasedCollator, and java.text.CollationKey.

In the simplest case, you can sort an array like this:

import java.util.Arrays;
Arrays.sort( myArray );

or an ArrayList (or any other Collection) like this:

import java.util.ArrayList;
import java.util.Collections;
Collections.sort( myArrayList );

Note that there is no such method as ArrayList.sort.

For more complex sorts you need a class that implements Comparator. Often you can piggyback that code on some other class. The following example does the same as the code above, but could be extended to sort by several fields.

class StringComparator implements Comparator
   {
   // Compare two Strings. Callback for sort.
   // effectively returns a-b;
   // e.g. +1 (or any +ve number) if a > b
   // 0 if a == b
   // -1 (or any -ve number) if a < b
   public final int compare ( Object a, Object b)
      {
      return((String )a).compareTo( (String)b);
      } // end compare
   } // end class StringComparator

...
import java.util.Arrays;
...
Arrays.sort(myArrayOfStringObjects, new StringComparator() );

To sort in descending order, simply write a Comparator with a compareTo that returns the negative of the value it does now i.e.

return -( (String )a ).compareTo( (String)b );

If you write a class, you can define its natural sort order, by having that class implement the Comparable interface. Then you can sort arrays or ArrayLists of your Objects without needing to specify a Comparator .

In the following table, speed is measured relative to RadixSort on a 100,000 item set of 5-character random keys of the form A9999 where SunSort is arbitrarily assigned 100. Tests done under Java 1.4.1 java.exe on a 233 MHz Pentium 128 MB RAM. Bigger is better.

Sort speed
(bigger is better)
stable? notes
SunSort (Java's Built-In Sort) 100 yes This is a most excellent sort. It uses a modified quicksort for primitives and a mergesort (sometimes called a tournament sort for objects). The merge sort makes a clone of the array (but not the objects) before it starts, so be aware of the memory usage. There is no reason to use any but this sort except for very large sorts with keys of 4 characters or less where RadixSort might beat it out. Of course you can't use this in the older Java versions before it was available. Invoke with Arrays.sort and Collections.sort. Beware when using the versions with startIndex and endIndex. The endIndex points one past the last element.
RadixSort 86 yes Good for very large sorts, takes same time per item no matter how many items. Ordinary sorts take more time per item as the number of items goes up. More complicated Comparator routine to write. Likes short keys. If you double the length of the key the sort time doubles.
HeapSort 48 no Works best when data are almost already sorted.
ShellSort 34 no Small and quick-loading. Good for small sorts under 2000 items.
QuickSort 14 no Likes random data to sort. Order in the data can cause it to take a very long time to sort. Not good on very small sorts. Varies wildly in time to sort, sometimes a speed demon, sometimes a pig. It just depends on the luck of the draw.
If you have too much data to sort to fit in RAM at once, you will need to use an external sort such as Opt-Tech. It works by sorting RAM-fulls of records, then doing an N-way merge on the sorted batches, to create fewer larger sorted batches. Eventually it ends up with one large sorted batch. It can sort tags records even more quickly. Tag records are short records consisting of the sorting key fields plus a reference back to the original record. Abundance uses Op-Tech sort. It rolls itself out to disk, calls in Opt-Tech sort, giving it all of RAM, then rolls itself back in.

There are a few simple sorts you should almost never use since the take time o(n2). In order words they turn into utter pigs when the number of elements to sort gets large.

One is a selection sort:

  1. Select the largest of the N keys and move its record to the output area.
  2. Select the largest of the remaining N-1 keys and move its record to the output area. etc.
Another is the bubble sort, where you pairwise swap elements in a pass through all the elements to percolate the biggest element to the top. Then repeat for the list not containing the biggest elements so far. etc.

A variant of the bubble sort, that is a tiny bit faster is the bi-directional sort that percolates the smallest element to the bottom and the biggest element to the top in alternating direction passes, gradually working toward the middle. Sun has built an Applet to let you watch bubble sort and bidirectional bubblesort race against quicksort. It becomes obvious why bubblessort should not be used.

Another is the insertion sort, where you create a new list and gradually add to it, always keeping the new list in sorted order by inserting the new element in the place where it belongs. You find the place by binary search, and slide all the subsequent elements down to make room for it. You sometimes have to use this type of sort when you don't have all the elements at the start of the sort. They are just given to you one at a time, and you want a sorted list immediately output after each element is added.

Instead of sorting, you can maintain a set in order with TreeSet or its brother TreeMap. Whether you go the TreeSet route or the export with toArray and sort route depends on whether you need sorted order only on occasion. If you do, the export method is better. If you need the sorted order maintained up to the second, then the TreeSet method is better. The TreeSet method consumes more memory and more CPU time, and is perhaps a trifle more complex to code. With either technique, it is possible view the same data in several sorted orders by sorting the exported array in different orders or by maintaining multiple TreeSets on the same data.

One problem you often run into is having two arrays you want to keep in sync when you sort one. There are two approaches.

  1. Combine the two arrays into one, with objects having two or more fields. Write a Comparator or implement Comparable for these new combined objects.
  2. Use the cern.colt.GenericSorting class.


CMP logo
CMP_home
home
Canadian Mind Products CSS
HTML Checked!
ICRA ratings logo
mindprod.com IP:[24.87.56.253]
Your IP:[80.134.30.163]
You are visitor number 13980.
Please send errors, omissions and suggestions
to improve this page to Roedy Green.
You can get a fresh copy of this page from: or possibly from your local J: drive mirror:
http://mindprod.com/jgloss/sort.html J:\mindprod\jgloss\sort.html