Java Glossary : serialization

Last updated 2004-06-30 by Roedy Green ©1996-2004 Canadian Mind Products

serialization
Serialization is a way of "flattening", "pickling" or "freeze-drying" objects so that they can be stored on disk, and later read back and reconstituted, with all the links between objects intact.

Java has no direct way of writing a complete binary object to a file, or of sending it over a communications channel. The object has to be taken apart with application code and sent as a series of primitives, then reassembled at the other end. Serialized objects contain the data but not the code for the class methods. It gets most complicated when there are references to other objects inside each object. Starting with JDK 1.1 there is a scheme called serialization that uses ObjectInputStream and ObjectOutputStream.

Cynthia Jeness did an excellent presentation on serialisation at the Java Colorado Software Summit in Keystone. Her extensive lecture notes are available online.
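To make "links between objects intact" concrete, here is a minimal sketch of a round trip through a byte array. The classes Address, Household and RoundTripDemo are hypothetical, invented purely for illustration; only ObjectOutputStream and ObjectInputStream come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes, invented for this illustration.
class Address implements Serializable {
    private static final long serialVersionUID = 1L;
    String city;
    Address(String city) { this.city = city; }
}

class Household implements Serializable {
    private static final long serialVersionUID = 1L;
    Address first;
    Address second;
}

class RoundTripDemo {
    /** Pickle a Household into a byte array and reconstitute a copy of it. */
    static Household roundTrip(Household original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Household) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Household h = new Household();
        h.first = new Address("Victoria");
        h.second = h.first;                 // two links to the same object
        Household copy = roundTrip(h);
        System.out.println(copy.first.city);
        // The copy holds one Address referenced twice, not two Addresses.
        System.out.println(copy.first == copy.second);
    }
}
```

The same pattern works with a FileOutputStream or a socket stream in place of the byte array.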

Bulk

Serialised objects are very large. They contain the UTF-8-encoded class names (usually a 16-bit length, then 8 bits for common chars and 16 bits or more for rarer chars), each field name and each field type. There is also a 64-bit class serial number. For example, a String type is encoded rather verbosely as java.lang.String. Data are in binary, not Unicode or ASCII. There is some cleverness. If a string is referenced several times by an object or by objects it points to, the UTF string literal value appears only once. Similarly, the description of the structure of an object appears only once in the ObjectOutputStream, not once per writeObject call.

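Serialisation works by depth-first recursion. This manages to avoid any forward references in the object stream. Referenced objects are embedded in the middle of the referencing object. There are also backward references, encoded as 71 00 7E xx xx, where 71 marks a reference and 00 7E xx xx is the handle that was assigned to the object when it was first written. Handles are handed out in order of first appearance in the stream.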

While the lack of forward references simplifies decoding, the problem with this scheme is that you can overflow the stack if, for example, you serialize the head of a linked list with 1000 elements. Recursion requires about 50 times as much RAM stack space as the objects you are serialising. Another problem is that there are no markers in the stream to warn of user-defined object formats. This means you can't use general purpose tools to examine streams. Tools would have to know the private formats, even to read the standard parts.

If your object A references C, and B also references C, and you write out both A and B, there will be only one copy of C in the object stream, even if C changed between the writeObject calls to write out A and B. You have to use the sledgehammer ObjectOutputStream.reset() which discards all knowledge of the previous stream output (including Class descriptions) to ensure a second copy of C.
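The stale copy of C is easy to reproduce. This is a minimal sketch; Shared, Holder and StaleCopyDemo are hypothetical names standing in for C, A/B and the test harness.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes standing in for C and for A/B in the text.
class Shared implements Serializable {
    private static final long serialVersionUID = 1L;
    int value;
}

class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    final Shared c;
    Holder(Shared c) { this.c = c; }
}

class StaleCopyDemo {
    /**
     * Write A, change C, optionally reset the stream, then write B.
     * Returns the value of C as seen through B after reading both back.
     */
    static int valueSeenThroughB(boolean resetBetweenWrites) {
        try {
            Shared c = new Shared();
            Holder a = new Holder(c);
            Holder b = new Holder(c);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                c.value = 1;
                out.writeObject(a);
                c.value = 2;                 // C changes between the two writes
                if (resetBetweenWrites) {
                    out.reset();             // forget everything written so far
                }
                out.writeObject(b);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();             // A, carrying the first copy of C
                Holder b2 = (Holder) in.readObject();
                return b2.c.value;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenThroughB(false)); // 1: B points back at the old C
        System.out.println(valueSeenThroughB(true));  // 2: C was written a second time
    }
}
```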

Happily, serialization of ArrayLists is clever. They take only a few bytes more than the equivalent array. The serial writer does not bother to write the empty slots at the end.
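One way to check the claim about empty slots is to pickle two lists with the same contents but very different capacities and compare the byte counts. This is only a sketch; ArrayListSizeDemo and its helper methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ArrayListSizeDemo {
    /** Number of bytes that default serialization produces for one object. */
    static int pickledSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Build a list holding three Integers, with the given initial capacity. */
    static List<Integer> threeElements(int capacity) {
        List<Integer> list = new ArrayList<>(capacity);
        list.add(1);
        list.add(2);
        list.add(3);
        return list;
    }

    public static void main(String[] args) {
        // The unused slots of the big list should not show up in the stream.
        System.out.println("capacity 3:     " + pickledSize(threeElements(3)) + " bytes");
        System.out.println("capacity 10000: " + pickledSize(threeElements(10_000)) + " bytes");
    }
}
```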

Turning On Serialization

To make a class serialisable all you do is say:
implements java.io.Serializable
Note the American spelling of Serialisable substituting a z for the s!

You don't need to write any methods. Serializable is just a dummy marker interface that turns on serializability.

You don't have to write a readObject or writeObject method, but if you do, you still need the implements java.io.Serializable.

Fine Tuning

You can roll your own serialisation by writing readObject and writeObject to format the raw data contents, or by writing readExternal and writeExternal, which take over the versioning and labelling functions as well. You can see an example of readObject and writeObject in the BigDate class. There is nothing special you need to do, other than implement Serializable, to register your writeObject and readObject methods. defaultWriteObject has at its disposal a native introspection tool that lets it see even private members, and reflect to pick out the fields and references. JavaSoft has written a spec on serialisation that you should probably read if you want to do anything fancier than invoke the default writeObject method.
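For the readExternal/writeExternal route, a minimal sketch might look like the following. GridPoint and ExternalizableDemo are hypothetical names; the one firm requirement from the JDK is that an Externalizable class has a public no-arg constructor, because the loader calls it before readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class, invented for this illustration.
class GridPoint implements Externalizable {
    private int x;
    private int y;

    /** Externalizable requires a public no-arg constructor. */
    public GridPoint() { }

    GridPoint(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // We decide the layout ourselves: just two ints, no field names.
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Must read back in exactly the order writeExternal wrote.
        x = in.readInt();
        y = in.readInt();
    }
}

class ExternalizableDemo {
    static GridPoint roundTrip(GridPoint p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GridPoint) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GridPoint copy = roundTrip(new GridPoint(3, 4));
        System.out.println(copy.x() + "," + copy.y());
    }
}
```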

Don't confuse the custom readObject method of your class with the ObjectInputStream.readObject method you use to read a whole tree of objects.

You might wonder how serialisation manages to get at the non-transient private members via reflection. It uses AccessController.doPrivileged() to override the general security privileges.

The Asymmetry of Read and Write

A class can pickle itself, but it can't reconstitute itself. The problem is an asymmetry in readObject and writeObject . writeObject is quite happy to work with this whereas readObject insists on creating a new object. What do you do? Bill Wilkinson, the serialization guru, suggested two tactics:
  1. Your load code can open the ObjectStream and reconstitute a new object, then copy the fields over to this.
  2. Your save code can save the fields of this individually, then your load code can reconstitute them individually.
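The first tactic can be sketched as below. Settings, SelfLoadingDemo and the field names are hypothetical; the point is only that load reads a fresh object and then copies its fields over to this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical class, invented for this illustration.
class Settings implements Serializable {
    private static final long serialVersionUID = 1L;
    String title = "untitled";
    int width = 80;

    /** Pickling this is no problem. */
    void save(OutputStream target) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(target);
        out.writeObject(this);
        out.flush();
    }

    /** Reading always creates a new object, so copy its fields into this. */
    void load(InputStream source) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(source);
        Settings fresh = (Settings) in.readObject();
        this.title = fresh.title;
        this.width = fresh.width;
    }
}

class SelfLoadingDemo {
    /** Save one Settings object, then load its state into another. */
    static Settings saveThenLoadInto(Settings saved, Settings target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            saved.save(bytes);
            target.load(new ByteArrayInputStream(bytes.toByteArray()));
            return target;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Settings saved = new Settings();
        saved.title = "report";
        saved.width = 132;
        Settings target = saveThenLoadInto(saved, new Settings());
        System.out.println(target.title + " " + target.width);
    }
}
```

The cost of this tactic is that every new field needs a matching line in load, which is easy to forget.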

serialVersionUID

It is probably best to assign your own version id for each class:
static final long serialVersionUID = 3L;
This must change if any characteristics of the pickled object change. If you don't handle it manually, Java will assign one by hashing the class name, its interfaces, and the signatures of its fields and methods. It can thus change when you make a minor code change, such as adding a public method, that does not actually affect the pickled objects. This will make it more difficult to restore old object streams.
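You can ask the JDK which version id it will use for a class with ObjectStreamClass.lookup. In this sketch, Explicit, Implicit and VersionIdDemo are hypothetical names; the computed id for Implicit depends on the class's shape, so no particular value is claimed for it.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class with a hand-assigned version id.
class Explicit implements Serializable {
    private static final long serialVersionUID = 3L;
    int x;
}

// Hypothetical class that leaves the id to be computed by Java.
class Implicit implements Serializable {
    int x;
}

class VersionIdDemo {
    /** The version id that serialization will write for this class. */
    static long idOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + idOf(Explicit.class));
        // A hash of the class's name, interfaces and member signatures.
        System.out.println("computed: " + idOf(Implicit.class));
    }
}
```

The serialver tool that ships with the JDK reports the same computed value from the command line.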

Transient

You can reduce the size of your serialized object by marking some fields transient. The values of these fields won't be written. When the object is read back, it is up to you to reconstitute the fields. You can put your reconstituting code in a custom validateObject or in a custom readObject after an in.defaultReadObject() call. Note that you must manually reconstitute all the transient fields. None of the initialisation or constructor code will be run for you. Unless you specify implements ObjectInputValidation and register the object with ObjectInputStream.registerValidation inside readObject, your validateObject method will be ignored.

If you have a reference to a non-serializable object, you have no choice but to make it transient. You will have to figure out some way to reconstitute the reference in a custom readObject method.
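Here is a minimal sketch of reconstituting a transient reference in a custom readObject. WordCounter (deliberately not Serializable), Note and TransientDemo are hypothetical classes invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper that is deliberately NOT Serializable.
class WordCounter {
    final int words;
    WordCounter(String text) {
        String trimmed = text.trim();
        words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

// Hypothetical class holding a reference to the non-serializable helper.
class Note implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String text;
    private transient WordCounter counter;   // skipped by the serial writer

    Note(String text) {
        this.text = text;
        this.counter = new WordCounter(text);
    }

    int words() { return counter.words; }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();              // restores text
        counter = new WordCounter(text);     // rebuild what was not saved
    }
}

class TransientDemo {
    static Note roundTrip(Note note) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(note);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Note) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Note("freeze dried objects")).words());
    }
}
```

Without the readObject method, words() would throw a NullPointerException on the reconstituted Note.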

Interning

Interned Strings reconstitute as ordinary Strings. It is up to you to write a custom readObject method to reintern them.
/**
 * deserialise and ensure fields are re-interned when read back
 *
 * @param stream ObjectInputStream of objects
 *
 * @exception IOException
 * @exception ClassNotFoundException
 */
private void readObject ( ObjectInputStream stream )
throws IOException, ClassNotFoundException
{
   stream.defaultReadObject();
   // reintern all Strings in the object
   name = name.intern();
   extension = extension.intern();
}

NotSerializableException

If you get a NotSerializableException, you forgot to put
implements java.io.Serializable
on the class you are doing a writeObject on. Since writeObject also writes out all the objects pointed to by that object, and the objects those objects point to, ad infinitum, all those classes too must be marked implements Serializable. Any references to non-serializable classes must be marked transient. While you are at it, give each of those classes an explicit version number.
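The exception usually comes from a class buried somewhere in the object graph, not from the one you passed to writeObject. A small sketch, with hypothetical classes Engine, Car and NotSerializableDemo:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class that forgot "implements Serializable".
class Engine { }

// Serializable itself, but it points at an Engine.
class Car implements Serializable {
    private static final long serialVersionUID = 1L;
    Engine engine = new Engine();
}

class NotSerializableDemo {
    /** Returns the exception message, or null if the write succeeded. */
    static String offendingClass(Object root) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(root);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();   // normally names the culprit class
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(offendingClass(new Car()));
    }
}
```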

Serialization Lore

The now defunct Lotus Ensuite made great use of serialization. They would freeze dry the entire running state of an application, run another app, then reconstitute the previous one, bringing it back exactly where you left off.

You can't serialise Images and send them via RMI to another platform, because Images are platform specific. You need to convert your Image to a platform independent format. You can use the JAI API or you can write a class with ints only and use a PixelGrabber to create an int array representation of that Image (you also need the height and width). Then you can send the int[] representation of the class over the ObjectStream and cast it back at the destination. Then use createImage from the java.awt.Toolkit on a MemoryImageSource to recreate the Image data type.

Bill Wilkinson's Take

Bill Wilkinson has been writing in the newsgroups for years explaining the pitfalls of Java serialisation. I have been bugging him to collect these posts into a coherent essay. He said, perhaps for Groundhog day 2000. I am going to make a first cut at that essay for him, hoping it will prod him to finish it properly. This first cut is taken from one of his posts.

Serialisation, or serialization in American, is Java's way of providing persistent objects, or transmitting objects over a wire (in conjunction with RMI). People like to concoct flavourful terminology to describe the saving (pickling, freeze drying, swizzling) and restoring (depickling, deswizzling, reconstituting) processes.

In theory all you have to do is save an object and all its dependent objects will automatically go with it. However, there are many pitfalls. See the Java Gotchas. See also the serialization entry in the Java & Internet Glossary.

There is now an essay on the Sun Site about serialisation.

How Serialisation Works

The rules of object streams say that the first time a given object is encountered, its actual contents are written out. All subsequent references to that same object cause only a "handle" (actually, simply a monotonically increasing counter) to be written out. [ This is the source of the frequent complaint here that modifying an object and then rewriting it doesn't cause a change in the object on the "other end" of the stream.]

When you read in a stream, then, Serialization has to keep a map of all read-in objects, relating them to the "handle" numbers, so that when a given handle number is later encountered a reference to the proper object can be substituted, thus creating a valid newly reconstituted object.

Serialization has no way of knowing that object number 13 in your stream is never referenced again anyplace in the stream, so of course it has to keep everything in that map (which is ever-increasing in size!) forever!

Unless...

Unless you call the "reset" method on the stream. In which case everything starts all over again. (Object numbers restart from zero, etc., etc.)

"Wow!" you say, "what a simple solution." Yes, but...

Once you do a "reset", none of the objects previously written will be "known" to the stream, so once again the first reference to a given object will cause its data to be written to the stream. "Well, what's wrong with that?"

Answer: When you then read that stream, and the "reset" is seen (a special code in the stream), then all knowledge of already-read objects is lost and... yep, you guessed it: You'll read the same object again!!! If you aren't prepared for this and you don't program accordingly, the results can be disastrous.

There is another negative consequence of doing "reset". The first time any class is written (or the first time after a "reset"), an incredible amount of junk that describes that class is written to the stream.

If you will only be serializing a handful of classes, and if you only need to do a "reset" every few hundred kilobytes, then this overhead isn't too onerous. But if you need to do a reset after every small group of objects, and if nearly every object in the group is a different type, then this overhead will bite you. (Note that even predefined system types, such as java.lang.Integer, must be "fully described" in the stream.)
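To get a feel for the overhead, you can measure the stream with and without a reset after every object. This is a rough sketch; ResetOverheadDemo is a hypothetical name and the exact byte counts will vary with the classes you write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class ResetOverheadDemo {
    /** Bytes needed to write count distinct Integers to one stream. */
    static int streamSize(int count, boolean resetEachTime) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(Integer.valueOf(i));
                if (resetEachTime) {
                    // Forces the description of java.lang.Integer to be
                    // written again with the next object.
                    out.reset();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("no reset:         " + streamSize(100, false) + " bytes");
        System.out.println("reset every time: " + streamSize(100, true) + " bytes");
    }
}
```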

So what's the solution, if "reset" isn't appropriate to your needs? Dump Serialization. It's slow and clumsy and has a lot of overhead. But that may not be viable if you really do depend on its ability to maintain object references in large networks of objects (sometimes called "pickling" or "swizzling" and "depickling" or "deswizzling"). On the other hand, if you are simply sending pure numeric and textual data back and forth--if connections between objects are uninteresting to you--then do consider "rolling your own" instead of using Serialization.

The Format Of A Pickle File

This changes between Java versions, constantly improving.

The Recursion Gotcha

Very briefly, the serial writer uses recursion in early versions of Java and hence can easily overflow the stack, for example when serialising a LinkedList.
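For a hand-rolled linked structure, one way around the recursion is to mark the links transient and write the elements in a loop from a custom writeObject. The sketch below uses hypothetical classes Chain and ChainDemo; it is an illustration of the idea, not the JDK's own code, though java.util.LinkedList takes a similar approach.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical singly linked list of ints, invented for this illustration.
class Chain implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final class Node {        // never written directly
        final int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    private transient Node head;              // hidden from the default writer
    private transient int size;

    void push(int value) { head = new Node(value, head); size++; }
    int size() { return size; }

    long sum() {
        long total = 0;
        for (Node n = head; n != null; n = n.next) total += n.value;
        return total;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(size);
        // A flat loop: stack depth stays constant however long the chain is.
        for (Node n = head; n != null; n = n.next) out.writeInt(n.value);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        Node tail = null;
        for (int i = 0; i < count; i++) {     // rebuild in the original order
            Node n = new Node(in.readInt(), null);
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        size = count;
    }
}

class ChainDemo {
    static Chain roundTrip(Chain chain) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(chain);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Chain) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Chain chain = new Chain();
        for (int i = 0; i < 200_000; i++) chain.push(i);
        System.out.println(roundTrip(chain).size());
    }
}
```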

The Symmetry Gotcha

There is a fundamental asymmetry in the way you read and write objects:
// The fundamental asymmetry

// write
stream.writeObject( obj ); // ok

// read
obj = stream.readObject(); // ok

// write
stream.writeObject( this ); // ok

// read
this = stream.readObject(); // illegal
You can write out the current object, but you can't read it back. All you can do is read back creating some other object, then copy the fields into this object.

The Uninitialised Transient Fields Gotcha

Transient fields are ones the serial writer does not bother to save. It saves disk space to reconstruct them later when the objects are reconstituted. Since the serial loader does not invoke your constructor, transient fields will not be initialised. They will be merely zeroed.
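The gotcha is easiest to see with field initialisers, which look as if they ought to run on load but don't. In this sketch Session and TransientGotchaDemo are hypothetical names, and the static counter exists only to show that the constructor is skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class, invented for this illustration.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0;          // bookkeeping for the demo only

    String user = "guest";                    // saved and restored
    transient int retries = 3;                // comes back as 0
    transient StringBuilder log = new StringBuilder();   // comes back as null

    Session() { constructorCalls++; }
}

class TransientGotchaDemo {
    static Session roundTrip(Session s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session());
        System.out.println("user    = " + copy.user);
        System.out.println("retries = " + copy.retries);
        System.out.println("log     = " + copy.log);
        System.out.println("constructor calls = " + Session.constructorCalls);
    }
}
```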

Example of Use

The File I/O Amanuensis will generate sample code for you with thousands of variations. Just tell it your data format is serialised objects. By playing with the controls you can get it to generate sample code for almost any circumstance.

Versioning

Here is a common problem:
  1. You have serialized objects written on filesystem or in database.
  2. You modify the class that is serialized.
  3. You want to copy the needed data from old class to the new one.
If the objects have gone through a major reorg, use two different ClassLoaders, one for the old class and one for the new, then copy fields and do whatever else is necessary to upgrade your objects.

If the objects are actually identical, e.g. you just added another method to the class, you can manually give both classes a version id of the form:

static final long serialVersionUID = 3L;
If you don't provide such an ID, one is automatically generated for you by hashing together the class name, its interfaces, and the signatures of its fields and methods.

If the objects are just a little bit different, e.g. there is a new field, you can use the manual version number method. I don't recall the precise details, but under some circumstances, the serial loader won't mind minor differences. It just zeros out new fields, and drops unused ones. Keep in mind the serial loader does not use your constructor! You can't count on it to do any initialisation of transient fields, especially the new ones.

Serializable vs Externalizable

Overriding readObject

Overriding readExternal

Unserializable Objects

The literal reason an Object can't be serialized is that its class does not have implements java.io.Serializable on the class declaration. Why would an author make his class unserializable?
  1. He did not need it, so it just never occurred to him.
  2. Laziness. He did not want to deal with transient fields and writing code to deal with reconstituting them.
  3. The class changes so frequently you would never be able to read the old files.

Reconstitution Magic

The process of serialisation does something magic -- it manages to fetch values in private fields in any class on write and store values into private fields of any class on read. It uses a trick called AccessibleObject.setAccessible( boolean accessible ) to sneak around the usual restriction.
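The same trick is available to ordinary code through reflection. The sketch below is not the serialization machinery itself, just an illustration of setAccessible on a private field; Vault and PrivateAccessDemo are hypothetical names.

```java
import java.lang.reflect.Field;

// Hypothetical class with a private field and no accessors.
class Vault {
    private int combination = 1234;
}

class PrivateAccessDemo {
    /** Fetch the private field, the way a serial writer would have to. */
    static int peek(Vault vault) {
        try {
            Field field = Vault.class.getDeclaredField("combination");
            field.setAccessible(true);        // switch off the access check
            return field.getInt(vault);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Store into the private field, the way a serial loader would have to. */
    static void poke(Vault vault, int value) {
        try {
            Field field = Vault.class.getDeclaredField("combination");
            field.setAccessible(true);
            field.setInt(vault, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Vault vault = new Vault();
        System.out.println(peek(vault));
        poke(vault, 42);
        System.out.println(peek(vault));
    }
}
```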

How God Would Have Implemented Pickling

To come some day.


Please send errors, omissions and suggestions to improve this page to Roedy Green.
You can get a fresh copy of this page from: http://mindprod.com/jgloss/serialization.html