Java Glossary : XML

Last updated 2004-06-29 by Roedy Green ©1996-2004 Canadian Mind Products


XML
The primary function of XML is to consume RAM and data communication bandwidth. Presumably it was promoted to its current frenzy by companies who sell either RAM or bandwidth. Others promoting it have patents they hope to spring on the public once it is entrenched. XML is the biggest con game going in computers. As you have probably guessed, I am known for my rabid dislike of XML.

XML is the Extensible Markup Language, a W3C Recommendation. Like HTML, XML is based on SGML, an International Standard (ISO 8879) for creating markup languages. However, while HTML is a single SGML document type, with a fixed set of element type names (AKA "tag names"), XML is a simplified profile of SGML: you can use it to define many different document types, each of which uses its own element type names (instead of HTML's "html", "body", "h1", "ol", etc.). For example, in XML, you can mark up an online transaction like this:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE order SYSTEM "order.dtd">
<!-- Wile E. Coyote roller skate order -->
<order>
<invoice-number>12345</invoice-number>
<customer>Wile E. Coyote</customer>
<date>1997-12-14</date>
<item>
<name>Jet-Propelled Roller Skates</name>
<catalog-number>345-678-9</catalog-number>
<quantity>2</quantity>
</item>
<item>
<name>100,000-pound Weight</name>
<catalog-number>987-654-3</catalog-number>
<quantity>1</quantity>
</item>
</order>
Just like HTML, comments begin with <!-- and end with -->. You can abbreviate <mytag></mytag> as <mytag />. Just like HTML, various characters are reserved and have long forms called entities to use when they occur in the text as data: &amp;, &lt;, &gt;, &apos; and &quot;. Character references take one of two forms: decimal references, such as &#8478;, and hexadecimal references, such as &#x211E;. Unicode is always presumed.
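
To make the entity rule concrete, here is a minimal sketch of escaping the five reserved characters before text is written into an XML document. The class and method names are hypothetical, chosen only for illustration; real libraries offer their own escaping utilities.

```java
final class XmlEscapeSketch {
    /** Replaces each of the five reserved characters with its predefined entity. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: Acme &amp; Sons &lt;&quot;Wile E.&quot;&gt;
        System.out.println(escape("Acme & Sons <\"Wile E.\">"));
    }
}
```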

You describe your little XML subgrammar by writing a DTD (Document Type Definition) file. Optionally, you can include the DTD inline inside your XML file.

There are two popular parsing techniques, SAX (Simple API for XML), which hands you each field as it parses, and W3C DOM (Document Object Model) tree which creates a complete parse tree you can prune and repeatedly scan.
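
The difference between the two techniques is easier to see side by side. The following is a rough sketch using the JAXP classes that ship with the JDK; the class name, method names and the cut-down order document are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxVersusDomSketch {
    // A trimmed version of the order example, without the DTD reference.
    static final String ORDER =
        "<order><item><name>Jet-Propelled Roller Skates</name><quantity>2</quantity></item>"
      + "<item><name>100,000-pound Weight</name><quantity>1</quantity></item></order>";

    /** SAX: the parser calls back for each element as it meets it; nothing is retained. */
    static List<String> elementNamesViaSax(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    /** DOM: the whole document becomes a tree in memory that can be scanned repeatedly. */
    static int countItemsViaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("item").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNamesViaSax(ORDER));
        System.out.println(countItemsViaDom(ORDER));
    }
}
```

SAX suits a single pass over a large document; DOM costs more RAM but lets you revisit any part of the tree.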

I personally detest XML. However, it has caught on like a cocaine wave, so it must have some redeeming features.

XML Benefits

XML Drawbacks

Using XML to transmit data is the analog of insisting that all code be passed around as triple-spaced Java source files, with added dummy comments, rather than as binary byte code. There is no guarantee a source file is even syntactically correct, whereas a compiler does not produce a syntactically incorrect byte code file. Byte code files can be processed without time-consuming parsing. In byte code, repeating strings are naturally specified only once. XML, as it stands, suffers from all those analogous drawbacks and more.
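
As a rough way to see the size argument, the sketch below encodes one item from the order example twice: once as XML text and once as a plain binary record written with DataOutputStream. The class name, method names and the binary layout are assumptions invented for this comparison, not any standard format, and the sample values contain no characters that would need escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class EncodingSizeSketch {
    /** Size in bytes of the item marked up as XML and encoded as UTF-8. */
    static int xmlSize(String name, String catalogNumber, int quantity) {
        String xml = "<item><name>" + name + "</name><catalog-number>" + catalogNumber
                   + "</catalog-number><quantity>" + quantity + "</quantity></item>";
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Size in bytes of the same fields as two length-prefixed strings and a 4-byte int. */
    static int binarySize(String name, String catalogNumber, int quantity) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeUTF(catalogNumber);
            out.writeInt(quantity);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("XML bytes:    " + xmlSize("Jet-Propelled Roller Skates", "345-678-9", 2));
        System.out.println("binary bytes: " + binarySize("Jet-Propelled Roller Skates", "345-678-9", 2));
    }
}
```

The repeated tag names account for most of the difference, and the binary reader never has to parse text to recover the quantity.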

What Should Replace XML?

The characteristics include:

One possible candidate for the XML replacement job is the Java serialised object format. It can handle just about any data structure imaginable. It is platform independent. It has a simple DTD -- Java source code for the corresponding class. Some claim it is Java-only. Not so. It is no more difficult for C++ to parse than any other similar newly concocted protocol. It is not tied to any hardware or OS. It is just that Java has a head start implementing it. Java can implement it with no extra overhead.
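
Here is a minimal sketch of what that looks like in practice: the class definition plays the role of the DTD, and the standard object streams do the encoding and decoding. The Item class and the helper names are hypothetical, modelled on the order example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializedOrderSketch {
    /** Hypothetical class standing in for one item of the order; its source is the "DTD". */
    static final class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String catalogNumber;
        final int quantity;

        Item(String name, String catalogNumber, int quantity) {
            this.name = name;
            this.catalogNumber = catalogNumber;
            this.quantity = quantity;
        }
    }

    /** Writes the object in the Java serialised object format. */
    static byte[] toBytes(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Reconstructs the object; the int field comes back as an int, not a string. */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Item copy = (Item) fromBytes(toBytes(new Item("Jet-Propelled Roller Skates", "345-678-9", 2)));
        System.out.println(copy.name + " x " + copy.quantity);
    }
}
```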

There have been some efforts made to patch up the shortcomings of XML. In fact, there are dozens of them. XML is no longer simple. It is a raggedy patchwork quilt. People were sucked in by the initial simplicity, then discovered that it was not really all that useful in its simple form. Schema was added to allow specifying types (though the values themselves are still carried as strings). Yes, we need a standard interchange format, but XML was only a back-of-the-envelope stab at it. XML was destined to fail since it totally ignored so many factors in coming up with a good design.

One such effort is the Virtual Token Descriptor (VTD). A VTD record is a 64-bit integer that encodes the starting offset, length, type and nesting depth of a token in an XML document. Because VTD records don't contain data fields, they work alongside the original XML document, which is maintained intact in memory by the processing model.
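
The idea of packing those four fields into one long can be sketched as follows. The bit widths and field order used here are assumptions chosen purely for illustration and should not be taken as the actual VTD record layout; the class and method names are likewise hypothetical.

```java
final class VtdRecordSketch {
    // Assumed widths, for illustration only: 4-bit type, 8-bit depth, 20-bit length, 30-bit offset.
    static final int OFFSET_BITS = 30;
    static final int LENGTH_BITS = 20;
    static final int DEPTH_BITS = 8;
    static final int TYPE_BITS = 4;

    private static long field(int value, int bits) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalArgumentException(value + " does not fit in " + bits + " bits");
        }
        return value;
    }

    /** Packs the four fields into a single 64-bit record. */
    static long pack(int type, int depth, int length, int offset) {
        return (field(type, TYPE_BITS) << (DEPTH_BITS + LENGTH_BITS + OFFSET_BITS))
             | (field(depth, DEPTH_BITS) << (LENGTH_BITS + OFFSET_BITS))
             | (field(length, LENGTH_BITS) << OFFSET_BITS)
             | field(offset, OFFSET_BITS);
    }

    static int offset(long record) { return (int) (record & ((1L << OFFSET_BITS) - 1)); }
    static int length(long record) { return (int) ((record >>> OFFSET_BITS) & ((1L << LENGTH_BITS) - 1)); }
    static int depth(long record)  { return (int) ((record >>> (LENGTH_BITS + OFFSET_BITS)) & ((1L << DEPTH_BITS) - 1)); }
    static int type(long record)   { return (int) ((record >>> (DEPTH_BITS + LENGTH_BITS + OFFSET_BITS)) & ((1L << TYPE_BITS) - 1)); }

    /** The record holds no data itself; the token text is read from the intact document. */
    static String tokenText(String document, long record) {
        return document.substring(offset(record), offset(record) + length(record));
    }

    public static void main(String[] args) {
        String doc = "<order><customer>Wile E. Coyote</customer></order>";
        int textType = 3; // hypothetical code meaning "character data"
        long record = pack(textType, 2, "Wile E. Coyote".length(), doc.indexOf("Wile"));
        System.out.println(tokenText(doc, record));
    }
}
```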

Due to the stupidity, duplicity and/or greed of those promoting XML, we will likely be stuck with some committee-patched variant of it forever -- something that will make even HTML look clean. We need a common data interchange format, but not one so inept.

DTD

You need to compose a DTD file that describes the format of the XML file. The <!ELEMENT statement is used to list the various tags you will use, which tags may be used inside which tags, how often and in which order. The <!ATTLIST statement is used to list the various attributes (mandatory and optional) of each tag. The <!ENTITY statement lets you make up your own abbreviations.

Here is a simple example:

DTD:

<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
CDATA means the value of the attribute is a character string. The "0" is the default used when the attribute is omitted.

XML:

<square width="100"></square>
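
To check a document against this DTD from Java, one option is a validating JAXP parser with the DTD included inline. This is only a sketch: the class and method names are invented for the example, and it assumes the default parser bundled with the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class DtdValidationSketch {
    // The DTD from the example above, placed inline ahead of the document body.
    static final String INLINE_DTD =
        "<!DOCTYPE square [\n"
      + "  <!ELEMENT square EMPTY>\n"
      + "  <!ATTLIST square width CDATA \"0\">\n"
      + "]>\n";

    private static Document parse(String body, ErrorHandler handler)
            throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(handler);
        return builder.parse(new InputSource(new StringReader(INLINE_DTD + body)));
    }

    /** True if the body is well-formed and obeys the DTD. */
    static boolean isValid(String body) {
        final boolean[] valid = { true };
        try {
            parse(body, new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity errors are reported here, not thrown
                }
            });
        } catch (SAXException e) {
            return false; // not even well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return valid[0];
    }

    /** Returns the width attribute, which the parser fills in from the DTD default if omitted. */
    static String widthOf(String body) {
        try {
            return parse(body, new DefaultHandler()).getDocumentElement().getAttribute("width");
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<square width=\"100\"></square>"));
        System.out.println(isValid("<square><circle/></square>"));
        System.out.println(widthOf("<square/>"));
    }
}
```
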
AELfred ¤ Alphaworks: an XML parser in pure Java called XML4J ¤ binaphobia ¤ Castor ¤ cooktop ¤ Crimson ¤ Developer Life Tutorials ¤ Digitally Signing XML documents ¤ DOM 1 spec ¤ DTD attributes ¤ IBM's tutorial ¤ IBM's XML page ¤ JAXP ¤ JDOM ¤ JNLP (Java Web Start's XML configuration language) ¤ Microstar ¤ RDF ¤ UBDDL (a Yahoo group working to define a more efficient replacement for XML) ¤ UDDI ¤ x->Jen ¤ Xerces-J ¤ XML 1.0 spec ¤ XML Compactor ¤ XML databases ¤ XML inventors ¤ XML validator online ¤ XML Validator tools ¤ XML.ORG ¤ xmlfiles.com (has lots of examples and tutorials) ¤ XMLGlobal has some tutorials and information ¤ XMLsucks.org ¤ XQuery ¤ XSLT ¤ XTP ¤ XUL

