XML: eXtensible Markup Language
Developed after HTML (in the late 1990s), partly in response to HTML's quirkiness.
Goals:
- Universal format for exchanging both documents and structured data (e.g., records in a database or computer data structures).
- Many languages in one: each XML document can define its own set of tags.
Sample XML
```xml
<?xml version="1.0" encoding="utf-8"?>
<person>
  <name>Ironman</name>
  <age>49</age>
  <state>New York</state>
  <occupation>Engineer</occupation>
</person>
```
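To make the sample concrete, here is a minimal sketch of reading it with Python's standard `xml.etree.ElementTree` module. The helper name `person_to_dict` is an assumption made for illustration; it is not part of any library.

```python
import xml.etree.ElementTree as ET

# The sample document from above, as bytes so the encoding declaration applies.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<person>
  <name>Ironman</name>
  <age>49</age>
  <state>New York</state>
  <occupation>Engineer</occupation>
</person>"""


def person_to_dict(xml_bytes):
    """Hypothetical helper: map each child tag of the root to its text."""
    root = ET.fromstring(xml_bytes)  # root is the single outer <person> element
    return {child.tag: child.text for child in root}


record = person_to_dict(SAMPLE)
print(record["name"], record["age"])
```

Note that every value comes back as text: the parser has no way of knowing that `<age>` holds a number, so the application has to convert it.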
Format:
- Information in `<...>` is markup; all other information is raw text.
- Header line: identifies this as an XML document, indicates character encoding and XML version.
- Body of document consists of a hierarchical collection of elements, starting with a single outer element (`<person>`).
- XML does not require any particular element structure or tag names; each document can have its own schema.
- Every element must have an explicit start and end (but can use `<foo />` as shorthand for `<foo></foo>`).
- A start tag can carry attributes, such as `<person name="Ironman" age="49" state="New York" height="49">`
- Tags can be repeated and nested.
- Cannot use `<` or `>` directly in a document; these are reserved for tags.
  - Use entities instead: `&lt;` and `&gt;`
  - This means `&` is a special character also: use `&amp;`
  - Also, need `&quot;` to include a quote in an attribute.
  - XML predefines one more entity, `&apos;` (apostrophe); additional entities can be declared in a DTD.
- XML has a few other features we won't cover here, such as namespaces for organizing tag names.
- Two optional mechanisms are available to enforce a particular structure on an XML document:
  - DTD (Document Type Definition): a set of declarations, written in its own syntax rather than as XML elements, that describes permissible structure for a class of XML documents ("... each `<person>` element must contain `<name>` and `<age>` children ...").
  - XML Schema: newer than DTDs, designed to get around some shortcomings of DTDs; also more complex. An XML Schema is itself an XML document.
  - There exist programs that will read a DTD or XML Schema file and validate an XML file against it.
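The escaping rules and the idea of structural checking can be sketched with Python's standard library. This is only an illustration under stated assumptions: the `nickname` attribute, the `<note>` element, and the helpers `is_well_formed` and `missing_children` are made up for this example, and the hand-written check merely imitates the kind of rule a DTD or XML Schema expresses. It is not real DTD validation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, < and > by default.
body = escape("if a < b && b > c")
# The extra mapping also replaces double quotes, as needed inside attributes.
nickname = escape('Tony "Ironman" Stark', {'"': "&quot;"})

# Hypothetical document: the nickname attribute and <note> element are made up.
doc = '<person nickname="{}"><note>{}</note></person>'.format(nickname, body)
root = ET.fromstring(doc)  # the parser turns entities back into characters


def is_well_formed(xml_text):
    """Return True if every element is properly opened, closed, and nested."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Assumed rule, mirroring the quoted example: each <person> needs these children.
REQUIRED_CHILDREN = ("name", "age")


def missing_children(person):
    """Hypothetical check: list the required child tags that are absent."""
    present = {child.tag for child in person}
    return [tag for tag in REQUIRED_CHILDREN if tag not in present]


print(body)
print(root.get("nickname"))
print(is_well_formed("<person><name>Ironman</person>"))
print(missing_children(ET.fromstring("<person><name>Ironman</name></person>")))
```

The distinction worth noticing is that a document can be well-formed (the parser accepts it) and still fail a structural rule such as the missing `<age>` child; the second kind of check is what DTDs and XML Schema automate.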
Benefits:
- Textual format, supports Unicode and UTF-8 for internationalization.
- Simple, clean syntax.
- Can be used for a variety of different purposes.
- Heavily used for data exchange in Web-based applications, and for storing application data.
- The common syntax has allowed a large collection of tools to be developed, most of which will work on any XML document (e.g., validators).
- XML documents can be read by humans when necessary; the tags make them almost self-documenting.
Weaknesses:
- Verbose.
- Not as fast to generate or parse as other formats (but fairly efficient parsers have become available).
- At the beginning, people hoped XML would instantly allow any application that understands XML to communicate with any other application that understands XML, but this hasn't come to pass: if two applications use different XML schemas, they can't interact in a meaningful way.
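The last weakness can be illustrated with a small sketch. Both strings below are well-formed XML describing the same person, one using child elements and one using attributes. The reader function `read_age` is a hypothetical consumer written against the element-style layout only.

```python
import xml.etree.ElementTree as ET

# Two hypothetical schemas for the same record.
ELEMENT_STYLE = "<person><name>Ironman</name><age>49</age></person>"
ATTRIBUTE_STYLE = '<person name="Ironman" age="49" />'


def read_age(xml_text):
    """Hypothetical reader that expects the age in an <age> child element."""
    age = ET.fromstring(xml_text).find("age")
    return None if age is None else int(age.text)


print(read_age(ELEMENT_STYLE))    # finds the <age> child
print(read_age(ATTRIBUTE_STYLE))  # parses fine, but there is no <age> child
```

The parser accepts both documents, yet the reader only gets useful data from the layout it was written for. Sharing a syntax is not the same as sharing a schema.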