How to Write XML

Introduction

This page walks you through a simple XML document. Then, for more advanced users, it explains the various components of an XML document and some of the standards with which an XML document must comply.

Writing basic XML

This tutorial shows you how to build a lexicon directly as an XML document from your fieldwork data or other sources. You can write XML in any text editor, such as WordPad (as long as you save the file as plain text), or buy a dedicated XML editor, such as XMLSpy. Your unit of data can be any size: XML tags are defined by you, so you could start with a tag as broad as "page" if you wanted. However, to make your document more portable, it is best to validate it against a standard XML schema or DTD.
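
As a minimal sketch of what validation involves, the document below carries its own small DTD. The element names (entry, headword, gloss) and the content model are assumptions made for illustration, not a standard lexicon schema. A validating parser would check the document against these declarations and report, for example, an entry that lacks a gloss.

  <!-- Hypothetical DTD: names and structure are illustrative only -->
  <!DOCTYPE lexicon [
    <!ELEMENT lexicon (entry+)>
    <!ELEMENT entry (headword, gloss)>
    <!ELEMENT headword (#PCDATA)>
    <!ELEMENT gloss (#PCDATA)>
    <!ATTLIST gloss lang CDATA #REQUIRED>
  ]>
  <lexicon>
    <entry>
      <headword>ideas</headword>
      <gloss lang="English">plans, schemes or methods</gloss>
    </entry>
  </lexicon>

In practice the DTD or schema usually lives in a separate, shared file so that many documents can be checked against the same rules.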

The semantic interpretation of a sentence can be represented in XML, as in the following examples (the first broken down to the word level, the second to the morpheme level):

  <sentence>
    <word>
      <unanalyzedform>colorless</unanalyzedform>
      <grammatical-relation relation-term="is a" pos="adj" />
      <gloss lang="English" value="colorless">
        <meaning>without hue</meaning>
      </gloss>
      <comments>lacking color</comments>
    </word>
    <word>
      <unanalyzedform>green</unanalyzedform>
      <grammatical-relation relation-term="is a" pos="adj" />
      <gloss lang="English" value="green">
        <meaning>green in color</meaning>
      </gloss>
    </word>
    <word>
      <unanalyzedform>ideas</unanalyzedform>
      <grammatical-relation relation-term="is a" pos="noun" />
      <gloss lang="English" value="ideas">
        <meaning>plans, schemes or methods</meaning>
      </gloss>
    </word>
    <!-- ....... -->
  </sentence>

  <sentence>
    <word>
      <unanalyzedform>ideas</unanalyzedform>
      <grammatical-relation relation-term="is a" pos="noun" />
      <gloss lang="English" value="ideas">
        <morpheme>
          <unanalyzedform>idea</unanalyzedform>
          <meaning>a plan, scheme or method</meaning>
        </morpheme>
        <morpheme>
          <unanalyzedform>s</unanalyzedform>
          <meaning>English plural marker</meaning>
        </morpheme>
      </gloss>
    </word>
  </sentence>

These examples are composed of sets of user-defined XML tags nested inside one another.

XML components

This section of the page provides a more advanced look at XML.

XML Declaration

The first line in the document, the XML declaration, defines the XML version and the character encoding used. In this case the document conforms to the XML 1.0 specification and uses UTF-8, an encoding of the Unicode character set.

<?xml version="1.0" encoding="UTF-8"?>
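
To show where the declaration sits, here is a minimal sketch of a complete document. The <lexicon> content is a placeholder in the style of the other examples on this page.

  <?xml version="1.0" encoding="UTF-8"?>
  <lexicon>
    <form>
      <linguisticform>.....</linguisticform>
    </form>
  </lexicon>

When a declaration is present, it has to be the very first thing in the file, with nothing before it, not even whitespace; the indentation here is only for display.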

Root Node

Every XML document must contain exactly one root element, defined by a single pair of tags. All other elements must sit inside this root element. Each of them may or may not have child elements of its own, and all of them must be properly nested, as in the structure below:

  <root>
    <child>
      <subchild>.....</subchild>
    </child>
  </root>


The following example shows a possible lexicon structure:

  <lexicon>
    <form>
      <linguisticform>.....</linguisticform>
      <gloss>.....</gloss>
    </form>
  </lexicon>

The first line of the example above is the opening tag of the document's root element. Because it is the outermost tag, it lets us know what the document is; it tells us, "This document is a lexicon."
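
The one-root rule can be illustrated with a short sketch. The fragment below has two top-level <form> elements and no single enclosing element, so an XML parser would reject it:

  <form>
    <linguisticform>.....</linguisticform>
  </form>
  <form>
    <linguisticform>.....</linguisticform>
  </form>

Wrapping both in one root element gives a well-formed document:

  <lexicon>
    <form>
      <linguisticform>.....</linguisticform>
    </form>
    <form>
      <linguisticform>.....</linguisticform>
    </form>
  </lexicon>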

XML Namespace

Although a namespace is not required for well-formed XML, we recommend that you use one. An XML namespace provides a way to avoid element name (tag) conflicts. For example, if two XML files both use a tag called <word>, and the two documents were merged or integrated in some way, there would be no way to tell the two kinds of <word> apart. To distinguish them, we add a namespace. In the file "cat.xml" we want to distinguish our <lexicon> tag from others, so we declare a namespace identified by a URI (here, a URL).

Once the namespace is declared, as in the line below, the prefix 'xsi:' is all that is needed to place an element of the document in that namespace. The declaration is in scope for the element that carries it and for all of its descendants, but only names written with the prefix belong to the namespace; child elements do not pick it up automatically. By contrast, when a default namespace is set (with xmlns="..." and no prefix), any element in its scope that doesn't specify a prefix is associated with the default namespace.

<lexicon xmlns:xsi="http://emeld.org/my-namespace">
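
As a sketch of how namespaces keep two vocabularies apart, the fragment below merges two kinds of <word> element. The prefix "dict" and the second URI are hypothetical, chosen only for illustration; the first URI is the one used above. Unprefixed elements fall into the default namespace, while the prefixed <dict:word> belongs to the other one.

  <lexicon xmlns="http://emeld.org/my-namespace"
           xmlns:dict="http://example.org/other-dictionary">
    <!-- in the default namespace, http://emeld.org/my-namespace -->
    <word>.....</word>
    <!-- in the hypothetical namespace bound to the prefix dict -->
    <dict:word>.....</dict:word>
  </lexicon>

A prefix is only a local shorthand; what identifies the namespace is the URI. The prefix xsi is commonly bound to the XML Schema instance namespace, so a different prefix may be less confusing for your own vocabulary.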

Nesting Tags

All XML elements must be properly nested. If tags are not nested properly, the XML file will not be well-formed, and an XML parser will reject it.

Properly nested tags will be understood:

  <person>
    <name>John Clark</name>
  </person>

Improperly nested tags are not well-formed:

  <person>
    <name>John Clark</person>
  </name>

The next lines in the XML document describe the <form> element, a child element of the root <lexicon>, and its children. Working outward from the most deeply embedded tag, you can see that the unanalyzedform "niu3" is a linguisticform (as opposed to, perhaps, a glossform), and that the linguisticform belongs to a form of type "free root", with id number "2860", in the language "Biao Min".

  <form formtype="free root" id="2860" lang="Biao Min">

Attributes

Some of the elements shown below contain attributes. Attributes provide additional information about elements. They appear inside an element's start tag, and their values must always be enclosed in quotes; either single or double quotes can be used.

Data can either be stored in child elements or in attributes.

Take a look at these examples:

  <form formtype="free root" id="2860" lang="Biao Min">
    <linguisticform>
      <unanalyzedform>niu3</unanalyzedform>
    </linguisticform>
  </form>

  <form id="2860">
    <formtype>free root</formtype>
    <language>Biao Min</language>
    <linguisticform>
      <unanalyzedform>niu3</unanalyzedform>
    </linguisticform>
  </form>

In the first example, the form type and the language are given as attributes (formtype and lang). In the second, they are given as child elements (formtype and language). Both examples provide the same information. There are no rules stating what should be an attribute and what should be an element; use whichever best describes the data.
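
The quoting rule for attribute values can be sketched as follows. The comment text is made up for illustration. Single quotes are convenient when the value itself contains double quotes, and the predefined entity &quot; lets you write a double quote inside a double-quoted value.

  <form id="2860" formtype='free root'>
    <!-- single-quoted value containing double quotes -->
    <comments text='described as a "free root" in the notes' />
    <!-- double-quoted value with the quotes escaped -->
    <comments text="described as a &quot;free root&quot; in the notes" />
  </form>

Both <comments> elements carry the same attribute value once parsed; the choice of quote style is purely a matter of convenience.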


E-MELD School of Best Practice: Writing XML