The Big Picture

This documentation explains how to convert your manuscript into XML. XML is a standard format that many scientific publishers use for storing and sharing information. If you are familiar with HTML, you'll find that XML is similar. If not, you may want to ask someone who is for advice along the way.

XML files can be converted into different formats, be that HTML, PDF, or another type of XML. The XML format makes it possible for large collections of similar information to be managed effectively.  Xml.com provides more information.

Anatomy of an XML File

To get a feel for what you'll create, study the sample XML file and compare it with its HTML version. XML files are text files. You can edit them with any text editor, or with a word processor that saves plain text. In XML, unlike in HTML, tags (properly called elements) identify information by its content rather than by how it should be displayed to the reader. Element names are arbitrarily defined in a "document type definition" or DTD. The DTD can be located at the beginning of an XML file, but it is more commonly a separate file or several files. When a document conforms to a DTD it is called a valid document, and the process of checking whether a document conforms to a DTD is called validating the document. You don't need to read the BPO DTD. The documentation below explains it.

Here is a grocery list:

<grocery_list>
    <item food_group="dairy" quantity="1">Milk Carton</item>
    <item food_group="cereals" quantity="1">Loaf of Bread</item>
    <budget limit="$75"/>
</grocery_list>

The element grocery_list has two child elements called item and one child element called budget. These child elements can occur inside the parent element only if the parent-child relationship has been defined in the DTD. The DTD can also specify the order and number of elements that are permitted as children. For example, one or more item elements must occur before exactly one budget element. In this way a complex XML document tree can be created. The tree is analogous to a hierarchical family tree with its roots, branches and leaves. It identifies the relationships between parents and their children and siblings (item and budget are siblings because they occur at the same level).
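
As a rough sketch of how such rules are written, the grocery list can carry its own DTD at the beginning of the file. The declarations below are an assumption made for illustration (in particular, marking the attributes as required); they are not part of the BPO DTD.

```xml
<?xml version="1.0"?>
<!-- Hypothetical DTD for the grocery list, written for illustration only. -->
<!DOCTYPE grocery_list [
  <!-- One or more item elements, followed by exactly one budget element. -->
  <!ELEMENT grocery_list (item+, budget)>
  <!-- An item holds text only. -->
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
    food_group CDATA #REQUIRED
    quantity   CDATA #REQUIRED>
  <!-- budget has no content, only an attribute. -->
  <!ELEMENT budget EMPTY>
  <!ATTLIST budget limit CDATA #REQUIRED>
]>
<grocery_list>
    <item food_group="dairy" quantity="1">Milk Carton</item>
    <item food_group="cereals" quantity="1">Loaf of Bread</item>
    <budget limit="$75"/>
</grocery_list>
```

With these declarations, a validating parser would reject a list that has no item, or one in which budget comes before an item.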

Elements must be paired, or they must be explicitly identified as unpaired. Such rigor does not exist in HTML. For paired elements, an opening tag <item> must have a closing tag, identified by a forward slash after the opening bracket, namely </item>. The element's content goes between the opening and closing tags. An unpaired element must be identified by placing a forward slash before the closing bracket, as in the budget element above. Elements must be closed in the reverse of the order in which they were opened. For example, the elements <b>, <i> and <u> are used for bold, italic and underline. One may write:

<b>Bold <i>and italic <u>and underlined text.</u></i></b>
but not
<b>Bold <i>and italic <u>and underlined text.</b></u></i>

The first case is well-formed XML; the latter is not and it will generate an error when read by a computer (parsed).

The item element has two attributes. One is food_group and the other is quantity. Attributes provide information about the element concerned. Attributes are defined inside of the opening element tag and have the syntax of the name of the attribute, an equal sign and the attribute's value in quotes.  Again, the names of permitted attributes are defined in the DTD. When specifying attributes, the order in which they are specified is not important. Now you understand elements and attributes.

XML is case sensitive; that is, <item> is not the same as <ITEM> or <ItEm>. In the BPO DTD, all element names are in lower case.

Whitespace in XML documents is reduced to a single space. That means you can use line breaks and tabs to make the XML file easy to read. Additional whitespace will be collapsed (normalized) to a single space when the XML is parsed.

Special characters (those that cannot be typed directly on an ASCII keyboard) must be entered as "entities." See the list of available special characters.

Important: Certain characters must be typed as entities (in long form) in your XML document. If typed directly, they will cause errors. To include &, <, >, ' and ", type &amp;, &lt;, &gt;, &apos; and &quot; respectively.
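
A short illustration of the rule above, using the p (paragraph) element from the BPO DTD. The sentence itself is made up for the example.

```xml
<!-- Each reserved character is written as its entity. -->
<p>Mix buffers A &amp; B until the pH is &lt; 7 but &gt; 5; the &quot;control&quot; sample isn&apos;t treated.</p>
```

After parsing, the entities are replaced by the characters they stand for, so the reader sees & in place of &amp;, < in place of &lt;, and so on.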

The BPO DTD is © 2003 by Biological Procedures Online. It may be freely used, copied and modified as described in the DTD's copyright notice. No warranty is provided.


Step-by-Step

Creating your first XML document may seem like a daunting challenge. This process is somewhat complicated, yet it should not become a source of frustration. If you have any questions or wish clarification, please contact us. Although the authoring of documents in XML may one day become common knowledge, we are not there yet. If something is not working as you feel it should or if a concept is not clear, remember that we are always prepared to help.

Here's how to create your XML document.

  1. Get and extract xmlpack.zip. The directory "xml" contains the subdirectory "article", in which you will create your XML file. RXP, the validating parser used in step 4, will check your XML document for errors. Mac users, see www.xmacl.com; you need a validating parser to check your final XML file. Unzip for Mac is available at www.info-zip.org/pub/infozip/ .
  2. Open the file "article\article.txt" in your favorite word processor. Use the DTD Reference  below as a guide to complete the <header> section.
  3. Complete the <body> of the XML file using the DTD reference as a guide. Save it in .txt format. Hints to save time:
    • Save your word processor manuscript in HTML format. That will do most of the conversion automatically since HTML and the BPO DTD are similar. Then rename the HTML file to end in ".txt" and open it with your word processor. Now you can copy and paste between your HTML version and the "article\article.txt" XML file.
    • Make heavy use of the search and replace feature. What to search and replace will become evident as you work. For example, find special characters and replace them with their character entity. Look for HTML tags like <P ALIGN="LEFT"> and replace them with <p>. Replace <TD BGCOLOR="#FFFFFF" VALIGN="TOP"> with <td><p>, and so on.
    • For references, have your reference manager produce text output in XML format to save from copying and pasting XML tags around reference information.
    • When in doubt review the sample file. It has tables, lists and figures, sections and references to serve as examples.
  4. Once your XML file is complete, validate it using the "rxp.exe" program. Open a DOS command line, change directory to where your article is (for example, "cd c:\windows\desktop\xml\article") and run "..\rxp\rxp -Vvsx article.txt | more". This will generate a list of errors to fix. Repeat until no errors remain. A more comprehensive alternative to RXP is XMLVALID, available free at www.elcel.com/products/xmlvalid.html. (If your "article.txt" file is in a different folder, you will need to change its <!DOCTYPE line to specify the location of bpo.dtd.) Hint: press F3 to recall the last DOS command.
  5. Zip all files in your "article" directory. Open a DOS window, change directory as above and run "..\zip\zip article.zip *.*". (A version of Zip for Macintosh is available at www.info-zip.org/pub/infozip/.)
  6. Congratulations! Submit the file "article.zip".

BPO DTD Reference

Each element is described in the following format:

Element: name - description
    Contains: (list of child elements)

    Attribute:    name - description

        Values:    (list of allowed attribute values) - description

The list of child elements specifies the order in which child elements must occur. Occurrence indicators for child elements are:

none       - required once
?          - may occur zero or one times
*          - may occur zero or more times
+          - must occur one or more times
(a|b)      - a or b once only
(a|b)*     - a or b zero or more times in any order
#PCDATA    - text data, called parsed character data in XML

If this information is confusing, please read the brief XML tutorial at the start of this document.

Attributes in green are required. Attributes of type ID must be unique in the document. They name a special location such as a reference, table or figure that is referred to elsewhere by an attribute of type IDREF with the same value. Both IDs and IDREFs must start with a predefined letter (a, r, t or f, for address, reference, table and figure respectively). The suffix of the ID value is up to you. Sequential numbering, e.g. "a1", "t1", "r1", "r2", "r3", is recommended.

List of Elements

Element: document - root element
    Contains: (header, body)

    Attribute: art_type - type of article

        Values:     commentary

                        letter

                        reply

                        editorial

                        correction

                        method
                        review

    Attribute: art_id - unique identifier for the article (use the predefined "myart")

        Values:    ID

The header and body elements are already specified in "article.txt". Now move your cursor between the opening and closing header tags. Here you will specify the front matter of your article.
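
For orientation, the overall shape of the file can be sketched as follows. This is an assumption about what "article.txt" roughly looks like, not a copy of it; in particular the path to bpo.dtd in the <!DOCTYPE line and the art_type value are placeholders. The skeleton is well-formed but will not validate until the header and body are filled in.

```xml
<?xml version="1.0"?>
<!-- The path to bpo.dtd is an assumption; use the location that applies to your copy. -->
<!DOCTYPE document SYSTEM "bpo.dtd">
<document art_type="method" art_id="myart">
  <header>
    <!-- front matter goes here: title, authors, addresses, dates, copyright -->
  </header>
  <body>
    <!-- abstract, sections, acknowledgments, references -->
  </body>
</document>
```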

Element: header - header information
    Contains:    (title, first_author, secondary_author*, auth_fn?, address+, corresponding_address?, date_submitted?, date_revised?, date_accepted?, date_published?, mesh_term*, abbreviation*, volume?, issue?, start_page?, end_page?, pids?, copyright)

This means that you must include these elements, in order: one title, one first_author, zero or more secondary_author, an optional auth_fn, one or more address, and so on, all inside the header element. Each element is explained below.
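
A minimal header that follows this content model might look like the sketch below. It contains only the elements that have no ? or * indicator; all names and text are placeholders, and the attributes shown are assumed to be the ones you would normally set.

```xml
<header>
  <title>Title of my manuscript</title>
  <first_author address_idref="a1" corresponding="y">
    <last>Smith</last>
    <first>John</first>
    <lead_initials>JAH</lead_initials>
  </first_author>
  <!-- id="a1" matches address_idref="a1" above, linking the author to this address -->
  <address id="a1">
    <institution>Name of the institution</institution>
  </address>
  <copyright>Copyright statement of the article</copyright>
</header>
```

Optional elements such as secondary_author, the dates and mesh_term are inserted at the positions given in the list above.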

Element: title - title of the manuscript, article
    Contains: (#PCDATA | b | i | u | sup | sub | inline)*

This means the title may contain any number of the listed items in any order. These are described below, but briefly, they are text data, bold, italic, underline, superscript, subscript and an inline graphic image. You could type, for example, <title>The title of my <u>underlined</u> manuscript</title> as the first child of the <header> element.

Element: suffix - suffix of the corresponding author. Include any degrees or positions. For example: <suffix>M.D., Ph.D., Professor,</suffix>.
    Contains:  (#PCDATA | b | i | u | sup | sub)*

Element: first_author - information about the first author
    Contains: (last, first, lead_initials, middle_initials?, asc_last?, asc_lead_initials?)

    Attribute: address_idref - this attribute is of type IDREF. It must equal the value of the attribute of type ID that is associated with this author's address  (to be entered later). For example address_idref="a1". Choose any IDREF you want but it must start with the letter "a". This attribute links author names and addresses.

       Values: IDREF

    Attribute: corresponding - Select one corresponding author. Set this attribute to "y" if this is the corresponding author.

        Values: y | n
    Attribute: afn_idref - IDREF type attribute for an author footnote. Must start with "afn" followed by a number, for example "afn1".
       Values: afn#

Element: last - surname
    Contains: (#PCDATA | b | i | u | sup | sub)*

Element: first - first name
    Contains: (#PCDATA | b | i | u | sup | sub)*

Element: lead_initials - all initials that precede the surname. For the name "John A.H. Smith" this is "JAH". No periods in between letters.
    Contains: (#PCDATA | b | i | u | sup | sub)*

Element: middle_initials - all middle initials. For John A.H. Smith this is "A.H.". Include punctuation.
    Contains: (#PCDATA | b | i | u | sup | sub)*

Element: asc_last - US-ASCII version of last. Use this if last contains any characters outside of [a-zA-Z], namely non-English or accented characters. This field is used for backwards compatibility with internet services that do not support UTF-8 encoding.
    Contains: (#PCDATA)

Element: asc_lead_initials - US-ASCII version of lead_initials. Use this if lead_initials contains any characters outside of [a-zA-Z], namely non-English or accented characters. This field is used for backwards compatibility with internet services that do not support UTF-8 encoding.
    Contains: (#PCDATA)

 

Element: secondary_author - other authors - list these in the order they should appear in the article.
    Contains: (last, first, lead_initials, middle_initials?, asc_last?, asc_lead_initials?)

    Attribute: address_idref - this attribute is of type IDREF. It must equal the value of the attribute of type ID that is associated with this author's address  (to be entered later). For example address_idref="a1". Choose any IDREF you want but it must start with the letter "a". This attribute links author names and addresses.

       Values: IDREF

    Attribute: corresponding - Select one corresponding author. Set this attribute to "y" if this is the corresponding author.

        Values: y | n
    Attribute: afn_idref - IDREF type attribute for an author footnote. Must start with "afn" followed by a number, for example "afn1".
       Values: afn#
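
The name elements can be illustrated with the "John A.H. Smith" example used above, plus a second, made-up author whose surname contains an accented character. The numeric character reference &#252; (for ü) is standard XML; whether BPO prefers a named entity from its special characters list instead is not stated here, so treat that choice as an assumption.

```xml
<first_author address_idref="a1" corresponding="y">
  <last>Smith</last>
  <first>John</first>
  <lead_initials>JAH</lead_initials>
  <middle_initials>A.H.</middle_initials>
</first_author>
<secondary_author address_idref="a2" corresponding="n">
  <!-- the surname contains a non-ASCII character, so asc_last is supplied as well -->
  <last>M&#252;ller</last>
  <first>Anna</first>
  <lead_initials>A</lead_initials>
  <asc_last>Muller</asc_last>
</secondary_author>
```

The values "a1" and "a2" must match the id attributes of two address elements entered later in the header.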

Element: auth_fn - author footnote
    Contains: (#PCDATA | b | i | u | sup | sub)*
    Attribute: id - the ID attribute that an author's afn_idref attribute refers to. It must start with "afn", for example "afn2".
       Values: ID

Element: address - a contact address to be displayed at the top of the article
    Contains: (institution, street_address?, country?, fax?, phone?, email?)

   
    Attribute: id - this attribute of type ID must start with the letter "a" and it must be unique (among all ID attribute types) in the document. It links this address to one or more authors by being set to the same value as the address_idref attribute of each of those authors. For example: id="a1".
        Values: ID

Element: corresponding_address - address for correspondence. Optional. Use it if the corresponding address is different from the corresponding author's affiliation address given in the <address> element.
    Contains: (last, first, middle_initials?, suffix?, institution, street_address?, country?, fax?, phone?, email?)

Element: institution - name of an institution
    Contains: (#PCDATA | b | i | u | sup | sub)*

Element: street_address - a mailing address that includes everything but the country. Omit ending punctuation.
    Contains: (#PCDATA | b | i | u | sup | sub)*

Element: country - country name. Omit ending punctuation.
    Contains: (#PCDATA | b | i | u | sup | sub)*

Element: fax - a fax number
    Contains: (#PCDATA)

Element: phone - a phone number
    Contains: (#PCDATA)

Element: email - an email address
    Contains: (#PCDATA)
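
A fully populated address might look like this sketch. The institution, street, phone numbers and email are placeholders invented for the example.

```xml
<address id="a1">
  <institution>Department of Biology, Example University</institution>
  <!-- street_address and country carry no ending punctuation -->
  <street_address>123 Example Street, Example City</street_address>
  <country>Canada</country>
  <fax>+1 555 0101</fax>
  <phone>+1 555 0100</phone>
  <email>jsmith@example.org</email>
</address>
```

Any author whose address_idref attribute is "a1" is displayed with this address.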

Now proceed to enter dates. As some of these are still unknown, enter any date. They will be updated by the journal. This is already done in "article.txt".

Element: date_submitted
    Contains: EMPTY

    Attribute: month

        Values: two digits representing the month. For example, for March set this attribute equal to "03".

    Attribute: day

        Values: two digits representing the day. For example, for the 4th day set this attribute equal to "04".

    Attribute: year

        Values: four-digit year. For example: "2003".

Note: This element has no content, only attributes and is therefore EMPTY or unpaired. To specify May 17, 2002 write <date_submitted month="05" day="17" year="2002"/>. Include the closing / to indicate that this element is unpaired.

Element: date_revised - date on which the author completed requested revisions
    Contains: EMPTY

    Attribute: month

        Values: two digits representing the month. For example, for March set this attribute equal to "03".

    Attribute: day

        Values: two digits representing the day. For example, for the 4th day set this attribute equal to "04".

    Attribute: year

        Values: four-digit year. For example: "2003".

Element: date_accepted
    Contains: EMPTY

    Attribute: month

        Values: two digits representing the month. For example, for March set this attribute equal to "03".

    Attribute: day

        Values: two digits representing the day. For example, for the 4th day set this attribute equal to "04".

    Attribute: year

        Values: four-digit year. For example: "2003".

Element: date_published
    Contains: EMPTY

    Attribute: month

        Values: two digits representing the month. For example, for March set this attribute equal to "03".

    Attribute: day

        Values: two digits representing the day. For example, for the 4th day set this attribute equal to "04".

    Attribute: year

        Values: four-digit year. For example: "2003".

Element: mesh_term - medical subject heading for your document. Use NIH's PubMed MeSH browser to choose these. They are not arbitrary. See the instructions for authors.
    Contains: (#PCDATA)

Element: abbreviation
    Contains: (short, long)

Element: short - short form of the abbreviation
    Contains: (#PCDATA | b | i | u | sup | sub)*

Element: long - long form of the abbreviation
    Contains: (#PCDATA | b | i | u | sup | sub)*
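
For instance, one MeSH term followed by one abbreviation could be entered in the header as sketched below. The term and the abbreviation are illustrative; choose your own from the MeSH browser and from your manuscript.

```xml
<mesh_term>Polymerase Chain Reaction</mesh_term>
<abbreviation>
  <short>PCR</short>
  <long>polymerase chain reaction</long>
</abbreviation>
```

Repeat the mesh_term and abbreviation elements as often as needed, keeping all mesh_term elements before the first abbreviation.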

Element: volume - the volume designation for this article. Do not include this element. It will be included by us upon publication.
    Contains: (#PCDATA)

Element: issue - the issue designation for this article. Do not include this element. It will be included by us upon publication.
    Contains: (#PCDATA)

Element: start_page - the page number of the article's first page. Do not include this element. It will be included by us upon publication.
    Contains: (#PCDATA)

Element: end_page - the page number of the article's last page. Do not include this element. It will be included by us upon publication.
    Contains: (#PCDATA)

Element: pids - publisher or reference identifying data. You generally do not need this element. It will be included by us upon publication.
    Contains: (pid+)

Element: pid - document identifier
    Contains: (#PCDATA)

    Attribute: type

        Values: (pubmed | medline | doi | pmcid)

Element: copyright - the full copyright statement of the article
    Contains: (#PCDATA)


Now move your cursor between the <body> and </body> tags and continue.

Element: body - main body of the article
    Contains: (abstract?, section*, acknowledgments?, references?, protocols?)

Element: abstract - abstract section
    Contains: p+

Element: p - paragraph data
    Contains: (#PCDATA | b | i | u | sup | sub | br | hr | a | xref | inline | list)*

    Attribute: align - paragraph alignment
        Values: (left | right | center) - default is left

Element: section - user-created section, such as "Introduction", "Discussion" etc.
    Contains: (title, (section | p | figure | table)*)

Note: This means you must start the section with a title element, followed by any number of paragraphs, sections, figures or tables. To create headings and sub-headings, nest sections; that is, begin a new section inside an existing section.
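
As a sketch of nesting, a heading with two sub-headings could be written like this. The titles and paragraph text are placeholders.

```xml
<section>
  <title>Materials and Methods</title>
  <p>Overview of the methods.</p>
  <!-- a section inside a section becomes a sub-heading -->
  <section>
    <title>Cell culture</title>
    <p>Details of the cell culture procedure.</p>
  </section>
  <section>
    <title>Data analysis</title>
    <p>Details of the analysis.</p>
  </section>
</section>
```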

Element: acknowledgments - Article acknowledgments
    Contains: p+

Element: references - reference section
    Contains: (ref)*

Element: ref - a reference
    Contains: ( name+, ( (citation, isbn?, url?, pids?) | ( title, journal, volume?, issue?, start_page, end_page?, year, url?, pids?)))
    Attribute: id

        Values: ID - the identifier of the reference. Must start with "r"

    Attribute: type - type of material being referenced

        Values: (jart | other) - journal  article or other

Note: Titles of references must end with a period (".").

There are two types of references. Articles in scholarly journals must use the ref attribute type="jart" and then include the elements name+, ( title, journal, volume?, issue?, start_page, end_page?, year, url?, pids?). "Other" references must include name+, (citation, isbn?, url?, pids?). Generally, use only the citation element and omit the isbn,  url and pids elements.

If the reference is part of an accelerated online publication in a scholarly journal and no volume, issue or page number information is yet available, treat the reference as of the type 'other' and include the reference's "digital object identifier" (doi) under the pids/pid element.

Element: name - name of an author of a reference
    Contains: (last, asc_last?, lead_initials)

Note: for "et al." do this: <name><last>et al.</last><lead_initials/></name>

Element: journal - journal name.
    Contains: (#PCDATA | b | i | u | sup | sub)*

Note: The journal name must end with a period (".").

Element: volume - journal volume
    Contains: (#PCDATA)

Element: issue - journal issue.
    Contains: (#PCDATA)

Element: start_page - starting page of referenced article
    Contains: (#PCDATA)

Element: end_page - ending page of referenced article
    Contains: (#PCDATA)

Element: year - year of publication
    Contains: (#PCDATA)

Element: citation - complete citation data for non-journal sources.
    Contains: (#PCDATA | b | i | u | sup | sub)*

Note: The citation data must end with a period (".").

Element: isbn - ISBN number of resource if available (optional)
    Contains: (#PCDATA)

Element: url - URL of a resource. Include the protocol, for example: <url>ftp://www.biologicalprocedures.com</url>
    Contains: (#PCDATA)
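
Putting the reference elements together, a references section with one journal article and one "other" reference might look like the sketch below. Every name, title and number is a placeholder, not a real citation.

```xml
<references>
  <ref id="r1" type="jart">
    <name><last>Smith</last><lead_initials>JAH</lead_initials></name>
    <name><last>et al.</last><lead_initials/></name>
    <!-- title and journal name each end with a period -->
    <title>Title of the journal article.</title>
    <journal>Name of the Journal.</journal>
    <volume>5</volume>
    <start_page>100</start_page>
    <end_page>110</end_page>
    <year>2003</year>
  </ref>
  <ref id="r2" type="other">
    <name><last>Smith</last><lead_initials>JAH</lead_initials></name>
    <!-- the citation also ends with a period -->
    <citation>Title of the book, publisher, city, year.</citation>
  </ref>
</references>
```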

Element: protocols
    Contains: (p | section | figure | table)*

Element: b - bold text
    Contains: (#PCDATA | i | u | sup | sub)*

Element: i - italic text
    Contains: (#PCDATA | b | u | sup | sub)*

Element: u - underlined text
    Contains: (#PCDATA | b | i | sup | sub)*

Element: sup - superscript text
    Contains: (#PCDATA | b | i | u | sub)*

Element: sub - subscripted text
    Contains: (#PCDATA | b | i | u | sup)*

Element: br - line break
    Contains: EMPTY

Element: hr - horizontal rule (line)
    Contains: EMPTY

Element: figure - inserts an external graphic file into the document
    Contains: (caption, graphic)
    Attribute: id - identifier for the figure. It must start with the letter "f".

        Values: ID - Must start with the letter "f".

Element: caption - caption of the external graphic
    Contains: (p)+

Note: Include figure titles in bold within the caption. For example:
<figure><caption><p><b>Title of figure</b>. Rest of caption</p></caption> <graphic thumb_href="f1sm.gif" thumb_type="gif" thumb_width="250" thumb_height="172" large_href="f1lg.gif" large_type="gif" large_width="632" large_height="436"/></figure>

Note: Do not include figure labels and numbers such as "Fig 1.". These are added automatically.

Element: graphic - contains a list of attributes specifying the graphic files of the image
    Contains: EMPTY

    Attribute: thumb_href - the filename in the current directory of a thumbnail version of the graphic. This version will display on the same web page as the complete article.
        Values: filename

    Attribute: thumb_type - the thumbnail graphic file format, either gif or jpeg

        Values: (gif | jpeg)

    Attribute: thumb_width - the width of the thumbnail graphic in pixels. Maximum width is 250 pixels.

        Values: number <= 250

    Attribute: thumb_height - the height of the thumbnail graphic in pixels. Maximum height is 250 pixels.

        Values: number <= 250

    Attribute: large_href - the filename in the current directory of an enlarged  version of the graphic. This version will display on a separate web page and be used in the preparation of a pdf file.
        Values: filename

    Attribute: large_type - the enlarged graphic's file format, either gif or jpeg

        Values: (gif | jpeg)

    Attribute: large_width - the width of the enlarged graphic in pixels.

        Values: number

    Attribute: large_height - the height of the enlarged graphic in pixels.

        Values: number

For example: <graphic thumb_href="f1sm.gif" thumb_type="gif" thumb_width="250" thumb_height="172" large_href="f1lg.gif" large_type="gif" large_width="632" large_height="436"/>.

Put thumbnail and full-size versions of the figure into the same directory as your XML file "article.txt". Choose a high resolution for the large figure.

Element: table - start a table
    Contains: (title?, col+, tr+, caption?)
    Attribute: id - table identifier

        Values: ID - must start with "t".

       For example: <table id="t1">
    Attribute: type - table display type
        Values: blank, or "inline" to designate an inline table.

Element: col - column width information
    Contains: EMPTY
    Attribute: width - width of the column

        Values: number - width of the column in relative units, approximating the number of centimeters the column should span. Sum of all table column widths to be <= 17.5

Note: Specify one col element for each column in your table, left to right. The sum of all column widths may not equal more than 17.5 relative units.

For example: <col width="3"/><col width="5"/> for a two-column table.

Element: tr - table row
    Contains: (td+)

Element: td - table data. This encloses the content of each cell.
    Contains: (p+)
    Attribute: rowspan - the number of rows the current cell should span
        Value: a number
    Attribute: colspan - the number of columns the current cell should span
        Value: a number

Note: rowspan and colspan are optional. If either is used, reduce the number of td cells in the affected rows accordingly, so that the spanning cell takes the place of the cells it covers. Functionality is identical to HTML.
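
A small two-column table that uses these elements could be sketched as follows. The content is made up; note that the first row has a single td because that cell spans both columns.

```xml
<table id="t1">
  <title>Buffer composition</title>
  <!-- one col per column; the widths (3 + 5) stay well below 17.5 -->
  <col width="3"/>
  <col width="5"/>
  <tr>
    <td colspan="2"><p><b>Buffer A</b></p></td>
  </tr>
  <tr>
    <td><p>Tris</p></td>
    <td><p>10 mM</p></td>
  </tr>
</table>
```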

Element: a - like the HTML anchor element. It is used for user-defined cross-referencing between points in a document and for linking to an external URL.  Do NOT use this element for cross-referencing figures, tables or references. For that, use the xref element. Use ONLY ONE of the attributes below.

    Contains: (#PCDATA | sup | sub)*
    Attribute: href - hypertext reference for an external URL or the name of the target in the document to jump to when this link is followed.
        Values: external URL.
    Attribute: target - the name of a location in the same document to jump to when the user follows the link. Use this for creating intra-document links. Place this element at the starting point of the link. For example: <a target="label2">Important location</a>. Omit the leading # that is needed in the HTML equivalent.
        Values: ID - must not begin with r, t, f or "loc", as these are reserved for IDs of references, tables and figures.
    Attribute: name - a named link equal to a user-defined target. Put this at the destination end of an intra-document cross-reference (<a name="label2">).
        Values: ID - must not begin with r, t, f or "loc".
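
The three uses of the a element can be sketched side by side. The label "label2", the link text and the URL are placeholders chosen for the example.

```xml
<!-- starting point of an intra-document link -->
<p>The buffer recipe is given <a target="label2">further down</a>.</p>

<!-- destination of that link -->
<p><a name="label2">Buffer recipe</a>: prepare fresh before use.</p>

<!-- link to an external URL, protocol included -->
<p>More information is available from the <a href="http://www.example.org/">supplier site</a>.</p>
```

Each a element carries exactly one of the three attributes.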

Element: xref - cross-reference to a reference (bibliographic), figure, or table
    Contains: EMPTY
    Attribute: idref - the id of the reference, figure, or table being referred to

        Values: IDREF - the id of the referred reference, figure, or table that starts with an r, t or f

For example, to link to the reference with id="r13" write <xref idref="r13"/>; for the table with id="t2" write <xref idref="t2"/>. The <xref/> element will be replaced with a number. Therefore, write "see Fig. <xref idref="f3"/>b for..." in your text and the result will appear as "see Fig. 1b for...", linked appropriately.

Element: inline - inline graphic for equations or unsupported characters. File type must be "gif".
    Contains: EMPTY
    Attribute: href - location of the graphic file

        Values: a URL

    Attribute: height - height of the graphic

        Values: height in pixels

    Attribute: width - width of the graphic

        Values: width in pixels

Hint: use the inline element to include equations not easily created by special characters, tables with special formatting or graphics that should not be treated as figures.

Note: To place an inline graphic in its own paragraph, you must align the paragraph "center". For example, to include a table too complex to be coded by the <table> element, place it in a file such as table.gif, centered in its own paragraph. Type: <p align="center"><inline href="table.gif" height="250" width="500"/></p>. Copy the gif file to the same directory as your "article.txt" XML file.

Element: list -  inserts a list
    Contains: (li+)
    Attribute: type - the type of list

        Values: (ol | ul) - ol for an ordered/numbered list, ul for an unordered/bullet list

    Attribute: ol_type - the type of numbering desired
        Values: (1 | a | i ) - numeric, alphabetic or roman numbering. If not specified, the default is numeric.
    Attribute: ol_start - starting number of ordered list
        Values: a number (defaults to 1 if not set)

Element: li - list item
    Contains: (p | figure | table)*
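
Because list is a child of p, a list is written inside a paragraph. The sketch below shows an ordered list with alphabetic numbering and an unordered list; the item text is invented for the example.

```xml
<p>Prepare the sample as follows:
  <list type="ol" ol_type="a">
    <li><p>Thaw the sample on ice.</p></li>
    <li><p>Centrifuge for 5 minutes.</p></li>
  </list>
</p>
<p>Required materials:
  <list type="ul">
    <li><p>Pipette tips</p></li>
    <li><p>Microcentrifuge tubes</p></li>
  </list>
</p>
```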


ISSN: 1480-9222
© 1997-2003 Biological Procedures Online