Standards and procedures followed at the Electronic Text Centre at the University of New Brunswick Libraries:

Help sheets for creating electronic texts 
    
These guidelines and help sheets are intended primarily for, but are not limited to, anyone involved in a publishing project through the Electronic Text Centre at UNB Libraries.

The Electronic Text Centre at the University of New Brunswick Libraries is committed to creating and publishing texts and images to international standards and accepted practices. The Centre currently publishes a variety of material on the Web, including university journals, government publications and reports, archives and special collections texts and images, commercial databases, and The Telegraph Journal online archives.

Part of the Centre's mandate is to provide electronic publishing support to the University of New Brunswick community. Staff at the UNB Libraries Electronic Text Centre provide training workshops for faculty, students and institutions involved with publishing projects through the Centre. The Centre is located in room 518 on the fifth floor of the Harriet Irving Library and is open weekdays from 8:30 to 5:00. Phone: 506.447.3309.



Quick Links:

TEI-lite tutorial and guidelines

A list of TEI-lite tags

The full TEIP3 Guidelines

The TEI Header

The UNB TEI Header Template

A Gentle Introduction to SGML

Parsing


Text Encoding - Introduction  

At the UNB Electronic Text Centre, electronic texts are encoded (marked up) in compliance with the Text Encoding Initiative (TEI) markup scheme, an application of the Standard Generalized Markup Language (SGML).

If you are familiar with an SGML application such as HTML, you already understand some of the basics of text encoding. If you are unfamiliar with any markup scheme, we suggest you first read an introductory tutorial such as ArborText's SGML: Getting Started or the introduction to SoftQuad's SGML: Getting Started.

The next step is to get acquainted with the TEI tagset. The TEI-lite tutorials at the TEI (Text Encoding Initiative) site and A Practical Introduction to the Tagset (from the University of Virginia Electronic Text Center) are both comprehensive resources for this purpose.

A quick list of TEI-lite tags with guidelines for their usage is available online, as is a searchable copy of the full TEIP3 Guidelines from the University of Virginia Electronic Text Center.



Transcription   

Perhaps the most important general principle to follow, particularly with transcriptions, is that everything in the source document is transcribed exactly as it appears, including original and archaic spellings and other text features. There are appropriate tags for encoding such information.

If you are transcribing an archival document, you will most likely be working from a photocopy. Use the word processing application of your choice to type the text. Any words and markings you are unsure of can be circled on the photocopy (in pencil) and double-checked with staff and local resources (such as archival aids and indices). Your file will be assigned a unique name that will serve as the filename, the text id, and the id recorded in the TEI Header.

Bibliographic and typographic features (archival markings, running headers, letterhead seals or images, etc.) will be recorded in the TEI Header.  

Run the file through a spell-check program after you have finished the transcription. It is easier to do this at this point than after you have tagged the document.  

The full TEIP3 Transcription  Guidelines  (Guidelines, section 18)  
Selected Transcription tags  



The TEI Header - Information About Your Information  

After you have transcribed and keyed a document into electronic format, the next step is to attach what is called the TEI Header to the top of your file.

Every TEI-conformant text is divided into at least two sections: the text and the header. The header is a compulsory part of any encoded text. It contains a number of elements for providing information about your information. In the header you create a title page for the electronic version of a text. You also provide information about the source text from which the electronic text is derived, as well as any editorial decisions made during transcription and encoding.

A Sample (Annotated) TEI Header from the UNB Electronic Text Centre  
Download TEI Header  


Sample Tagged Text:  

Prose  
Verse  
Drama  


The TEI Header  
   

The TEI Header is attached to the electronic file as soon as a document is transcribed. You will be given or pointed to a Header template. Fill this in as fully as possible at this stage with information such as the document title, publishing information, and source document description.

Sample TEI Header  

The first line is NOT part of the Header. It is the Document Type Declaration, and it appears as the first line of a text file. All entities, including image files and ISO entity sets, are declared inside this declaration, between its square brackets. The tag <TEI.2> indicates the beginning of your document.
   
<!DOCTYPE TEI.2 system "teilite.dtd"[  

<!NOTATION jpg SYSTEM "JPEG">  
<!-- list all external files (images) here with this syntax: -->
<!ENTITY ca120915 SYSTEM "ca120915.jpg" NDATA jpg>  

<!ENTITY % ISOlat1 system "ISOlat1"> %ISOlat1;  
 <!ENTITY % ISOlat2 system "ISOlat2"> %ISOlat2;  
 <!ENTITY % ISOnum system "ISOnum"> %ISOnum;  
 <!ENTITY % ISOpub system "ISOpub"> %ISOpub;  
 <!ENTITY % ISOtech system "ISOtech"> %ISOtech;  
]>  
<TEI.2>  
<teiHeader type="aacr2">
<fileDesc>  
<titleStmt>  
<title> Letter|Postcard|Note from so and so to so and so, June 15, 1934 : a machine-readable transcription.</title>  
<author> Surname, first, dates</author>  
<respStmt>  
<resp>Creation of machine-readable version: </resp>  
<name> your name, Institution, department</name>  

<resp>Creation of digital images: </resp>
<name> Name, Institution, department of person(s) responsible for images</name>
<resp>Conversion to TEI.2-conformant markup: </resp>
<name> Name, Institution, department</name>
</respStmt>  
</titleStmt>  

<extent> file size will go here later when file is ready (x Kilobytes)</extent>  

<publicationStmt>  

<publisher>Electronic Text Centre at University of New Brunswick Libraries</publisher>
<pubPlace>Fredericton, N.B.</pubPlace>  
<idno> file name (ca120915)</idno>  
<availability><p>Publicly-accessible</p>  
<p>URL: http://www.unb.ca/etc</p>  
<p>Copyright University of New Brunswick; all rights reserved. </p>  
</availability>  

<date> year, month (1997, August)</date>  
</publicationStmt>  
<notesStmt><note>Images of the manuscript version have been included.</note></notesStmt>  

<sourceDesc><biblFull>  
<titleStmt>  
<title> Title without machine readable statement. Same format as above.</title>  
<author> First, Surname</author>  
</titleStmt>  
<editionStmt><p></p></editionStmt>  
<extent> in pages</extent>  

<publicationStmt>  
<publisher></publisher><pubPlace></pubPlace><date></date>  
</publicationStmt>  

<seriesStmt><p> Proper collection name and any unique identifying numbers (e.g. catalogue or folder number)</p></seriesStmt>
<notesStmt><note>.</note></notesStmt>  
</biblFull>  
</sourceDesc>  
</fileDesc>  

<encodingDesc><projectDesc>Prepared for the University of New Brunswick Electronic  
Text Centre.</projectDesc>  
<editorialDecl>(Some examples. Yours will be unique to your documents)  
<p>Verification has been made against the manuscript version.</p>  
<p>All long s's have been normalized in the electronic version of the manuscript.
Original spelling is retained.</p>
<p>The images exist as archived TIFF images, one or more JPEG versions
for general use, and thumbnail GIFs.</p>
<p>Items added are assumed to be interlinear unless otherwise noted.
Items deleted are assumed to be scored through unless otherwise noted.  All
manuscript corrections are in the hand of the author.</p>

</editorialDecl>  
<classDecl>  
<taxonomy id="LCSH">
<bibl><title>Library of Congress Subject Headings</title></bibl>  
</taxonomy>  
</classDecl>  
</encodingDesc>  
<profileDesc>  
<creation><date> Date of creation here (of original)     1879-01-28</date>  
</creation><langUsage>  
<language id="en">English</language></langUsage>  
<textClass>  
<keywords>
<term>non-fiction; prose</term>
</keywords>
<keywords scheme="LCSH">
<term type="fieldxxx">subject heading goes here</term>
<term type="visual work">Manuscript pages</term>
</keywords>
</textClass>  
</profileDesc>  

<revisionDesc>  
<change><date></date>  
<respStmt><resp>Correction and final parsing by:</resp><name>Lisa Charlong, Electronic Text Centre at University of New Brunswick Libraries</name></respStmt>  
<item></item></change>  
</revisionDesc>  
</teiHeader>  



Prose  

All tags are encoded in lower-case letters. The only exception is an element name made up of two words, such as "figDesc", in which case the first letter of the second word is in upper case. Consult the TEI Guidelines for proper element names if you are uncertain.

The <text> element is the first element after the TEI Header closes with </teiHeader>. Assign a unique id to your document within the <text> element, made up of the first letter of the document author's surname followed by the last two digits of the year, the month, and the day.

<text id="e560907"> (see below for full example)  
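The id rule can be sketched in a few lines of Python. This is only an illustration, not a tool used by the Centre: the function name make_text_id is hypothetical, and it assumes the month and day are always written with two digits, as in "e560907". The surname "Eden" is likewise an assumption about what the "e" in that id stands for.

```python
from datetime import date


def make_text_id(surname, written_on):
    """Build an id such as "e560907": surname initial, then YYMMDD."""
    initial = surname.strip()[0].lower()
    # Two digits each for year, month and day (an assumption based on the example id).
    return (
        f"{initial}"
        f"{written_on.year % 100:02d}"
        f"{written_on.month:02d}"
        f"{written_on.day:02d}"
    )


# A letter dated September 7, 1956, by an author whose surname begins with "E".
print(make_text_id("Eden", date(1956, 9, 7)))
```

Because the same string is also the filename and the <idno> in the header, generating it once and pasting it into all three places helps keep them identical.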

Following <text> will be <front> if there exists in the original document any front matter such as title page, introduction, acknowledgments, table of contents, etc.

The <body> element follows <front> and is followed by a <div> tag. Number the <div> starting with "0": <div0>. Within this tag will be a "type" attribute: <div0 type="letter">. Consult with the project supervisor before assigning a value to this attribute.



Example of Tagged Prose  

<text id="e560907">  
<body><p><figure entity="ed00001"></figure></p>  
<div1 type="letter">  
<pb n="1">
<head><address> <addrLine>10 Downing Street </addrLine> <lb>  
<addrLine>Whitehall </addrLine> </address> <lb>  
<date> September 7, 1956</date> </head>  
<salute> My dear Max,  
</salute>  
<p> I disobey your order to say <lb>  
thank you. This is really very good news about <lb>  
<rs type="person">Winston </rs>. I have not liked to ask and it is <lb>  
brave that he should now volunteer. If in the <lb>  
event a speech should prove physically too much <lb>  
for him a <add> published</add> statement would have almost the same <lb>  
effect. </p>  
<p> But what really matters is that he should <lb>  
feel as you describe. For that and for your <lb>  
help I am more than grateful. </p>  
<closer> Yours ever <lb>  
<signed> Anthony</signed></closer>  
<trailer><name type="person">  The Rt. Hon. Lord Beaverbrook. </name></trailer>  
</div1>  
</body>  
</text>  
</TEI.2>  



Page Breaks  

Page breaks are encoded with the <pb> tag, which is used wherever a page break occurs in your document. The "n" attribute is used with <pb> as follows: <pb n="1">, <pb n="2">, etc.
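A quick way to confirm that no page break was skipped during tagging is to list the "n" values in order. The Python sketch below is only an illustration: the function names are hypothetical, it assumes pages are numbered consecutively with plain digits, and it only recognizes <pb> tags whose "n" value is wrapped in double quotes.

```python
import re

# Matches <pb n="12"> and captures the page number.
PB_TAG = re.compile(r'<pb\s+n="(\d+)"\s*>')


def page_break_numbers(text):
    """Return the n values of all <pb> tags, in document order."""
    return [int(n) for n in PB_TAG.findall(text)]


def first_out_of_sequence(numbers):
    """Return the first page number that does not follow its predecessor, or None."""
    for previous, current in zip(numbers, numbers[1:]):
        if current != previous + 1:
            return current
    return None


sample = '<pb n="1"><p>first page</p><pb n="2"><p>second page</p><pb n="4"><p>last page</p>'
numbers = page_break_numbers(sample)
print(numbers, first_out_of_sequence(numbers))
```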



   
Additions and Deletions:  

Additions and deletions are marked with the <add></add> and <del></del> tags respectively. You can specify the location of either by using the "place" attribute:

 <add place="interlinear">but</add>  
 <del place="supralinear">and</del>  


Original Spellings and Hyphenated Words:  

If you come across a misspelled word in the manuscript/typescript, encode it using the <orig> element. The regularized spelling is recorded in the "reg" attribute of <orig>:

 <orig reg="Saturday">Saterday</orig>

Do not use <sic> unless it is clear that the word or grammatical usage is indeed an error. This is often unclear, so it is "safer" to encode with <orig>.

End-of-line hyphenated words are encoded with the same element:

 Mr. Lougheed asked him to <orig reg="report">re-  
 port</orig> to Colonel McAllister.  
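One benefit of recording the regularized form in "reg" is that a normalized reading of the text can be produced mechanically while the original spelling stays in the file. The Python sketch below is an illustration only: the function name regularized_reading is hypothetical, and it assumes "reg" is the only attribute on <orig> and that its value is wrapped in double quotes.

```python
import re

# Matches an <orig> element carrying a reg attribute. DOTALL lets the element
# content span a line break, as it does with end-of-line hyphenation.
ORIG_WITH_REG = re.compile(r'<orig\s+reg="([^"]*)">.*?</orig>', re.DOTALL)


def regularized_reading(text):
    """Replace each <orig reg="...">...</orig> with the value of its reg attribute."""
    return ORIG_WITH_REG.sub(lambda match: match.group(1), text)


sample = 'Mr. Lougheed asked him to <orig reg="report">re-\nport</orig> to Colonel McAllister.'
print(regularized_reading(sample))
# prints: Mr. Lougheed asked him to report to Colonel McAllister.
```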


Highlighted Words or Phrases:  

Text can be highlighted in various ways, including italics, underlining, and bolded characters.  
Use the <hi> element with the "rend" attribute to encode these instances:  
   
 <hi rend="underlined"></hi>  
 <hi rend="bold"></hi>  
 <hi rend="italics"></hi>  
   


Drama:  

Use tags as outlined in A Practical Introduction to the Tagset. Additional tags for drama include:

<castList>  
<castItem>  
<role>  
<roleDesc>  
<sp>  
<speaker>  
<stage>  


Example of Tagged Drama:  

<front>  
<div1 type="dramatis personae"><head>DRAMATIS PERSONAE</head>
<castList>
<castItem><role id="sirp">
Sir Portly</role> <roleDesc> &lpar;who takes the part of &ldquo;heavy Comedian&rdquo; &mdash; at K. M. G. &mdash;
Knight
of Money and Grub &mdash; Baron of the Half Acre, formerly Commander of the Royal Fleet, and a
descendant of Sir John Falstaff.</roleDesc></castItem>
<castItem><role id="bill">
Billh Ickmann</role><roleDesc> His Esquire, and bearer of the Shield of Brass, Governor of
Dorch Esterisland &mdash; a playful fellow with a wicked eye.</roleDesc></castItem>

<castItem><role>
Terence </role><roleDesc> Surnamed the Terrible, a warlike Celt.</roleDesc></castItem>
<castItem>  
<role>  
Kailwite</role><roleDesc> One of the exiled Acadian people, Butcher Royal, and a Counsellor  
among the exiles.</roleDesc></castItem>  

<castItem><role>  
Thomas of Picardy</role><roleDesc> A Scribem of the knight's  
adherents.</roleDesc></castItem>  

<castItem type="role">  
Scribes, Pages, imps and others of the Knightly retinue, will be introduced from time to time &mdash;  
Acadians ubique.</castItem></castList></div1>  
<div1 type="prologue">  
<head>PROLOGUE, spoken by an Imp</head>  
<sp>  
<p>A tale of the darksome days.  Hidden was the sun and sad was the sign of the winds.  Dark  
was the brow of the noble Knight, and tremulous with wrath his mighty frame.  he stood on the  
...</p></sp>  
<sp><p>  
&ldquo;Why art then silent, O Billh Ickmann, and wherefore art thou sad?  Behold afar our
argosies,</p></sp>  
<sp><p>  
;</p></sp>  
</div1>  
</front>  

<body>  
<div1 type="act" n="1">  
<head>ACT 1</head>  
<div2 type="scene" n="1"><head>Scene 1st.</head>  
<stage type="setting">
 The Knight's Armory. &mdash;  Sir Portly sitting in an easy chair. &mdash;  Two Acadians
about retiring.
</stage>
<sp><speaker>ACADIAN</speaker><p> Much obliged, Mistairemit.</p></sp>
<sp who="sirp"><speaker>SIR P.</speaker>
<p> &mdash; Sixteen per cent., remember, and you must pay for searching the records.
</p></sp><sp>
<speaker>ACADIAN</speaker>
<p> &mdash; Oui, oui.  At ten o'clock tomorrow.  Bon jour.</p></sp>
<stage type="exit">&lsqb;Exeunt both Acadians.&rsqb;</stage>
<sp who="sirp"><speaker>SIR P.</speaker>
<lg type="verse">
<l> &mdash; Now, by my faith, another loan &mdash;</l>
<l>Another piece of marsh I'l own;</l>
<l>And still my heart is bound with grief</l>
<l>And lands and gold bring no relief &mdash;</l>
<l>My land is now a burthen made,</l>
<l>By those who Dan and Peter aid</l>
<l>I a Sir Portly, K. M. G.,</l>
</lg>
<stage type="entrance">&lsqb;Enter Cailwite.&rsqb;</stage>
</sp><sp>  
<speaker>CAIL</speaker>  
</div2></div1></body></text>  
</TEI.2>  


Verse:  

Use the same tags as outlined in A Practical Introduction to the Tagset. Additional tags for verse include:  

<l> to encode a line in a verse  
<lg> to encode the entire stanza or verse. <lg> is the wrapper element in which <l> nests.  


Example of tagged verse:  

<lg type="verse">  
<l> &mdash; Now, by my faith, another loan &mdash;</l>
<l>Another piece of marsh I'l own;</l>
<l>And still my heart is bound with grief</l>
<l>And lands and gold bring no relief &mdash;</l>
<l>My land is now a burthen made,</l>  
<l>By those who Dan and Peter aid</l>  
<l>I a Sir Portly, K. M. G.,</l>  
</lg>  
       



Parsing  

Once a document has been marked up (including the header), it is ready to be parsed. Make sure you have saved your file in ASCII format, as the parsing program (i.e. nsgmls) will not parse a WordPerfect or Word file.
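Before parsing, it can help to confirm that the saved file really is plain ASCII. The Python sketch below is an illustration only, not part of nsgmls: the function name find_non_ascii is hypothetical, and it reports every byte with a value above 127 together with its line and column.

```python
def find_non_ascii(data):
    """Return (line, column, byte value) for every byte outside the ASCII range."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col_no, byte in enumerate(line, start=1):
            if byte > 127:
                hits.append((line_no, col_no, byte))
    return hits


# An accented letter saved as a raw Latin-1 byte is reported at line 1, column 4.
print(find_non_ascii("caf\u00e9".encode("latin-1")))
```

To check a real file, read it in binary mode first, for example open("ca120915.sgm", "rb").read(), where the filename is a placeholder. Characters flagged this way are usually better entered as entities from the ISO sets declared at the top of the file, such as &eacute;.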

Nsgmls will first check the TEI or TEI-lite DTD (Document Type Definition) to make sure its structure is correct. You need not worry about this unless you are using your own DTD or have modified the TEI.  

The nsgmls software will then go through your document to ensure that all of the tagging is correct, both in terms of structure and syntax. Once finished it will give you a list of errors that will need correcting.  

Common errors include:

1. A syntax error - failure to wrap an attribute value in double quotes. The correct forms are:
<note target="n67"> or <div1 type="chapter" n="1">
2. A structure error - failure to close a tag in the appropriate place:
<name type="person"><hi rend="bold">James Brown</name></hi>
3. Another structure error - putting a tag in the wrong place:
<div1 type="letter"><addrLine> where the tag <address> was left out.
4. Misspelling a generic identifier: <nmae type="person">
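The first two kinds of error can often be caught before the file is parsed. The Python sketch below is a rough pre-check and not a substitute for nsgmls: it knows nothing about the DTD, the function names are hypothetical, and the list of empty elements is an assumption that covers only <pb> and <lb>.

```python
import re

# Elements that never take an end tag in this sketch (an assumption; extend as needed).
EMPTY_ELEMENTS = {"pb", "lb"}

# A start or end tag: optional slash, element name, then any attributes.
TAG = re.compile(r"<(/?)([A-Za-z][\w.]*)([^<>]*)>")
# An attribute whose value does not begin with a quotation mark.
UNQUOTED_ATTR = re.compile(r"""([\w.]+)\s*=\s*([^\s"'>]+)""")


def find_unquoted_attributes(text):
    """Return (element, attribute, value) for attribute values not wrapped in quotes."""
    problems = []
    for match in TAG.finditer(text):
        closing, name, attrs = match.groups()
        if closing:
            continue
        # Blank out correctly quoted values so their contents are not re-scanned.
        remaining = re.sub(r'"[^"]*"', '""', attrs)
        for attr, value in UNQUOTED_ATTR.findall(remaining):
            problems.append((name, attr, value))
    return problems


def find_nesting_errors(text):
    """Return (end tag found, element that was open) for each out-of-place end tag."""
    problems = []
    open_tags = []
    for match in TAG.finditer(text):
        closing, name, _ = match.groups()
        if name in EMPTY_ELEMENTS:
            continue
        if not closing:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            problems.append((name, open_tags[-1] if open_tags else None))
    return problems


print(find_unquoted_attributes('<pb n=1><div1 type="chapter" n="1">'))
print(find_nesting_errors('<name type="person"><hi rend="bold">James Brown</name></hi>'))
```

Elements that are still open at the end of the file are not reported by this sketch, and the parser remains the final authority on whether a tag is allowed where it appears.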

Once you have a list of parsing errors, correct the first few, then reparse. This will very often "correct" all the errors that followed the initial one.

Your project resource person will help you get started with parsing and will run a final parsing check on your files before they are put on the server.  



   
Maintained by Lisa Charlong  
lcharlon@unb.ca  
Last modified: November 25, 1998