Standards and procedures followed at the Electronic Text Centre at the University of New Brunswick Libraries
These guidelines and help sheets are intended for, but not limited to, anyone involved in a publishing project through the Electronic Text Centre at UNB Libraries. The Centre is committed to creating and publishing texts and images to international standards and accepted practices. It currently publishes a variety of material on the Web, including university journals, government publications and reports, archives and special collections texts and images, commercial databases, and The Telegraph Journal online archives. Part of the Centre's mandate is to provide electronic publishing support to the University of New Brunswick community, and Centre staff provide training workshops for faculty, students and institutions involved in publishing projects through the Centre. The Centre is located in room 518 on the fifth floor of the Harriet Irving Library and is open weekdays from 8:30 to 5:00. Phone 506.447.3309
Text Encoding - Introduction
At the UNB Electronic Text Centre, electronic texts are encoded (marked up) in compliance with the Text Encoding Initiative (TEI) markup scheme, an application of the Standard Generalized Markup Language (SGML). If you are familiar with an SGML application such as HTML, you already understand some of the basics of text encoding. If you are new to markup, we suggest you first read an introductory tutorial such as ArborText's SGML: Getting Started or the introduction to SoftQuad's SGML: Getting Started. The next step is to get acquainted with the TEI tagset. The TEI-lite tutorials at the TEI (Text Encoding Initiative) site and A Practical Introduction to the TagSet (from the University of Virginia Electronic Text Center) are both comprehensive resources for this purpose. A quick list of TEI-lite tags with guidelines for their usage is available online, as is a searchable copy of the full TEI P3 Guidelines from the University of Virginia Electronic Text Center.
Transcription
Perhaps the most important general principle, particularly for transcriptions, is that everything in the source document is transcribed as it is, including original and archaic spellings and other text features; TEI provides tags for encoding such features. If you are transcribing an archival document you will most likely be working from a photocopy. Type the text in the word processing application of your choice. Circle any words or markings you are unsure of on the photocopy (in pencil) and double-check them with staff and local resources such as archival aids and indices. Your file will be assigned a unique name that serves as the filename, the text id, and the id recorded in the TEI Header. Bibliographic and typographic features (archival markings, running headers, letterhead seals or images, etc.) are recorded in the TEI Header. When you have finished the transcription, run the file through a spell-check program; this is easier to do now than after the document has been tagged. Use the spell-checker only to catch keying errors, not to "correct" original spellings. See the full TEI P3 Transcription Guidelines (Guidelines, section 18).
The TEI Header - Information About Your Information
After you have transcribed and keyed a document into electronic format, you next need to attach what is called the TEI Header to the top of your file. Every TEI-conformant text is divided into at least two sections: the header and the text. The header is a compulsory part of any encoded text, and its elements provide information about your information. The header serves as the title page of the electronic version of a text. It also describes the source text from which the electronic text is derived, as well as any editorial decisions made during transcription and encoding.
A Sample (Annotated) TEI Header from the UNB Electronic Text Centre
Sample Tagged Text:
The TEI Header is attached to the electronic file as soon as a document is transcribed. You will be given, or pointed to, a Header template. At this point in the process, fill it in as completely as you can with information such as the document title, publishing information and a description of the source document.
Sample TEI Header
The first line is NOT part of the Header. It is the document type declaration, and it appears as the first line of the text file. All entities, including image files and ISO character entity sets, are declared after this line, within the document type declaration. The <TEI.2> tag marks the beginning of your document.
<!NOTATION jpg SYSTEM "JPEG">
<!ENTITY % ISOlat1 system "ISOlat1"> %ISOlat1;
<resp>Creation of digital images: </resp>
<extent> file size will go here later when file is ready (x Kilobytes)</extent> <publicationStmt> <publisher>Electronic Text Centre at University
of New Brunswick Libraries</publisher>
<date> year, month (1997, August)</date>
<sourceDesc><biblFull>
<publicationStmt>
<seriesStmt><p> Proper collection name and any unique identifying numbers, e.g. catalogue or folder number</p></seriesStmt>
<encodingDesc><projectDesc>Prepared for
the University of New Brunswick Electronic
</editorialDecl>
<revisionDesc>
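The header template itself is not reproduced on this page. If you fill in many headers, the step can be scripted. The sketch below is a minimal illustration only: HEADER_TEMPLATE and fill_header are hypothetical names, and the template holds just the three parts TEI requires inside <fileDesc> (<titleStmt>, <publicationStmt> and <sourceDesc>), not the Centre's full template.

```python
from string import Template

# Hypothetical, stripped-down header skeleton. The Centre's real template
# contains many more elements (extent, seriesStmt, encodingDesc, revisionDesc...).
HEADER_TEMPLATE = Template("""\
<teiHeader>
<fileDesc>
<titleStmt><title>$title</title></titleStmt>
<publicationStmt>
<publisher>Electronic Text Centre at University of New Brunswick Libraries</publisher>
<date>$date</date>
</publicationStmt>
<sourceDesc><p>$source</p></sourceDesc>
</fileDesc>
</teiHeader>
""")


def fill_header(title: str, date: str, source: str) -> str:
    """Return the header skeleton with the given values filled in."""
    values = {"title": title, "date": date, "source": source}
    for name, value in values.items():
        # "<" and "&" would be read as markup; enter those by hand as
        # entity references rather than letting the script pass them through.
        if "<" in value or "&" in value:
            raise ValueError(f"{name} contains a markup character: {value!r}")
    return HEADER_TEMPLATE.substitute(values)


if __name__ == "__main__":
    print(fill_header("Sample letter", "1997, August",
                      "Photocopy of the original letter"))
```

The script refuses values containing markup characters instead of escaping them, so nothing is silently turned into an entity reference your file may not declare.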
Prose
All element names are written in lower-case letters. The only exception is a name made up of two words, such as figDesc, in which case the second word begins with an upper-case letter. Consult the TEI Guidelines for the proper element names if you are uncertain. The <text> element is the first element after the TEI Header closes with </teiHeader>. Within the <text> element, assign your document a unique id made up of the first letter of the author's surname followed by the last two digits of the year, then the month and the day (two digits each): <text id="e560907"> (see below for full example). If the original document contains any front matter, such as a title page, introduction, acknowledgments or table of contents, <text> is followed by <front>. The <body> element follows <front>, and the text inside <body> is divided with <div> tags. Number the divisions starting with "0": <div0>. This tag takes a "type" attribute: <div0 type="letter">. Consult with the project supervisor before assigning a value to this attribute.
Example of Tagged Prose <text id="e560907">
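The id rule above is mechanical enough to script. A minimal sketch follows; make_text_id is a hypothetical helper, and it assumes you have the author's surname as a string and the document's date as a Python datetime.date.

```python
from datetime import date


def make_text_id(surname: str, doc_date: date) -> str:
    """Build a text id: surname initial, then two-digit year, month and day.

    Hypothetical helper; the input types are assumptions of this sketch.
    """
    initial = surname.strip()[0].lower()
    # %y, %m and %d are zero-padded, giving e.g. "560907".
    return f"{initial}{doc_date:%y%m%d}"


if __name__ == "__main__":
    text_id = make_text_id("Example", date(1956, 9, 7))
    print(f'<text id="{text_id}">')  # prints <text id="e560907">
```

Two documents by authors with the same initial and the same date would receive the same id, so the result still has to be checked for uniqueness; the guidelines above do not say how such clashes are resolved.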
Page Breaks
Page breaks are encoded with the <pb> tag, which is used wherever a page break occurs in your document. Give the page number in the "n" attribute: <pb n="1">, <pb n="2">, etc.
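To illustrate where the <pb> tags go, the sketch below assumes (hypothetically) that your transcription is held as a list of page strings in page order, and opens each page with a numbered <pb>. Because <pb> is an empty element, it has no closing tag. The helper name paginate is illustrative only.

```python
def paginate(pages: list[str], first: int = 1) -> str:
    """Join page texts, opening each page with <pb n="...">.

    `first` lets numbering start at the source's own first page number.
    """
    parts = []
    for number, page in enumerate(pages, start=first):
        parts.append(f'<pb n="{number}">{page}')
    return "\n".join(parts)


if __name__ == "__main__":
    print(paginate(["First page of text.", "Second page of text."]))
```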
Additions and Deletions
Additions and deletions are marked with the <add></add> and <del></del> tags respectively. You can specify where an addition appears by using the "place" attribute: <add place="interlinear">but</add>
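A small sketch of a common case: a word struck out with its replacement written above the line. The helper revision is hypothetical, and "interlinear" is simply the place value used in the example above; check the TEI Guidelines for the values your project uses.

```python
def revision(deleted: str, added: str, place: str = "interlinear") -> str:
    """Encode a deleted word followed by the word added in its place."""
    # Attribute values are written in double quotes, so they cannot contain one.
    if '"' in place:
        raise ValueError("place value must not contain a double quote")
    return f'<del>{deleted}</del><add place="{place}">{added}</add>'


if __name__ == "__main__":
    print(revision("and", "but"))
```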
Original Spellings and Hyphenated Words
If you come across a misspelled word in the manuscript or typescript, encode it with the <orig> element and record the regularized spelling in its "reg" attribute: <orig reg="Saturday">Saterday</orig>. Do not use <sic> unless it is clear that the word or grammar is indeed an error; since this is often unclear, <orig> is the "safer" choice. End-of-line hyphenated words are encoded with the same element: Mr. Lougheed asked him to <orig reg="report">re-
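A minimal sketch of the <orig> pattern, using a hypothetical helper named orig. The source spelling, including any end-of-line hyphen and line break, stays as element content, while the regularized form goes in the reg attribute.

```python
def orig(original: str, regularized: str) -> str:
    """Wrap a source spelling in <orig>, recording the regularized form in reg."""
    # reg is written in double quotes, so the value itself cannot contain one.
    if '"' in regularized:
        raise ValueError("reg value must not contain a double quote")
    return f'<orig reg="{regularized}">{original}</orig>'


if __name__ == "__main__":
    print(orig("Saterday", "Saturday"))
    # A hypothetical hyphenated word split across two lines of the source.
    print(orig("some-\nthing", "something"))
```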
Highlighted Words or Phrases
Text can be highlighted in various ways, including italics, underlining and bold type.
Drama
Use the tags outlined in A Practical Introduction to the TagSet. Additional tags for drama include: <castList>
Example of Tagged Drama: <front>
<castItem><role>
<castItem><role>
<castItem type="role">
<body>
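The drama example above is incomplete. As a sketch of how a cast list is assembled, the hypothetical helper below writes one <castItem> per role, with the role name in <role> and an optional description in <roleDesc> (a TEI element for describing a role; check with staff whether your project uses it). The (role, description) input format is an assumption.

```python
def cast_list(roles: list[tuple[str, str]]) -> str:
    """Build a <castList> with one <castItem> per (role, description) pair.

    An empty description leaves out the <roleDesc> element.
    """
    lines = ["<castList>"]
    for role, description in roles:
        item = f"<castItem><role>{role}</role>"
        if description:
            item += f"<roleDesc>{description}</roleDesc>"
        lines.append(item + "</castItem>")
    lines.append("</castList>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(cast_list([("Mary", "a servant"), ("Messenger", "")]))
```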
Verse
Use the same tags as outlined in A Practical Introduction to the TagSet. Additional tags for verse include: <l> to encode a line of verse
Example of tagged verse: <lg type="verse">
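If verse has been typed with one verse line per text line and a blank line between groups (a hypothetical keying convention, not one stated in these guidelines), the <l> and <lg> tags can be added in one pass. A minimal sketch with an illustrative helper name:

```python
import re


def encode_verse(text: str, group_type: str = "verse") -> str:
    """Tag plain-text verse: one <l> per line, one <lg> per blank-line-separated group."""
    groups = []
    # Split on blank lines, allowing stray spaces on the "empty" line.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        body = "\n".join(f"<l>{line}</l>" for line in lines)
        groups.append(f'<lg type="{group_type}">\n{body}\n</lg>')
    return "\n".join(groups)


if __name__ == "__main__":
    print(encode_verse("First line\nSecond line\n\nThird line"))
```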
Parsing
Once a document has been marked up (including the header), it is ready to be parsed. Make sure you have saved your file as plain ASCII text, because the parsing program (nsgmls) will not parse a WordPerfect or Word file. Nsgmls first checks the TEI or TEI-lite DTD (Document Type Definition) to make sure its structure is correct; you need not worry about this unless you are using your own DTD or have modified the TEI DTD. Nsgmls then goes through your document to check that all of the tagging is correct, in terms of both structure and syntax, and produces a list of errors that need correcting. Common errors include: 1. A syntax error: failure to wrap an attribute value in double quotes.
Once you have a list of parsing errors, correct the first few and then reparse. This very often "corrects" the errors that followed the initial one, since later errors are frequently side effects of an earlier one. Your project resource person will help you get started with parsing and will run a final parsing check on your files before they are put on the server.
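Before handing a file to nsgmls, you may find a quick pre-flight check useful. The sketch below is not a parser and does not replace nsgmls. It only flags two problems mentioned on this page: attribute values not wrapped in double quotes (the Centre's rule; depending on the SGML declaration in use, nsgmls itself may accept some unquoted values), and non-ASCII characters, which usually come from a word processor and should normally be entered as ISO entity references such as &eacute; (from the ISOlat1 set declared at the top of the sample file). The function names and report format are assumptions of this sketch.

```python
import re
import sys

# Opening tags only: "<" followed by a letter (skips </...> and <!...>).
TAG = re.compile(r"<[A-Za-z][^<>]*>")
# An attribute name, "=", then a value that does not start with a double quote.
UNQUOTED = re.compile(r'([A-Za-z][\w.-]*)\s*=\s*(?!")([^\s>]+)')


def unquoted_attributes(text: str) -> list[tuple[int, str, str]]:
    """Return (line, attribute, value) for attribute values not in double quotes."""
    problems = []
    for tag in TAG.finditer(text):
        line = text.count("\n", 0, tag.start()) + 1
        # Blank out quoted values first so an "=" inside them is not misread.
        bare = re.sub(r'"[^"]*"', '""', tag.group(0))
        for m in UNQUOTED.finditer(bare):
            problems.append((line, m.group(1), m.group(2)))
    return problems


def non_ascii(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for every non-ASCII character."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                found.append((lineno, col, ch))
    return found


def check_file(path: str) -> None:
    # Latin-1 decodes any byte sequence, so stray word-processor bytes
    # are reported as non-ASCII characters instead of crashing the script.
    with open(path, encoding="latin-1") as f:
        text = f.read()
    for line, name, value in unquoted_attributes(text):
        print(f"{path}:{line}: value of {name} not in double quotes: {value}")
    for line, col, ch in non_ascii(text):
        print(f"{path}:{line}:{col}: non-ASCII character {ch!r}")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        check_file(path)
```

As with nsgmls output, fix the first few reports and rerun; a tag spanning several lines is reported at the line where it opens.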