CPP: Transcription Guidelines

Cambridge Platonists Project: Transcription Guidelines

by Michael Hawkins

Based upon the John Young, Michael Hawkins and Robert Ralley, Casebooks Project Transcription Guidelines (Cambridge: 2011), http://www.magicandmedicine.hps.cam.ac.uk/using-our-edition/editorial-policies/transcription-guidelines, accessed 2016-09-08.

Introduction
Document Structure
The Header
Normalisation
Content Tagging
Special Characters

Introduction

The Cambridge Platonists’ schema is based on the Text Encoding Initiative P5 Guidelines, which are widely considered to represent best practice in digital textual transcription. The text of our guidelines is adapted (with permission) from John Young, Michael Hawkins and Robert Ralley, Casebooks Project Transcription Guidelines (Cambridge: 2011) and John T. Young, Newton Project: Transcription and Tagging Guidelines and XML Tag Set (Sussex: 2011).

Our Guidelines are to be used in conjunction with our accompanying Element Set documentation. They are intended to serve as an easily accessible introduction to our transcription policy and a broad outline of which tags to use and where. The Element Set, in contrast, is a reference work that gives specific (and copious) details on what exactly is required or permitted in each element and where it may (or may not) be used.

So far as possible, jargon is kept to a minimum, but familiarity has been assumed with certain key XML terms and the means of representing them: principally element (and the distinction between empty and non-empty elements), attribute, attribute value and entity. It is essential that transcribers are entirely clear about these terms, and understand the principle of nesting elements. They are not difficult to grasp and are explained in any guide to XML. Newcomers to XML should read ‘Gentle Introduction to XML’ in order to familiarise themselves with the basics.

A particularly useful feature of XML transcriptions is the ‘XML comment’, and editors are strongly encouraged to make liberal use of them. Comments take the form ‘<!-- … -->’ and may contain, between the two sets of double dashes, any comment the transcriber wishes to make for in-house purposes on the source or the transcription. The contents of XML comments will never be made publicly visible. They exist solely as an aid to the editors. They can be very helpful since they can be used to describe particularly hard to read passages, or to explain why a potentially contentious coding decision has been made. They can also be used to express any doubts or queries about a particular interpretation of a passage or word, or in short to convey any information not covered by the tagging that may be of use to the editors when reviewing the text. It is helpful for such comments to be signed with the relevant commenter’s initials.
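As a rough sketch of what signed comments might look like: the wording of the comments, the surrounding text and the initials ‘MH’ are all invented for illustration and are not taken from a project file.

```xml
<!-- Illustrative only: comment wording and initials are hypothetical. -->
<p>The soul is a substance distinct from the body
<!-- Ink badly faded here; the reading of the next word is uncertain. MH -->
and immortal.</p>
<!-- Coded as a paragraph rather than a heading because it introduces no new section. MH -->
```

Note that an XML comment may not itself contain a double dash (‘--’) anywhere between its opening and closing delimiters.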

A number of particular, capitalised, code headings should be used in most comments, such as ‘CHECK’ (meaning ‘check original’), ‘CODIC’ (a codicological comment), ‘TODO’ (a task that must be done by an editor) and ‘TRANSC’ (a comment regarding the transcription or coding of a particular feature in the text). These sorts of comments are absolutely indispensable to the person who does the final checking, as they enable her or him to identify quickly (by means of a global search) which items need to be revisited and resolved.
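The code headings might be used along the following lines. This is only a sketch: the position of the heading at the start of the comment, the colon after it, the comment wording and the initials are assumptions made for illustration, so follow whatever pattern is already established in the files you are working on.

```xml
<!-- Hypothetical examples of the four code headings. -->
<!-- CHECK: possibly 'soule' rather than 'soul'; compare with the original. MH -->
<!-- CODIC: the lower corner of this leaf is torn away. MH -->
<!-- TODO: add a regularisation for this abbreviation once its meaning is settled. MH -->
<!-- TRANSC: the marginal addition is coded here as a note; this may need revisiting. MH -->
```

Because the headings are fixed, capitalised strings, a global search for ‘TODO’ or ‘CHECK’ will find every outstanding item in a file.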

Please pay especial attention to the instructions concerning spacing around elements. Some of these may look like rules for rules’ sake but they are there for a purpose and have important ramifications for the display of the transcribed texts. When reading densely marked-up text in XML, it is alarmingly easy to overlook the presence of a space that should not be there or the absence of one that should be.

Document Structure

Each document is entirely enclosed in a so-called ‘root element’, <TEI>. This is divided into two main component elements: <teiHeader> and <text> (always in that order). The <teiHeader> is reserved for metadata: information about the source document, a record of the work that has been done on it, and the id values of the languages and hands that feature in it and of the transcribers and checkers who have worked on it. The transcription proper is inside <text>, wholly enclosed by its immediate children <front> (optional), <body> (mandatory) and <back> (optional). <front> and <back> contain the front or back matter of a work, such as the Title Page, Preface to the Reader and Table of Contents at the front of a book, and the indices, errata or printer’s list of other works available at the back of the volume. <body> contains the transcription of the main text. <front>, <body> and <back> will contain one or more <div>s, each of which contains a discrete ‘section’ of the text, e.g. a Dedicatory Epistle, Preface to the Reader, individual chapters, sub-chapters or any other clearly defined logical structural divisions. Thus every document will have the following structure:

    <TEI>
      <teiHeader/>
      <text>
        <front/>
        <body>
          <div/>
          <div/>
          <div/>
        </body>
        <back/>
      </text>
    </TEI>

Contact Michael Hawkins if you encounter a printed volume in which the publisher has bound together multiple printed books, each with their own front and/or back matter, into a single volume (e.g. in a ‘Collected Works’ or ‘Opera Omnia’ edition).

Although editors will predominantly be concerned with the textual transcription contained within <front>, <body> and possibly <back>, they will nonetheless also be required to ensure the accuracy of the information within <teiHeader>, working closely with Michael Hawkins.

The Header (`<teiHeader>`)

Metadata (information about the electronic file and its contents) is recorded in a <teiHeader> element, which has three components, <fileDesc>, <profileDesc> and <revisionDesc> (always in that order):

    <teiHeader>
      <fileDesc>...</fileDesc>
      <profileDesc>...</profileDesc>
      <revisionDesc>...</revisionDesc>
    </teiHeader>

I. `<fileDesc>`

<fileDesc> describes the electronic file and its source. It contains the following elements:

1) <titleStmt>, which contains two elements: <title> and <author>.

2) <publicationStmt>, which contains <publisher> (Cambridge Platonists Project), <pubPlace> (Cambridge) and an empty <date>, with the date expressed in ISO style (‘yyyy-mm-dd’) in @when: <date when="2016-09-30"/>.

3) <notesStmt> (optional), which contains one or more <note>s, which in turn may contain either text or <p> (though it seems unlikely that a note here will run to more than one paragraph; if it does, consider carefully whether your proposed text really represents one ‘note’ or whether it might not be better represented using multiple <note>s). These notes may optionally have a @resp value of ‘#’ followed by the relevant editor’s @xml:id value (if he or she wishes to claim authorship) and can be used to supply more detailed information about the source document or particular problems associated with its transcription, e.g. ‘The manuscript is water-damaged here’.

Where a note includes references to other transcriptions, a link should be provided by means of a <ref> element as explained in the entry for <ref> in the Element Set.

There can be no hard and fast rules about what can or cannot be expressed in <notesStmt> and notes could be used to describe the source document itself, its content, or the interpretation of that content. So far as possible, however, information about the document itself and its contents should be recorded in the more specialised tagging available rather than the <notesStmt>. However, <notesStmt> provides a useful fallback option for at least provisionally recording any supplementary data. As usual, if you are in doubt, please contact Michael Hawkins.
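A minimal sketch of a <notesStmt> with a single attributed note, using the example wording given above. The use of ‘mhawkins’ as the editor’s @xml:id is only an illustration; it assumes that id has been declared elsewhere in the header.

```xml
<notesStmt>
  <!-- @resp is '#' plus the xml:id of the editor claiming authorship (optional) -->
  <note resp="#mhawkins">The manuscript is water-damaged here.</note>
  <!-- an unattributed note simply omits @resp -->
  <note>A second, separate observation goes in its own note element.</note>
</notesStmt>
```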

4) <sourceDesc>, containing either <biblStruct> (if the original source was a printed work) or <msDesc> (if it’s a manuscript).

a) <biblStruct> encodes complete bibliographical information about the printed source. While these Guidelines ultimately aim to provide examples of how to code every possible type of printed work, it is quite likely that you will encounter printed works that don’t readily fit into any of the examples below. Additionally, we may decide as a project that we wish to include more information about the physical volume than the current proposals allow. Should either happen, please create an issue in Freedcamp so we can establish how these things should be encoded.

i) The work is from a printed monograph:

    <biblStruct>
      <monogr>
        <author><persName><forename>Henry</forename> <surname>More</surname></persName></author>
        <title>An antidote against atheisme, or, An appeal to the natural faculties of the minde of man, whether there be not a God by Henry More</title>
        <title type="short">An antidote against Atheisme</title>
        <imprint>
          <pubPlace>London</pubPlace>
          <publisher>Printed by Roger Daniel</publisher>
          <date when="1653-01-01">1653</date>
        </imprint>
      </monogr>
    </biblStruct>

ii) The work is from a named ‘article’/‘book’ within a larger monograph:

    <biblStruct>
      <analytic>
        <author><persName><forename>Henry</forename> <surname>More</surname></persName></author>
        <title>A Letter to Mons. Dela Crose; together with some Reflections on the Letter to Charles Blount Esq , concerning Natural Religion, as oppos’d to Divine Revelation: And also on that Infamous Book, entituled, The Naked Gospel.</title>
        <title level="j" type="short">A Letter to Mons. Dela Crose; together with some Reflections on the Letter to Charles Blount</title>
      </analytic>
      <monogr>
        <author><persName><forename>Henry</forename> <surname>More</surname></persName></author>
        <editor role="editor"/>
        <title level="j">Letters on several subjects with several other letters : to which is added by the publisher two letters, one to the Reverend Dr. Sherlock, Dean of St. Paul’s, and the other to the Reverend Mr. Bentley. With other discourses. Publish’d by the Reverend Mr. E. Elys.</title>
        <title level="j" type="short">Letters on several subjects</title>
        <edition/>
        <imprint>
          <pubPlace>London</pubPlace>
          <publisher>Printed by W. Onely, for Iohn Everingham, at the Star in Ludgate-street, near St. Paul’s</publisher>
          <date when="1694-01-01">1694</date>
          <biblScope unit="pp">115-122</biblScope>
        </imprint>
      </monogr>
    </biblStruct>

iii) The work is from a journal:

    <biblStruct>
      <analytic>
        <author><persName><forename>Isaac</forename> <surname>Newton</surname></persName></author>
        <title>An Accompt of a New Catadioptrical Telescope invented by Mr. Newton</title>
      </analytic>
      <monogr>
        <title level="j">Philosophical Transactions of the Royal Society</title>
        <title level="j" type="short">Philosophical Transactions</title>
        <imprint>
          <pubPlace>London</pubPlace>
          <date>25 March 1672</date>
          <biblScope unit="no">81</biblScope>
          <biblScope unit="pp">4004-4007</biblScope>
        </imprint>
      </monogr>
    </biblStruct>

b) <msDesc> provides complete bibliographical information about the manuscript source. It contains:

i) <msIdentifier> (required), which in turn contains mandatory <country>, <settlement>, <repository> and <idno> elements as well as an optional <collection>.

The content of <idno> consists of the manuscript number in question, e.g. ‘MS Additional 3975’.
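A sketch of an <msIdentifier> carrying all four mandatory children might look like this. The wording chosen for <country> and <settlement> is an assumption, as the project may prescribe particular forms; the optional <collection> is omitted here.

```xml
<msIdentifier>
  <!-- the exact form of the country and settlement names is assumed -->
  <country>United Kingdom</country>
  <settlement>Cambridge</settlement>
  <repository>Cambridge University Library</repository>
  <!-- an optional collection element would go here, before idno -->
  <idno>MS Additional 3975</idno>
</msIdentifier>
```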

ii) <msContents> is used to describe the pages (or folios) for the tract that is being transcribed. For example, a draft version of Isaac Newton’s theory concerning light and colors appears on folios 460r–466r of CUL MS Add. 3970.3. However, we might only have decided that we are interested in Newton’s thirteen propositions, which occur on ff. 462v-465r. It would be encoded as follows:

    <msDesc>
      <msIdentifier>
        <repository>Cambridge University Library</repository>
        <collection>Portsmouth Collection</collection>
        <idno>MS Add. 3970.3</idno>
      </msIdentifier>
      <msContents>
        <msItem class="#full">
          <title copyOf="#main_title"/>
          <locus from="460r" to="466r">ff. 460r-466r</locus>
        </msItem>
        <msItem class="#excerpt">
          <title>Thirteen Propositions</title>
          <locus from="462v" to="465r">ff. 462v-465r</locus>
        </msItem>
      </msContents>
    </msDesc>

II. `<profileDesc>`

This element contains three children (<creation>, <langUsage> and <handNotes>) that provide information on the contents of the file, such as the languages or manuscript hands contained within it or details about when and where it was created.

1) <creation> is used to record the date and place where the work was created/published. It contains two children for recording this information, <origDate> and <origPlace>:

    <creation>
      <origDate when="1711-01-01">post-1710</origDate>
      <origPlace>England</origPlace>
    </creation>

The values that are put into <origDate> and <origPlace> should be as accurate as possible without agonising over them. For some transcriptions, this might be a broad date range and country of origin but for others, it could be a specific year (or date) and a city, town or village. For example, the <creation> coding for Henry More’s Divine Dialogues (London: 1668) would be:

    <creation>
      <origDate when="1668-01-01">1668</origDate>
      <origPlace>London, England</origPlace>
    </creation>

2) <langUsage> (mandatory). This contains one or more <language> elements, each with the @ident value as defined by the Internet Assigned Numbers Authority (IANA), such as:

‘en-emodeng’ (Early Modern English)
‘Latn’ (Latin)
‘grc’ (Classical Greek, pre-1453)
‘hbo’ (Ancient Hebrew)
‘fr-1694acad’ (Early Modern French)

If a document contains more than one language they should be listed in order of priority: e.g. if the document is primarily in Latin with a few words of English, <language ident="Latn"> should be the first element in <langUsage>; if the document is primarily in Early Modern English with a few words of Latin, <language ident="en-emodeng"> should be the first <language> element:

    <langUsage>
      <language ident="en-emodeng">English</language>
      <language ident="Latn">Latin</language>
    </langUsage>

For guidance on how to deal with language changes within the body text, see the Content tagging section of the Guidelines.

3) <handNotes> (required if the document is a manuscript). This in turn contains one or more <handNote> elements containing either an @xml:id value for each scribe that is not the author of the document or an @sameAs if the document contains text in the author’s hand. If using @sameAs, you should point it to the <author> (who must in that case have a defined @xml:id). All identifiable hands, including those of the unknown cataloguers who wrote in the foliation, must be recorded. List the hands in order of their dominance within the text (i.e. the most common hand comes first; the least common comes last):

    <handNotes>
      <handNote sameAs="#in">Holograph</handNote>
      <handNote xml:id="unknown1">Unidentified Cataloguer 1</handNote>
      <handNote xml:id="unknown2">Unidentified Cataloguer 2</handNote>
    </handNotes>

For guidance on how to deal with hands within the body text, see the Content tagging section of the Guidelines.

III. `<revisionDesc>`

<revisionDesc> is used to record the work done on the electronic file, and consists of an indefinite series of <change> elements each with an @when value giving the date, and content giving a natural-language account of each significant revision. The person who made it goes in a <name> element with an @xml:id value that will be assigned by the senior editors when someone starts work, but is normally based on that person’s first initial plus her or his surname, e.g. ‘mburden’ (Mark Burden), ‘mhawkins’ (Michael Hawkins), ‘dhedley’ (Douglas Hedley), ‘chengstermann’ (Christian Hengstermann), ‘shutton’ (Sarah Hutton) or ‘dleech’ (David Leech).

Proofreading a file counts as a <change> even if the proofreader has not in fact made any changes to it. The file itself may not have changed but its status has.

If the @xml:id value of any of the people working on the document has been previously declared, subsequent occurrences of her or his <name> take a @sameAs value of # followed by the relevant string.

For instance:

    <revisionDesc>
      <change when="2016-10-12">Transcribed by <name xml:id="chengstermann">Christian Hengstermann</name>.</change>
      <change when="2016-10-21">Coding of table updated by <name xml:id="mhawkins">Michael Hawkins</name>.</change>
      <change when="2016-11-01">Text proofed by <name xml:id="dhedley">Douglas Hedley</name>.</change>
      <change when="2016-11-01">Corrections entered by <name sameAs="#chengstermann">Christian Hengstermann</name>.</change>
    </revisionDesc>

Normalisation

Where it has been deemed appropriate to clarify original spelling or terminology using modernised or standardised forms, the usual mechanism is to use <orig> and <reg> tags within <choice>. The content of <orig> will appear in the diplomatic view and the content of <reg> in the normalised.
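As a small sketch of the mechanism, here is one of the conventional abbreviations discussed later in this section, with XML comments noting what each view would be expected to show:

```xml
<!-- diplomatic view: wth -->
<!-- normalised view: with -->
<choice><orig>wth</orig><reg>with</reg></choice>
```

There should be no space between <orig> and <reg> or around them inside <choice>, since any stray whitespace could be carried through into the display.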

1) In general, semantically insignificant distinctions between letter forms can be disregarded: thus, short and long ‘s’, medial and terminal ‘f’, and so forth, need not be differentiated.

However, some letter forms are of sufficient intrinsic interest to warrant distinct encoding. The letter ‘thorn’, used as an abbreviation of ‘th’ but written exactly or almost exactly like ‘y’ in Early Modern English, should be encoded as the entity &thorn;. This provides the option of expanding it to ‘th’ in the normalised view and presenting it either as ‘y’ or as the Unicode thorn character (‘þ’) in diplomatic. If a word begins ‘ff’ (functioning like a capital ‘F’), this should be encoded as the entity &ff;.

With the exception of æ, Æ, œ, Œ (in any language) and ß (in German), we do not record that certain character combinations might be represented as ligatures in the original text. For example, if you encounter a ligature joining the characters ‘c’ and ‘t’, you would transcribe it simply as ‘ct’. The same applies to ligatures in other languages, like Classical Greek. The only exceptions would be for characters that are or were distinct letters within that language’s alphabet. If you were transcribing a German text, you would record eszett characters (‘ß’) as eszett characters using &szlig;. However, if you were to encounter a ligature joining a long s (‘ſ’) and a short s (‘s’) in a language that does not/did not contain the eszett as a distinct character, such as English, you would transcribe it as ‘ss’.

2) Unless a scribal hand distinguishes between upper case I/J and/or U/V, these upper case forms should be treated as I and V but provided, if necessary, with a regularisation, thus: ‘<choice><orig>I</orig><reg>J</reg></choice>esus’, ‘<choice><orig>J</orig><reg>I</reg></choice>sraelites’, ‘his <choice><orig>V</orig><reg>U</reg></choice>nkle’ and ‘<choice><orig>U</orig><reg>V</reg></choice>otive’. The project has defined four custom entities that can be used as short-hand, so the previous examples could be more concisely coded:

&IConsonant;, e.g. <choice><orig>I</orig><reg>J</reg></choice>esus
&JVowel;, e.g. <choice><orig>J</orig><reg>I</reg></choice>sraelites
&VVowel;, e.g. <choice><orig>V</orig><reg>U</reg></choice>nkle
&UConsonant;, e.g. <choice><orig>U</orig><reg>V</reg></choice>otive

Lower case i/j and u/v are just as complicated. These should, where appropriate, be regularised using the entities &jVowel; for ‘<choice><orig>j</orig><reg>i</reg></choice>’ (i.e. ‘j’ being used as a vowel), and &iConsonant;, &vVowel; and &uConsonant; on the same principle.

Please pay careful attention when using these project defined entities to ensure that you enter the correct one. &IConsonant; is used for a capital ‘I’ being used as a capital ‘J’ and &iConsonant; is used when a lowercase ‘i’ is being used as a lowercase ‘j’. Should you encounter more complex situations, such as a lowercase ‘i’ that you wish to represent as an uppercase ‘J’, this would have to be coded out the long way, i.e. <choice><orig>i</orig><reg>J</reg></choice>esus.
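The lower-case entities could be used as sketched below. The example words are invented for illustration, the fragment assumes the project’s entity set is declared, and the long-hand expansions in the comments are inferred from the ‘same principle’ described above rather than copied from the entity set itself.

```xml
<!-- 'iustice': i used as a consonant; long-hand <choice><orig>i</orig><reg>j</reg></choice> -->
&iConsonant;ustice
<!-- 'alijs': j used as a vowel; long-hand <choice><orig>j</orig><reg>i</reg></choice> -->
ali&jVowel;s
<!-- 'vnto': v used as a vowel; long-hand <choice><orig>v</orig><reg>u</reg></choice> -->
&vVowel;nto
<!-- 'loue': u used as a consonant; long-hand <choice><orig>u</orig><reg>v</reg></choice> -->
lo&uConsonant;e
```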

Where ‘I’, ‘V’, ‘i’ and ‘v’ do equate to modern ‘I’, ‘V’, ‘i’ and ‘v’, there is obviously no need to regularise, so they can simply be entered.

Roman numerals, whether lower or upper case, can generally be left as they stand unless they are combined with Arabic numerals in a single number, e.g. ‘the 3i March’, which should be normalised to ‘the 3<choice><orig>i</orig><reg>1</reg></choice> March’. However, where lower-case ‘j’ is used to mean ‘1’, either as a numeral in its own right or as the last part of a longer Roman numeral, it should be normalised as ‘<choice><orig>j</orig><reg>i</reg></choice>’ (or for brevity’s sake as &jVowel;, despite its not actually being a vowel in this instance).
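For instance, a Roman numeral ending in ‘j’ could be coded in either of the following ways. The numeral ‘xiij’ is an invented example, and the second form assumes the project’s entity set is declared.

```xml
<!-- 'xiij' (thirteen), coded the long way: the final j is normalised to i -->
xii<choice><orig>j</orig><reg>i</reg></choice>
<!-- the same numeral using the shorthand entity -->
xii&jVowel;
```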

Initial ‘UU’ and ‘VV’ should be encoded as ‘<choice><orig>UU</orig><reg>W</reg></choice>’ or ‘<choice><orig>VV</orig><reg>W</reg></choice>’.

Very few early modern hands distinguish between the ‘ae’ (‘æ’) and ‘oe’ (‘œ’) ligatures, so these should always be transcribed as &aelig; unless the scribe in question clearly does differentiate between them, in which case the ‘oe’ ligature should be transcribed as &oelig;.

3) Although we will not be routinely regularising capitalisation in the source text, there might be a few very rare instances where this is necessary because the appearance of a character in lowercase (or uppercase) is jarring even to a scholar well-versed in Early Modern typography. For example, if a proper name lacked an initial capital (e.g. ‘aristotle’), this would be corrected using similar <choice> coding. However, in this case, we’d be using <sic> to record the original version and <corr> to record the editor’s correction. For example, ‘aristotle’ would be coded ‘<choice><sic>aristotle</sic><corr>Aristotle</corr></choice>’. On the distinction between <sic>/<corr> and <orig>/<reg>, see below.

Since we have stopped modernising capitalisation (excepting editorial corrections as described above), editors should no longer use the shortcut entities for normalising the case of a letter, e.g. &a;, &A;, etc.

NB: Editors working on our earliest encoded texts may encounter deprecated code where <reg> had a @type attribute with a value of ‘modernisation’. It is vital that editors do not remove the @type attribute or change it from ‘modernisation’. That <reg type="modernisation"> allows the XSLT code that creates the online version to ignore this now-abortive attempt to modernise the capitalisation. That is, it will simply output ‘A’ when it encounters ‘<choice><orig>A</orig><reg type="modernisation">a</reg></choice>’. If you were to remove the @type attribute or change its value from ‘modernisation’, it would display an ‘A’ in the diplomatic view and an ‘a’ in the normalised, which is obviously not what we want. While it may be easiest for editors simply to ignore this coding (other than fixing typos), they could optionally remove the entire <choice>, <orig>, <reg> code, provided it is just a regularisation of the case of a letter for our no-longer-needed hyper-modernised display, e.g. changing ‘<choice><orig>A</orig><reg type="modernisation">a</reg></choice>’ simply to ‘A’ (which is the letter as it appears in the original text). However, this seems like largely redundant work to me since it’s removing functional code that we aren’t using right now but which we (or someone else) might use in the more distant future after the end of this project.

Sometimes capitalisation is merely a representational artefact. For example, if we were transcribing the first word of a paragraph whose first letter was an ornamental drop capital followed by the rest of the word being capitalised at a normal size within the text, it would be coded <hi rend="dropCap">F</hi><hi rend="uppercase">irstly</hi>. Please pay special attention to the difference between uppercase characters (‘uppercase’) and characters in small caps (‘smallCaps’) (see below). If you are in any doubt as to whether something should have its case regularised, please create an XML TODO comment so we can discuss it as a team.

In manuscripts, it is often very hard to say whether or not an initial letter is capitalised, or indeed whether the scribe himself or herself would have been able to say whether it was. In such cases, the choice of upper or lower case is left to the editor’s judgment, which may be informed by the context: proper nouns, and words at the beginning of sentences, are more likely to be considered capitalised; conjunctions and prepositions not at the beginning of sentences are less likely to be considered capitalised – but it is impossible to give a hard and fast ruling on this.

4) Standard types of abbreviation and shorthand should be provided with regularisations using <orig> and <reg>. For example, ‘y^r’ (‘your’), ‘ꝑpare’ (‘prepare’), ‘y^e’ (‘the’) and ‘q;’ (‘que’ in printed Latin texts) would respectively be coded ‘<choice><orig>y<hi rend="superscript">r</hi></orig><reg>your</reg></choice>’, ‘<choice><orig>&crossedp;</orig><reg>pre</reg></choice>pare’, ‘<choice><orig>&thorn;<hi rend="superscript">e</hi></orig><reg>the</reg></choice>’ and ‘<choice><orig>q;</orig><reg>que</reg></choice>’.

a) In general, words should only be expanded if there is some form of brevigraph, overlining or other explicit scribal or representational indication that an abbreviation is intended (such as the use of superscript in ‘B^p’ for ‘Bishop’). A full stop after a truncated word with no other indicators of an abbreviation, however, should not be regarded as an abbreviation indicator. The use of full stops is so inconsistent and ambivalent that it would be rash to ascribe any semantic value to them.

A very common brevigraph is the overlining of a vowel or of the letters ‘m’, ‘n’ or ‘y’ to indicate a following ‘m’ or ‘n’. This can be rendered e.g. ‘mel<choice><orig>&aover;</orig><reg>an</reg></choice>colyk’: it is up to the transcriber to deduce whether the omitted letter is ‘m’ or ‘n’ (this is usually self-evident but occasionally ambiguous). Many of the more common vowel/macron combinations are defined in our in-house entity set. However, you will encounter situations where they aren’t defined, for example ‘Willm’ for ‘William’. When this happens, code the overlined portion using <hi rend="overline">: ‘W<hi rend="overline">illm</hi>’.

Our entity set covers a number of standard brevigraphs such as a q-followed-by-a-tail to mean ‘que’ (&que;), q-followed-by-a-semicolon (‘q;’), which is used in print to indicate ‘que’ (&que2;), and a character that looks like a small superscripted 9 (‘ꝯ’) meaning either ‘us’ at the end of a word or ‘con’ at the beginning of one (&uscon;). If you should find yourself coding a document that often uses fairly consistent brevigraphs/abbreviations, such as ‘y^e’ for ‘the’ (coded ‘<choice><orig>&thorn;<hi rend="superscript">e</hi></orig><reg>the</reg></choice>’), ‘y^t’ for ‘that’ (coded ‘<choice><orig>&thorn;<hi rend="superscript">t</hi></orig><reg>that</reg></choice>’) or ‘w^ch’ for ‘which’ (coded ‘<choice><orig>w<hi rend="superscript">ch</hi></orig><reg>which</reg></choice>’), contact the Technical Director and he will create file-specific entity declarations for them so that you can just enter &the;, &that; or &which; for use in that file.

It is important to bear in mind that some of our custom entities represent fully expanded XML fragments, such as ‘&que;’ meaning ‘<choice><orig>q&tail;</orig><reg>que</reg></choice>’ or ‘&jVowel;’ meaning ‘<choice><orig>j</orig><reg>i</reg></choice>’, while others merely represent the glyph itself, such as ‘&aover;’ (ā) and ‘&crossedp;’ (ꝑ). The full expansion of each entity is clearly spelled out in the in-house entity set.
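The difference between the two kinds of entity can be seen in a small self-contained sketch. The declarations below are illustrative only: the project’s real entity set and any file-specific declarations are created by the Technical Director and may be worded differently. Here ‘thorn’ is declared as a bare glyph, while ‘the’ is declared as a complete <choice> fragment that itself refers to ‘thorn’.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE p [
  <!-- glyph-only entity: expands to the single character U+00FE -->
  <!ENTITY thorn "&#xFE;">
  <!-- fragment entity: expands to a whole choice/orig/reg structure -->
  <!ENTITY the '<choice><orig>&thorn;<hi rend="superscript">e</hi></orig><reg>the</reg></choice>'>
]>
<!-- the sample sentence is invented for illustration -->
<p>&the; soul of man</p>
```

Because a fragment entity already contains its own <choice>, it must not be wrapped in another <orig>/<reg> pair by hand, whereas a glyph-only entity such as &aover; still needs the surrounding regularisation coding.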

b) There are, however, some conventional abbreviations not explicitly flagged as such that may occur with such frequency (and might look so suspiciously like transcriptional or typographical errors) that they seem worth regularising. Examples are:

‘wth’ (rather than ‘w^th’) for ‘with’ (‘<choice><orig>wth</orig><reg>with</reg></choice>’)
‘wch’ (rather than ‘w^ch’) for ‘which’ (‘<choice><orig>wch</orig><reg>which</reg></choice>’)

c) Abbreviations that are still standard (aside from the question of superscripting) can be left as they stand, e.g. ‘M^r’, ‘M^rs’, ‘D^r’, which can be coded as ‘M<hi rend="superscript">r</hi>’, ‘M<hi rend="superscript">rs</hi>’ and ‘D<hi rend="superscript">r</hi>’ respectively. ‘&c’ (for etcetera) can be left as it stands, coded ‘&amp;c’ (and does not need to be flagged as <foreign> if it occurs in a passage in English).

5) The distinction between ‘normalisation’ and ‘correction’ is a very fine one, but if the transcriber/editor deems some part of the text to be an error on the author/scribe/printer’s part, the original text and the editorial amendment should be encoded, respectively, in <sic>/<corr> tags within <choice>. For instance, ‘<choice><sic>squncy</sic><corr>squincy</corr></choice>’. Note that, as in this example, the content of <corr> should be what (the transcriber thinks) the original author would have corrected his mistake to if he had noticed it, even if that still looks ‘wrong’ to a modern reader (in this case, a modernised version would read ‘squinsy’, meaning suppurative tonsillitis). The <corr> element has @cert and @resp values with which the encoder can (and should, if there is any doubt) record who proposed the correction and how sure he/she is about it, on a scale of ‘high’ (pretty confident)/‘medium’ (in two or three minds)/‘low’ (educated guess).

As in the above example, the <sic> and <corr> elements should contain whole words (or whole numerals), even if only one character requires correction.
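A sketch of the same correction with the confidence and responsibility recorded. It assumes that @resp on <corr> takes ‘#’ plus the editor’s @xml:id in the same way as elsewhere in these Guidelines, and ‘mhawkins’ is used purely as an illustration.

```xml
<!-- whole word inside sic and corr, even though only one letter is supplied -->
<choice><sic>squncy</sic><corr cert="medium" resp="#mhawkins">squincy</corr></choice>
```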

In the event of the author/scribe inadvertently duplicating text, or including completely irrelevant text, the <corr> part of the <choice> element takes the @type value ‘noText’ and has no content, e.g. ‘Isabel … Carter of <choice><sic>of</sic><corr type="noText"/></choice> 36 yeres’. Where it appears that the author/scribe intended to delete text but failed to do so, an empty <corr> element with the @type value ‘delText’ can be used similarly. The borderline between ‘noText’ and ‘delText’ is often open to dispute and may call for discussion on a case-by-case basis.
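By analogy with the ‘noText’ example, a failed deletion might be sketched as follows. The sentence is invented for illustration, and the Element Set should be consulted for the definitive usage of ‘delText’.

```xml
<!-- hypothetical case: the scribe evidently meant to strike out 'verie' but did not -->
he was <choice><sic>verie</sic><corr type="delText"/></choice> sore vexed
```

As with ‘noText’, the single space on either side of the <choice> is kept, so check how the spacing will read in the normalised view once the word has been suppressed.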

Where the content of <corr> is a symbol or abbreviation that would otherwise be supplied with an expansion using <orig> and <reg>, it should not be expanded within <corr>. Instead, the entire <choice> string should nest within the <orig> element of a further <choice> string, with the regularisation of the corrected version appearing as the content of <reg>.
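The nesting can be hard to picture, so here is a sketch using an invented case: a scribe has written the abbreviation ‘wth’ where the sense clearly requires ‘wch’ (‘which’). The inner <choice> records the correction, left unexpanded; the outer <choice> supplies the regularisation of the corrected form.

```xml
<!-- outer choice: orig holds the whole sic/corr pair, reg holds the expansion -->
<choice><orig><choice><sic>wth</sic><corr>wch</corr></choice></orig><reg>which</reg></choice>
```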

<sic> and <corr> can also be used to correct the absence of spaces between two words or the insertion of a spurious space within a single word, e.g. <choice><sic>twowords</sic><corr>two words</corr></choice> and <choice><sic>a gainst</sic><corr>against</corr></choice>

6) Punctuation will also be regularised when necessary for the sake of grammatical clarity using the same <orig> and <reg> modernisation coding. For example, if a list of items was introduced by a semicolon rather than colon, it would be coded as: <choice><orig>;</orig><reg>:</reg></choice>. If a sentence was missing a terminal period, it could be added using <choice><orig/><reg>.</reg></choice>. Similarly, we can use analogous coding to remove punctuation from the original that we do not want to see in the normalised view, <choice><orig>,</orig><reg/></choice>. The virgule, or slightly-wobbly-forward-slash character, which is clearly a punctuation mark of some sort but cannot be confidently equated with any modern punctuation mark, should be recorded as the entity &slash;. As with any other punctuation mark, it should follow the preceding text immediately, with no space in between (except in cases where a space clearly is intended, which it quite often is).

Content Tagging

1) Headings, whether of the whole document or of a section within it, should be tagged <head>. The @rend attribute is used to indicate the block formatting of the heading, such as the text alignment (‘centre’, ‘left’, ‘right’) and the indentation of the first line (‘indent0’, ‘indent5’, ‘indent10’). If @rend is omitted, it will be assumed that the heading was left justified and indented by about 3–5 spaces. If the text of the heading is larger (or smaller) than normal body text, this should be indicated using <hi> (see (6) below). <head> should only occur at the beginning of a <div> or <lg> (line group, i.e. verse passage). Things that look like headings but do not in fact introduce new sections should be tagged <ab type="head"> if they occur outside other block elements (e.g. paragraphs (<p>)), or <seg type="head"> if the ‘head’ occurs inside a paragraph in the midst of a block of text (though I can’t imagine such a thing happening).
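A brief sketch of the two cases, with invented heading text. The @rend tokens are those listed above; combining @rend with <ab type="head"> is an assumption.

```xml
<div>
  <!-- a true heading: first thing inside the div it introduces -->
  <head rend="centre">The Preface to the Reader</head>
  <p>Text of the preface.</p>
  <!-- looks like a heading but introduces no new section -->
  <ab type="head" rend="centre">Finis</ab>
</div>
```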

2) Paragraphs in prose should be tagged <p>. The @rend attribute is used to indicate the block formatting of the paragraph, such as the text alignment and the indentation of the first line. No @rend value for the indentation is needed if it’s ‘normal’ (i.e. the first line of the paragraph is indented by about 3–5 spaces). Don’t bother recording slight variants in indentation, only ones pronounced enough to seem potentially significant, e.g. not indented at all or indented by about double or triple the normal amount (i.e. ten or fifteen spaces). There will usually be no need to use @rend to record the block formatting of the paragraph unless it is centred, flush right or has significantly increased left/right margins. When these happen, consult the entry on <p> in the Element Set. Multiple tokens may be placed within a single @rend. If you find yourself needing formatting that isn’t explicitly supported, please contact Michael Hawkins.
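As a rough illustration of the above (the sample wording is invented, and the tokens are simply those named for @rend in these guidelines; the Element Set remains the authority on what is permitted), paragraphs with non-default block formatting might be coded along these lines:

```xml
<!-- Illustrative text only; no @rend is needed for a normally indented paragraph -->
<p>A paragraph with the usual first-line indentation.</p>
<p rend="indent0">A paragraph whose first line is not indented at all.</p>
<p rend="centre">A paragraph centred on the page.</p>
<!-- Several tokens may share one @rend, separated by spaces -->
<p rend="left indent10">A left-aligned paragraph with a markedly deeper indent.</p>
```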

3) When coding manuscript materials, editors should take care to record all the linebreaks as outlined in 3a–c below. When coding printed materials, however, it is only necessary to record linebreaks in two, very special, situations. First, you will need to record hyphenated linebreaks that occur right at the end of the page (see 3b on how to code hyphenated linebreaks). Second, you should record a linebreak when it appears to be there to achieve a specific formatting effect (see 3c below).

a) The beginning of any new line of prose should be marked <lb>. This includes the first line of a paragraph or heading: although <lb> is formally defined as ‘line break’ it is more helpful to think of it as ‘line beginning’. Where a line break occurs between words there should be a space after the first word but not before the second. Where it occurs mid-word or at the beginning of a paragraph or heading there should be no space either side of it.

b) If a word is hyphenated at a line break, <lb> takes the @rend value ‘hyphenated’ and the hyphen itself should not be otherwise entered unless it is ‘hard’ and would have appeared anyway. For example, you would not manually enter the hyphen when the word ‘idolatrous’ was split across two lines and hyphenated in the original. It would be coded as: ‘… idol<lb rend="hyphenated"/>atrous superstitious …’. However, if the word ‘Idol-Temples’ was broken across two lines just before ‘Temples’, the hyphen would be manually entered since it is a hard hyphen that would have occurred within the word even if it wasn’t split across two lines. You also would not supply a @rend on <lb>; it would be coded as ‘Idol-<lb/>Temples’.

c) Line breaks that appear to be there for a reason but do not represent the beginning of a new paragraph (e.g. to split up a heading) should be tagged <lb break="yes"/>. These are the only ones that will normally be rendered in the normalised display.
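The three cases (a–c) can be seen together in a short sketch. The text here is invented for illustration, apart from ‘Idol-Temples’, which is borrowed from the example above; note where spaces do and do not appear around each <lb>.

```xml
<!-- (a) ordinary breaks, (b) a soft hyphen versus a hard hyphen -->
<p><lb/>The first line ends between two words <lb/>and the next is broken inside super<lb rend="hyphenated"/>stitious, whereas the hard hyphen in Idol-<lb/>Temples is typed by hand.</p>
<!-- (c) a deliberate break that splits up a heading -->
<head rend="centre"><lb/>A Heading <lb break="yes"/>split for effect</head>
```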

4) Any text string in verse (even if it is only one line long) should be tagged <lg> (line group) and each individual line (or incomplete line) within it tagged <l>.


        <lg>
            <head rend="centre"><hi rend="bold">1</hi></head>
            <l>NO Ladies loves, nor Knights brave martiall deeds,</l>
            <l>Ywrapt in rolls of hid Antiquitie;</l>
            <l>But th' inward Fountain, and the unseen Seeds,</l>
            <l>From whence are these and what so under eye</l>
            <l>Doth fall, or is record in memorie,</l>
            <l>Psyche, I'll sing Pfyche<gap extent="1" unit="chars"/> from thee they sprong.</l>
            <l>O life of Time, and all Alterity!</l>
            <l>The life of lives instill his n<gap extent="1" unit="chars"/>ctar strong,</l>
            <l>My soul t' inebriate, while I sing Psyches song.</l>
        </lg>
        <lg>
            <head rend="centre"><hi rend="bold">2</hi></head>
            <l>But thou, who e're thou art that hear'st this strain,</l>
            <l>Or read'st these rythmes which from Platonick rage</l>
            <l>Do powerfully flow forth, dare not to blame</l>
            <l>My forward pen of foul miscarriage;</l>
            <l>If all that's spoke, with thoughts more sadly sage</l>
            <l>Doth not agree. My task is not to try</l>
            <l>What's simply true. I onely do engage</l>
            <l>My self to make a fit discovery,</l>
            <l>Give some fair glimpse of Plato's hid Philosophy.</l>
        </lg>

NB: <head> is an optional element and should only be used to encode stanza headers if they appear in the original text. Most stanzas will likely not have <head>. More’s Philosophical Poems is somewhat atypical in this regard.

5) Whitespace, i.e. space left blank between lines or within a line of text, should be encoded as <space>. The @dim (dimension) value states whether the space is ‘horizontal’ or ‘vertical’; the @extent value is a number indicating the extent of the space; and the @unit value states what sort of units it is being measured in (normally ‘chars’, i.e. characters, for horizontal whitespace or ‘lines’ for vertical whitespace). It is clearly impossible to be precise about the @extent value: a reasonable approximation of how many characters or lines would have fitted into the space is sufficient.
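By way of a sketch (the surrounding text and the @extent figures are invented, and whether <space> may sit between paragraphs should be checked against the Element Set), horizontal and vertical whitespace might be recorded like this:

```xml
<!-- Roughly five characters' worth of blank space within a line -->
<p>two words<space dim="horizontal" extent="5" unit="chars"/>set apart</p>
<!-- Roughly two blank lines left between paragraphs -->
<space dim="vertical" extent="2" unit="lines"/>
<p>The text resumes here.</p>
```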

6) Text that has been distinctively rendered in some way, e.g. underlined, italicised, superscripted, etc., will be coded using a number of different elements based on the meaning/significance of the distinctively formatted text. If the character formatting was purely representational, you should use <hi>. However, if the specially formatted text was denoting a quote, foreign text (relative to the main document language), a person or place name or a specialised technical term, you should use <q>, <foreign>, <persName>, <placeName> or <rs> (respectively). The actual formatting itself would be described within the @rend attribute of whatever element you were using, e.g. ‘this is <hi rend="italic">italic</hi>’ or ‘It tends to make men be thought as mere machines, as <persName rend="italic">Des Cartes</persName> imagined beasts to be’. Valid values for @rend include:

‘underline’
‘doubleUnderline’
‘overline’
‘italic’
‘bold’
‘superscript’
‘subscript’
‘roman’
‘large’
‘larger’
‘largest’
‘small’
‘smaller’
‘smallest’
‘dropCap’ (a dropCap letter)
‘smallCaps’ (text in Small capitals)
‘uppercase’ (text in uppercase)

Generally speaking, we do not attempt to record all the changes in size of a scribe’s handwriting unless these fluctuations seem significant (either interpretatively or aesthetically).

Pay careful attention to the difference between text in small caps and uppercase. To the typographically uninitiated, they might appear to be the same thing, but they aren’t. Small caps are smaller capital letters whose height is roughly the height of lowercase letters (strictly speaking, a lowercase x). The initial character of a small caps word is often the same size as a normal capital. The characters in uppercase text, in contrast, are all the size of a normal capitalised character. Compare the following: small caps, Small caps and uppercase.

To encode small caps, use <hi rend="smallCaps"> and pay careful attention to the capitalisation of the text that you enter. The smaller capitals should be entered simply as lowercase characters and the normal sized capitals should be entered in their capitalised forms. For example:

This string is in Small Caps (note the normal sized ‘T’, ‘S’ and ‘C’) would be encoded <hi rend="smallCaps">This string is in Small Caps</hi>
all smaller caps would be encoded <hi rend="smallCaps">all smaller caps</hi>

When a word is simply capitalised (like ‘Aristotle’) or is an abbreviation with normal height capitals (S.P.Q.R), just enter the characters manually, i.e. ‘Aristotle’ and ‘S.P.Q.R’. Only use <hi rend="uppercase"> when the word (or significant part of a word) is entered in normal sized capital letters and where we would want to represent it differently within the normalised view of the text. This tag will consequently be rather rare and I suspect it will chiefly appear at the start of paragraphs, often following an ornamental drop capital, e.g.

ARISTOTLE argued Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce lacinia velit eget nisi fermentum, id fringilla urna congue. Proin molestie nisi non nisl maximus sollicitudin. Fusce vestibulum faucibus lorem, non fringilla ante accumsan ut. Fusce elementum rutrum viverra. Phasellus posuere, ligula nec tristique ornare, enim quam pharetra nisl, quis venenatis dolor metus in massa. Donec nec felis sed risus ultrices suscipit. Nullam lacinia tincidunt justo nec pellentesque. Nulla bibendum, justo quis blandit feugiat, nisi nisi pulvinar sem, nec dignissim velit enim eleifend lacus. Suspendisse blandit mattis magna quis consequat. Nam vel ullamcorper felis.

<p><hi rend="dropCap">A</hi><hi rend="uppercase">ristotle</hi> said ...</p>

ARISTOTLE said ...

<p><hi rend="uppercase">Aristotle</hi> said ...</p>

When entering the text within <hi rend="uppercase"> be sure to manually normalise the capitalisation so that it will appear properly when output in the normalised view. That is, manually fix the capitalisation as it was done in the above examples.

7) All quoted material, regardless of whether or not it was represented in a visually distinctive character style, should be coded using <q>. If the quote is rendered in a distinctive format (e.g. italicised), this should be recorded in its @rend attribute. The values allowed in @rend include those described above in <hi>, such as ‘italic’ or ‘underline’, as well as ‘quoteLeft’ (repeating quote marks at the start of each line of a multi-line quote, either in the left margin or at the start of the line itself), ‘quoteRight’ (repeating quote marks at the end of each line of a multi-line quote, either in the right margin or at the end of the line itself) and ‘none’ (no distinct formatting of the text). You can enter multiple values into @rend so if a quote is surrounded by quotation marks and styled in italic, it would be coded as ‘<q rend="quotes italic">Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor.</q>’. If a multi-line quote has repeating quotes on both the left and right of each line, record this as <q rend="quoteLeft quoteRight">. If a quote is only one line long and the quotes are not obviously in the left or right margin, manually enter the quote marks and then code the quote with an appropriate @rend value indicating the character styling (‘italic’, ‘none’, etc), e.g. ‘Lorem ipsum dolor sit amet, "<q rend="italic">consectetuer adipiscing elit</q>". Aenean commodo ligula eget dolor.’ (if the quote is italicised) or ‘Lorem ipsum dolor sit amet, "<q rend="none">consectetuer adipiscing elit</q>". Aenean commodo ligula eget dolor.’ (if the quote isn’t distinctively formatted).

8) All words or passages in a language different from that of the surrounding text must be tagged <foreign> with @xml:lang values that correspond to a <language> defined in the <langUsage> section of the <teiHeader>. For example, if Latin text (without any distinctive character styling) appeared within an English passage, it would be coded as ‘this is in English but it’s now switching to Latin, <foreign xml:lang="lat">Lorem ipsum dolor sit amet, consectetur adipiscing elit</foreign>.’. If the foreign text was rendered in a distinctive manner from the surrounding text, say it is in italics, this should be indicated using @rend and the values outlined above. If a foreign language passage violates element boundaries, split it into successive <foreign> elements, one nested within each block it spans.

<foreign> may nest within <foreign>, so if, for instance, there’s a bit of Greek in the middle of a Latin passage in a document whose main language is English, it is tagged thus: ‘main English text of document <foreign xml:lang="lat" rend="italic">Latin interpolation with a <foreign xml:lang="gre" rend="roman">bit of Greek</foreign> in the middle of it</foreign> resumption of English text’

9) All person and place names should be coded respectively using <persName> or <placeName> regardless of whether they are formatted in a distinctive manner from the surrounding text. However, if the names are rendered in a distinctive character style (e.g. italic), this should be indicated in @rend using the usual character styling values, e.g. ‘It tends to make men be thought as mere machines, as <persName rend="italic">Des Cartes</persName> imagined beasts to be’.

10) At this time, specialised technical/philosophical terms that are rendered in a distinctive character style from the surrounding text should be encoded using <rs>, which takes a mandatory @type describing the type of term. At present, the only value is ‘term’ but we expect a more finely grained typology to be developed as the project progresses. As with the previous elements, any specialised character styling present in the original document should be indicated using @rend.
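A minimal sketch of such a term, assuming it was printed in italic (the phrase and sentence are invented for illustration; ‘term’ is the only @type value currently defined):

```xml
<!-- Invented example of an italicised technical term -->
<p>which some have called the <rs type="term" rend="italic">Plastick Nature</rs> of the world</p>
```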

Editors will likely encounter formatted text that falls into a distinctive category that we haven’t anticipated. When this happens, please use <hi> with an appropriate @rend value and a @resp that assigns responsibility for this coding to you (this will make it easier to find these items later and review how they might be better coded). Once this has been done, create an issue in Freedcamp so the team can discuss the best way forwards.

11) Page breaks (including those at the beginning of a document) should be indicated by <pb>. Each <pb> requires an @xml:id value that must be unique within the document. It also requires an @n value representing the page or, if unpaginated, signature number. This is what will appear as the page number when viewing the text online. Unlike @xml:id, @n does not need to be unique. If <pb> is preceded and/or followed by <fw>, there should be no gap between these tags. If a word is split by a page break, the <pb> and any associated <fw> and <lb> should sit directly in the middle of the word, with no space either side of the elements. Otherwise there should be one space either side.

12) For manuscript sources, catchwords, page numbers, running headers, shelfmarks and sigils are dealt with using <fw> (‘forme work’). Page numbers in manuscripts or early printed texts should appear immediately after the page break (<pb/>) in the transcript, irrespective of where they actually appear on the page (this is indicated by the @place value).

No space should be left between <fw> and <pb>: for instance ‘... without noting any various lections in <fw type="catchword" place="bottomRight">them</fw><pb xml:id="p022r" n="22r"/><fw type="page" place="topRight">22</fw> them ...’.

If a catchword is incomplete, no space should be left between the component parts of the word. Otherwise, leave one space either side of the <fw/><pb/><fw/> sequence. For instance:

I <fw type="catchword">under</fw><pb xml:id="p17" n="17r"/><fw type="page">17</fw>understand

I under<fw type="catchword">stand</fw><pb xml:id="p17" n="17r"/><fw type="page">17</fw>stand

I <fw type="catchword">understand</fw><pb xml:id="p17" n="17r"/><fw type="page">17</fw> understand

If a catchword is incomplete and it and/or the preceding word has a hyphen, indicate this with <lb rend="hyphenated"/>. For instance:

I <fw type="catchword">under<lb rend="hyphenated"/></fw><pb xml:id="p17" n="17r"/><fw type="page">17</fw>understand

I under<lb rend="hyphenated"/><fw type="catchword">stand</fw><pb xml:id="p17" n="17r"/><fw type="page">17</fw>stand

Be aware of the distinction between page break (see section 11 above) and page number. Page break (<pb>) indicates the physical point at which the text moves on to a new page; page number (<fw type="page">) encodes page numbers that actually feature in the document (whether they were put there by the original writer, compositor or anyone else, and whether or not they correspond to the number assigned to that page by the transcriber/encoder).

If a page is mispaginated (say p. 29 was mispaginated as p. 92), it should be dealt with in the following manner: ‘<pb xml:id="p29" n="29"/><fw type="page" place="topRight"><choice><sic>92</sic><corr>29</corr></choice></fw>’. You’ll note that the original value and its correction are indicated using <sic> and <corr> in the <fw> whereas the <pb> @xml:id and @n attribute values are silently corrected to the correct value.

13) Column breaks (including the beginning of the first column on the page) should be indicated by <cb>, which requires a unique @xml:id value and an @n value, which is not necessarily unique and is what will actually appear in the transformed file on the site. The @xml:id normally takes the value of the page number followed by ‘-colA’ (for column 1 on the page), ‘-colB’ (for column 2 on the page), ‘-colC’ (for column 3 on the page), etc. The @n takes the corresponding lowercase letter (‘a’, ‘b’, ‘c’, etc.) from the @xml:id. Therefore, if page 62r were divided into two columns, the first column would be coded <cb xml:id="p62-colA" n="a"/> and the second column would be coded <cb xml:id="p62-colB" n="b"/>. The rules about spacing around <cb> are the same as those around <pb>.
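Put in context, a two-column page might look roughly as follows. This is only a sketch: the <pb> values and the placeholder text are assumptions added for illustration, while the <cb> values follow the pattern described above.

```xml
<!-- Hypothetical page 62r set in two columns -->
<pb xml:id="p62" n="62r"/>
<cb xml:id="p62-colA" n="a"/>
<p>Text of the first column …</p>
<cb xml:id="p62-colB" n="b"/>
<p>Text of the second column …</p>
```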

14) Added text is tagged <add>, with a @place value of ‘supralinear’ (above the line), ‘infralinear’ (below the line), ‘inline’ (neither higher nor lower than the surrounding text but obviously added later), ‘interlinear’ (added text that itself runs to more than one line), ‘over’ (physically overwriting an earlier text string), ‘marginRight’ (in the right margin), ‘marginLeft’ (in the left margin), ‘pageTop’ (at the top of the page), ‘pageBottom’ (at the bottom of the page), ‘topLeft’, ‘topRight’, ‘bottomLeft’ or ‘bottomRight’ (top left, top right, bottom left and bottom right respectively). Further values may be added into the schema if these prove inadequate.

<add> may nest in <add>, in which case the @place value of the nested <add> refers to where it appears relative to the inserted text it nests in. For instance, an infralinear insertion contained within a supralinear insertion may still be above the line of the main text into which the supralinear insertion is inserted, but nonetheless takes the @place value ‘infralinear’.
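For instance, a nested insertion of this kind might be sketched as follows (the wording is invented for illustration):

```xml
<!-- "very" was written beneath the supralinear insertion "the old cat";
     its @place is relative to that insertion, not to the main line -->
<add place="supralinear">the <add place="infralinear">very</add> old cat</add> sat on the mat
```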

When the text contains a visual mark indicating the placement of an addition (say a caret below the line) or indeed any other editorial artefact (like a line down the left hand margin indicating a deletion), then this should be coded using <metamark>, which contains three mandatory attributes, @rend, @place and @function, which describe the visual appearance and function of the mark.

@rend is where the editor should describe the appearance of the metamark. At present only two values are allowed: ‘caret’ (^) and ‘line’ (|). However when we encounter more of these marks, we can come up with additional names as required.

@place is used to describe the placement of the metamark within the text and it takes the same values as described above in <add>.

@function is used to describe the function of the mark within the text and its permitted values are ‘insertion’, ‘deletion’, ‘transposition’ (i.e. two characters or words have their order changed within the text by means of line or loop-like mark) and ‘unknown’ (the editor is unable to ascertain the precise function of this mark).

<metamark> has one attribute that is usually, but not always, necessary: @target. While not mandatory, @target is one of the more useful attributes since it allows the editor to connect the <metamark> to the text that it affects. For example, if a document contained a caret indicating the placement of an addition, <metamark>’s @target attribute would point to that <add>’s @xml:id (so the <add> should be given an @xml:id value). This attribute is marked as recommended rather than mandatory since it’s likely that editors will encounter metamarks within a text that do not appear to be connected to any other piece of text. For example, an author could have written a caret to indicate a possible insertion, but then decided against it (or forgot). Often this <metamark> would be deleted (in which case it should be nested within <del>) but sometimes it might not be.

…have no power over themselfs to stop, or excite <metamark rend="caret" place="infralinear" function="insertion" target="#meta-add0001"/><add place="supralinear" xml:id="meta-add0001">their own Power</add> retard or accelerate theire owne force…

15) Insertions from elsewhere in the text (another page, or a different part of the same page), or insertions that violate element boundaries, should be transcribed where they belong in the text, introduced by <addSpan> (with mandatory @spanTo and @place) and terminated by an associated <anchor> (with a mandatory @xml:id). Where such insertions appear on the main body of a page, line breaks should be tagged. The physical location of the passage is indicated by the @place value of <addSpan>. In order to connect the <addSpan> to the <anchor> marking its end it is necessary for you to assign the <anchor> a unique @xml:id value and then have the <addSpan> point to it by entering that value (preceded by a ‘#’) in @spanTo, e.g. ‘<addSpan spanTo="#addend001r-01" place="p001v"/>…<anchor xml:id="addend001r-01"/>’.

Even if the added section begins and/or ends on a different page from the main text, it should not be introduced or terminated by <pb> (the function of that tag has been taken over by the <addSpan> and <anchor> tags), but if the inserted passage itself runs to more than one page, code the page breaks within it using <pb> as normal. For instance: <pb xml:id="p004r" n="4r"/> … Afric and Britain being quieted a little before. <addSpan spanTo="#addend003v-01" place="p002v p003v"/>For the history of the wars … you may see in Iornandes mention made of an incursion of the Vandals out of Pannonia into Gallia: which Vandals, as <pb xml:id="p003v" n="3v"/> the same Iornandes relates, had been received into Pannonia by Constantine … the wars in Italy AD 536.<anchor xml:id="addend003v-01"/> The first Trumpet begins with the Visigothic wars ….

This indicates that the text before the inserted section is on f. 4r, the inserted text itself begins on f. 2v and continues on f. 3v from ‘the same Iornandes’ to ‘the wars in Italy AD 536.’ where the text from f. 4r resumes with ‘The first Trumpet begins’.

If this results in two or more <pb>s having the same @xml:id value (this is fairly unusual but it does happen), call the first one (for instance) <pb xml:id="p034v-a" n="34v">, the second <pb xml:id="p034v-b" n="34v">, and so on. It doesn’t matter that the two @n values are identical, but all @xml:id values must be unique (qua @xml:id values) within a document.

Authors may sometimes indicate the location of such inserted passages by beginning them with a symbol such as an obelus or a dot in a circle, and placing the same symbol in the main text at the point s/he wants that insertion to appear. These symbols should be recorded in the transcription, using entities (as per the entity list). For example, ‘<pb xml:id="p014r" n="14r"/> … by degrees they subdued it. &obelus;<addSpan spanTo="#addend014v-01" place="p013v p014v"/> &obelus; The calamity of Afric in the first two or three years of this invasion …’. If a particular glyph isn’t supported, please contact Michael Hawkins.

It can sometimes be very difficult to decide where exactly a supplementary passage belongs when the author fails to put in a linking glyph, or, indeed, whether the author was entirely sure where they wanted it to go. When in doubt about the placement of an <addSpan>, say so in an XML comment. Please make sure that you transcribe all the allotted text in your document, even if you have no idea where some of it belongs; again, this can be pointed out in an XML comment. One option in such cases, to avoid interrupting the flow of your main text, is to code these ‘orphaned’ passages as <addSpan>s at the very end of your transcription, with the physical location of each noted in its @place value. It will then be the responsibility of a senior editor to decide where to place them in the final version.

The distinction between <addSpan> (supplementary text) and <note> (annotation) can be very difficult to ascertain for some authors. If in doubt, say so in an XML comment.

16) Deleted text is tagged <del>, with a mandatory @type that can take one of the following values: ‘blockStrikethrough’ (for whole sections struck through en bloc), ‘strikethrough’ (for a text string crossed out by a continuous horizontal line), ‘cancelled’ (for any heavier deletion), ‘erased’ (for text that has been rubbed or scraped away from the original document), or ‘over’ (for cases where one text string overwrites another, functioning simultaneously as deletion and replacement). Text tagged <del type="over"> will always, by definition, be followed immediately by text tagged <add place="over">.

Where added text replaces deleted text, the two strings should nest in a <subst> element and the deleted text should be transcribed first. This applies even in cases where the caret mark or other insertion indicator appears, physically, before the <del>. If the added text has the @place value ‘over’ and/or it replaces a text string that is only part of a word, number or other textual unit, it should follow the deleted text with no space between the two elements. Otherwise, one space should be left between the <del> and <add> elements.

Except in the case of overwriting, it is not always obvious whether an addition does replace a deletion, as opposed to just happening to occur at the same point. <subst> should only be used if the transcriber/editor is reasonably confident that it really does represent a substitution.

If <add> and <del> are co-extensive — i.e. the added text has been deleted in its entirety but the surrounding text is undeleted — <del> should nest directly within <add> rather than vice versa. However, <add> may nest within <del> if the insertion represents part of a longer text string that was subsequently deleted in its entirety.

<del> may also nest within <del> if it appears that some part of a text string had been deleted before the longer text string was.
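The spacing and nesting rules for deletions and additions can be summarised in a few hedged sketches (all wording invented for illustration; the attribute values are those listed above):

```xml
<!-- Overwriting within a word: no space between <del> and <add> -->
th<subst><del type="over">e</del><add place="over">a</add></subst>n
<!-- Whole-word replacement: deleted text first, one space between the elements -->
the <subst><del type="strikethrough">cat</del> <add place="supralinear">dog</add></subst> sat
<!-- An addition later deleted in its entirety: <del> nests within <add> -->
the <add place="supralinear"><del type="strikethrough">old</del></add> cat
<!-- An addition inside a longer passage that was later deleted as a whole -->
<del type="strikethrough">the <add place="supralinear">old</add> cat</del>
<!-- A deletion made before the longer passage around it was struck out -->
<del type="blockStrikethrough">the <del type="strikethrough">old</del> cat sat</del>
```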

18) The hand or hands in which the document is written are recorded in the <handNotes> section of the <teiHeader>. If more than one hand features, there are two ways of distinguishing them in the body text:

a) If the main text has been written in one hand and subsequently altered by another hand, the identity of the second hand can be noted by the @hand attribute applied to the tags that record its interventions (chiefly <add> and <del>). For instance, supposing A (the principal scribe) wrote ‘the cat sat on the mat’ and B changed ‘cat’ to ‘dog’, and they are given the @xml:id values ‘scribeA’ and ‘scribeB’ respectively in the <handNotes> section, this would be tagged ‘the <subst><del type="strikethrough" hand="#scribeB">cat</del> <add place="supralinear" hand="#scribeB">dog</add></subst> sat on the mat’ (i.e. the deletion, though not the deleted word itself, and the addition are both in hand B).

b) If the main text simply changes from one hand to another at some point, this can be marked with the empty element <handShift>, placed immediately before the first character in the new hand, with the @new value linking to the code for the new hand, e.g. ‘the cat <handShift new="#scribeB"/>sat on the mat’ indicates that A wrote the text up to ‘the cat’ and then B took over.
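For the ‘#scribeA’ and ‘#scribeB’ pointers in these examples to resolve, the header needs matching declarations. A minimal sketch, assuming each hand is identified by an @xml:id on a <handNote> (which is what the ‘#’ pointers imply); the descriptions are invented, and the header documentation should be consulted for the attributes the project actually requires:

```xml
<!-- Inside the <teiHeader>; identifiers and descriptions are illustrative -->
<handNotes>
    <handNote xml:id="scribeA">Principal scribe of the main text</handNote>
    <handNote xml:id="scribeB">Later corrector</handNote>
</handNotes>
```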

19) When transcribing marginal and other notes in print sources, they are simply tagged <note> and transcribed at the point in the text to which they refer. If a note indicator (such as an obelus, asterisk or superscript character) is present, this should be recorded as the @n value of <note>, using an entity if necessary. If there is any doubt about which point in the text a note refers to — and/or whether a given portion of text counts as a note or not — this should be mentioned in a comment tag. The physical location of the note can be recorded in the @place value of <note> as ‘infralinear’, ‘inline’, ‘interlinear’, ‘lineBeginning’, ‘lineEnd’, ‘marginLeft’, ‘marginRight’, ‘pageTop’, ‘pageBottom’. ‘marginLeft’ and ‘marginRight’ should be seen as relative to the point they relate to: for instance, notes that actually occur in the middle of the page as a whole should be considered ‘marginRight’ if they pertain to a point in the left column or ‘marginLeft’ if they pertain to a point in the right column. These values are currently constrained but can be expanded if it proves necessary.

If a note runs to more than one page, it may contain <pb> and <fw> as appropriate. If this results in two or more <pb>s having the same @xml:id value, and in most cases it will, modify the <pb>’s @xml:id value in the note by adding ‘-a’, ‘-b’ etc., as described in Insertions (see section 15 above). It doesn’t matter that the two @n values are identical so you should not add the ‘-a’, ‘-b’ to it. However, it is essential that you do this to any relevant @xml:id values since each value must be unique within a document.

The instructions for transcribing notes in manuscript works are slightly different from those in print. When transcribing a manuscript work, it’s necessary to precede the <note> with an <anchor>. The <anchor> contains a mandatory @xml:id which the <note> points to by means of its @target. This extra coding is necessary because it’s not unusual for a single glyph in a manuscript to point to more than one footnote or for a single <note> to be pointed to from multiple places within the text.

… erat a volutpat aliquet.<note n="a" place="marginLeft"><hi rend="superscript">a</hi> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam mattis quis felis non lobortis. Integer efficitur purus <pb xml:id="p029-a" n="29"/> ut neque tempor, a porta est finibus. Mauris aliquet purus sapien, at tincidunt dui convallis ac. Nam ac posuere erat. Pellentesque ipsum leo, varius quis nibh a, malesuada lacinia ex. </note> Integer gravida non quam ac effic<lb rend="hyphenated"/><fw type="catchword" place="bottomRight">itur</fw><pb xml:id="p029" n="29"/><fw type="page" place="topRight">29</fw>itur. Fusce nec sem …

If the text includes its own indicator of the point to which an annotation refers, such as a symbol or a superscript letter, place the <anchor> immediately after the indicator. As with <addSpan> passages (see section 15 above), these indicators should be included in the transcription, not edited out, e.g. ‘… I took the most free and natural application <lb/>of it to phenomena to be this<hi rend="superscript">&SunSymbol;</hi><anchor xml:id="n001r-01"/><note n="&SunSymbol;" place="marginLeft" target="#n001r-01"><hi rend="superscript">i</hi> Transact. n<hi rend="superscript">o</hi> 88. p. 5088.</note>: that the agitated parts of bodies, according <lb/>to their several sizes, figure, and motions …’. If there is no such indicator, put the <anchor> at what seems the most appropriate point, leaving no space between it and the point it refers to, and omit @n on <note>, e.g. ‘as Augustine<anchor xml:id="n006r-03"/><note place="marginRight" target="#n006r-03">De Civ. Dei l. 8 c. 4</note> saith …’

20) Some of our excerpts will be taken from passages within works that have been fully transcribed. When this happens, the start and end of the excerpts should be tagged with <anchor> elements. The one at the start of the excerpt would be coded as <anchor xml:id="excerpt001Start" next="#excerpt001End" type="excerptMarker" /> and the one at the end would be coded <anchor xml:id="excerpt001End" prev="#excerpt001Start" type="excerptMarker" />. The @next attribute in the starting <anchor> points to the @xml:id of the ending <anchor>, which itself points back to the first one using the @prev attribute. If a document contains multiple excerpts, it will be necessary for you to alter the @xml:id, @next and @prev values for each <anchor>. That is, the anchors of the first excerpt would have start and end values of ‘excerpt001Start’ and ‘excerpt001End’, the anchors for the second would have start and end values of ‘excerpt002Start’ and ‘excerpt002End’, etc. Naturally, you would also update the @next and @prev values so that the start of excerpt two points to the end of excerpt two in @next, e.g. <anchor xml:id="excerpt002Start" next="#excerpt002End" type="excerptMarker" />, and the end of excerpt two points to the start of excerpt two in @prev, e.g. <anchor xml:id="excerpt002End" prev="#excerpt002Start" type="excerptMarker" />.

21) Unclear, Illegible or Omitted Text

a) Uncertain or conjectural readings should be tagged <unclear>, with a @cert value indicating the degree of certainty about the reading on a scale of ‘high’ (pretty confident), ‘medium’ (doubtful) or ‘low’ (an educated guess). The reason why the text is unclear is expressed by the @reason value, which may be any of the following:

‘binding’: text has been rendered unclear by over-zealous binding
‘bleedthrough’: text has been rendered unclear by ink bleeding through from the other side of the folio
‘blot’: the text has been rendered unclear by a blot of ink on the page that does not seem to be a deliberate deletion
‘blotDel’: something that could be either an accidental blot or a deliberate deletion
‘copy’: poor quality of the copy (e.g. microfilm, microfiche, jpg, photocopy) being used. By the end of the project these should, ideally, all have been checked against the original, but this is a useful way of keeping track of what needs special attention
‘damage’: the MS is damaged in some way. Where there is significant damage to a given manuscript or page, the exact nature of the damage can be described in the <notesStmt> section of the <teiHeader>
‘del’: deleted
‘faded’
‘foxed’
‘over’: text is hard to read because it overwrites other text: NB not because it has itself been overwritten, which counts as ‘del’
‘hand’: lousy handwriting. This is the default option if none of the others applies and the source item is a manuscript.
‘type’: lousy type. This is the default option if none of the others applies and the source item is a printed work.

<unclear> may contain any quantity of text, from a single letter within a word to a number of whole words (unless it violates element boundaries, which is extremely rare: if it does, it will have to be presented as two or more consecutive <unclear>s).
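The permitted @reason and @cert values listed above lend themselves to a simple automated check. The following sketch assumes a namespace-free fragment and a hypothetical function name, `unclear_problems`; the sample wording inside the fragment is invented purely for illustration and is not taken from any source text.

```python
import xml.etree.ElementTree as ET

# The @reason and @cert values are copied from the list in section 21a.
REASONS = {
    "binding", "bleedthrough", "blot", "blotDel", "copy", "damage",
    "del", "faded", "foxed", "over", "hand", "type",
}
CERTS = {"high", "medium", "low"}

def unclear_problems(xml_text):
    """Return a message for each <unclear> with a bad @reason or @cert."""
    problems = []
    for el in ET.fromstring(xml_text).iter("unclear"):
        reason, cert = el.get("reason"), el.get("cert")
        if reason not in REASONS:
            problems.append(f"bad @reason: {reason!r}")
        if cert not in CERTS:
            problems.append(f"bad @cert: {cert!r}")
    return problems

# Hypothetical transcription: the second <unclear> uses an unlisted reason.
sample = (
    '<p>the <unclear reason="blot" cert="medium">spirit</unclear> of '
    '<unclear reason="smudge" cert="high">nature</unclear></p>'
)
print(unclear_problems(sample))
```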

b) Any text that is missing entirely from the surviving manuscript (normally through damage), or is wholly illegible for whatever reason, and cannot be even conjecturally supplied, should be tagged <gap>, with @reason values as for <unclear>, a numerical @extent value, and a @unit value of ‘chars’, ‘words’ or ‘lines’ (always plural, even if the @extent value is ‘1’). The @extent value does not need to be too precise: it is obviously impossible to tell exactly how much text has disappeared under a large blot, but useful to give a general idea of the scale of the omission. Alternatively, if a reasonably accurate guess is impossible (e.g. the bottom half of a page has been torn off so the loss could be anywhere between zero and three hundred words), @extent takes the value ‘unclear’ and no @unit value is needed.

If it is unclear whether text is missing or not (typically in cases of damage or binding), <gap> can also take an optional @cert value of ‘high’, ‘medium’ or ‘low’.
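The attribute rules for <gap> (a numerical @extent with a plural @unit, or an @extent of ‘unclear’ with no @unit, plus an optional @cert) can be expressed as a short check. This is a sketch under stated assumptions rather than a definitive validator: `gap_problems` is a hypothetical name, the fragment has no TEI namespace, and ‘numerical’ is interpreted here as a whole number.

```python
import xml.etree.ElementTree as ET

REASONS = {
    "binding", "bleedthrough", "blot", "blotDel", "copy", "damage",
    "del", "faded", "foxed", "over", "hand", "type",
}
UNITS = {"chars", "words", "lines"}  # always plural, even for an extent of 1
CERTS = {"high", "medium", "low"}

def gap_problems(xml_text):
    """Return one message per rule broken by any <gap> in the fragment."""
    problems = []
    for gap in ET.fromstring(xml_text).iter("gap"):
        reason, extent = gap.get("reason"), gap.get("extent")
        unit, cert = gap.get("unit"), gap.get("cert")
        if reason not in REASONS:
            problems.append(f"bad @reason: {reason!r}")
        if extent == "unclear":
            pass  # no @unit is needed when the extent cannot be estimated
        elif extent is not None and extent.isdigit():
            if unit not in UNITS:
                problems.append(f"@unit must be chars, words or lines, not {unit!r}")
        else:
            problems.append(f"@extent must be a number or 'unclear', not {extent!r}")
        if cert is not None and cert not in CERTS:
            problems.append(f"bad @cert: {cert!r}")
    return problems

# Hypothetical fragment: the first <gap> wrongly uses the singular 'word'.
sample = (
    '<p>torn <gap reason="damage" extent="1" unit="word"/> here '
    '<gap reason="damage" extent="unclear"/></p>'
)
print(gap_problems(sample))
```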

c) Material that is missing or illegible but can be supplied, if only conjecturally, should be tagged <supplied>, with @reason and @cert values as for <unclear>, except that if there is no reasonable doubt as to the content, no @cert value is needed. If the only apparent reason for an omission is authorial or scribal absent-mindedness, it should instead be rectified using <sic>/<corr>: ‘the killing <choice><sic type="noText"/><corr>of</corr></choice> the witnesses’.
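The <sic type="noText"/> pattern in the example above pairs an empty <sic> with a <corr> that supplies the missing word. A minimal sketch of a consistency check follows. It assumes, on the strength of that single example, that such a <sic> is always empty and always accompanied by a non-empty <corr>; the function name `notext_sic_problems` is hypothetical.

```python
import xml.etree.ElementTree as ET

def notext_sic_problems(xml_text):
    """Check each <choice> whose <sic> is marked type='noText'."""
    problems = []
    for choice in ET.fromstring(xml_text).iter("choice"):
        sic = choice.find("sic")
        if sic is None or sic.get("type") != "noText":
            continue
        # Assumed rule: a noText <sic> has neither text nor child elements.
        if (sic.text or "").strip() or len(sic):
            problems.append("noText <sic> must be empty")
        corr = choice.find("corr")
        if corr is None or not "".join(corr.itertext()).strip():
            problems.append("noText <sic> needs a non-empty <corr>")
    return problems

# The guidelines' own example, wrapped in <p> so that it parses on its own.
sample = (
    '<p>the killing <choice><sic type="noText"/><corr>of</corr></choice>'
    ' the witnesses</p>'
)
print(notext_sic_problems(sample))
```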

22) When proofing XML texts created by other projects, you might encounter elements that haven’t been described in our documentation. When you encounter such elements, please first check whether they are included in the list below and, if not, please contact Michael Hawkins for guidance. Then make sure that the textual contents of those elements seem to be accurately transcribed from the original text. At present the undocumented elements include:

<opener>
<salute>
<closer>

Special Characters

European Characters

Greek

Hebrew Characters

Punctuation, Currency and Syntactic Characters

Units of Measure

Brevigraphs

Misc Glyphs

Alchemical/Astrological/Medical Characters

Mathematical Characters

European Characters

Á -> Á <-

á -> á <-

Â -> Â <-

â -> â <-

&aeacute; -> ǽ <-

Æ -> Æ <-

æ -> æ <-

&aeligacute; -> ǽ <-

À -> À <-

à -> à <-

&amacron; -> ā <-

&aover; -> ā <-

Å -> Å <-

å -> å <-

Ã -> Ã <-

ã -> ã <-

Ä -> Ä <-

ä -> ä <-

Ç -> Ç <-

ç -> ç <-

° -> ° <-

&dcross; -> đ <-

É -> É <-

é -> é <-

Ê -> Ê <-

ê -> ê <-

È -> È <-

è -> è <-

&emacron; -> ē <-

&eover; -> ē <-

&Eth; -> Ð <-

ð -> ð <-

Ë -> Ë <-

ë -> ë <-

&ff; -> F <-

Í -> Í <-

í -> í <-

Î -> Î <-

î -> î <-

Ì -> Ì <-

ì -> ì <-

&imacron; -> ī <-

&iover; -> ī <-

Ï -> Ï <-

ï -> ï <-

&mmacron; -> m̄ <-

&mover; -> m̄ <-

&nmacron; -> n̄ <-

&nover; -> n̄ <-

Ñ -> Ñ <-

ñ -> ñ <-

Ó -> Ó <-

ó -> ó <-

Ô -> Ô <-

ô -> ô <-

&OElig; -> Œ <-

&oelig; -> œ <-

Ò -> Ò <-

ò -> ò <-

&omacron; -> ō <-

&oover; -> ō <-

Ø -> Ø <-

ø -> ø <-

Õ -> Õ <-

õ -> õ <-

Ö -> Ö <-

ö -> ö <-

ß -> ß <-

&Thorn; -> Þ <-

þ -> y <-

Ú -> Ú <-

ú -> ú <-

Û -> Û <-

û -> û <-

Ù -> Ù <-

ù -> ù <-

&umacron; -> ū <-

&uover; -> ū <-

Ü -> Ü <-

ü -> ü <-

&wmacron; -> w̄ <-

&wover; -> ū <-

Ý -> Ý <-

ý -> ý <-

&ymacron; -> ȳ <-

&yover; -> ȳ <-

ÿ -> ÿ <-

Greek

Α -> Α <-

&Alphaacute; -> Ά <-

&Alphacbmacute; -> Ἄ <-

&Alphagrave; -> Ὰ <-

&Alphacbm; -> Ἀ <-

&Alphaobm; -> Ἁ <-

α -> α <-

&alphacbmacute; -> ἄ <-

&alphaacute; -> ά <-

&alphagrave; -> ὰ <-

&alphacbm; -> ἀ <-

&alphacbmtilde; -> ἆ <-

&alphacbmgrave; -> ἂ <-

&alphaiotasubscrpt; -> ᾳ <-

&alphaobm; -> ἁ <-

&alphaobmacute; -> ἅ <-

&alphaobmgrave; -> ἃ <-

&alphaobmiotasubscrpt; -> ᾁ <-

&alphaobmtilde; -> ἇ <-

&alphatilde; -> ᾶ <-

&alphatildeiotasubscrpt; -> ᾷ <-

Β -> Β <-

β -> β <-

Γ -> Γ <-

γ -> γ <-

Δ -> Δ <-

δ -> δ <-

Ε -> Ε <-

&Epsilonacute; -> Έ <-

&Epsilongrave; -> Ὲ <-

&Epsiloncbm; -> Ἐ <-

&Epsiloncbmacute; -> Ἔ <-

&Epsilonobm; -> Ἑ <-

&Epsilonobmacute; -> Ἕ <-

ε -> ε <-

&epsilonacute; -> έ <-

&epsilongrave; -> ὲ <-

&epsiloncbm; -> ἐ <-

&epsiloncbmacute; -> ἔ <-

&epsilonobm; -> ἑ <-

&epsilonobmacute; -> ἕ <-

&epsilonobmgrave; -> ἓ <-

Ζ -> Ζ <-

ζ -> ζ <-

Η -> Η <-

&Etaacute; -> Ή <-

&Etagrave; -> Ὴ <-

&Etacbm; -> Ἠ <-

&Etacbmacute; -> Ἤ <-

&Etaobm; -> Ἡ <-

&Etaobmacute; -> Ἥ <-

η -> η <-

&etaacute; -> ή <-

&etagrave; -> ὴ <-

&etacbm; -> ἠ <-

&etacbmacute; -> ἤ <-

&etacbmgrave; -> ἢ <-

&etacbmtilde; -> ἦ <-

&etaiotasubscrpt; -> ῃ <-

&etaobm; -> ἡ <-

&etaobmacute; -> ἥ <-

&etaobmgrave; -> ἣ <-

&etaobmtilde; -> ἧ <-

&etasubiota; -> ῃ <-

&etatilde; -> ῆ <-

&etatildeiotasubscrpt; -> ῇ <-

Θ -> Θ <-

θ -> θ <-

Ι -> Ι <-

&Iotaacute; -> Ί <-

&Iotagrave; -> Ὶ <-

&Iotacbm; -> Ἰ <-

&Iotacbmacute; -> Ἴ <-

&Iotaobm; -> Ἱ <-

&Iotaobmacute; -> Ἵ <-

ι -> ι <-

&iotaacute; -> ί <-

&iotagrave; -> ὶ <-

&iotacbm; -> ἰ <-

&iotacbmacute; -> ἴ <-

&iotacbmgrave; -> ἲ <-

&iotacbmtilde; -> ἶ <-

&iotaobm; -> ἱ <-

&iotaobmacute; -> ἵ <-

&iotaobmtilde; -> ἷ <-

&iotatilde; -> ῖ <-

&iotauml; -> ϊ <-

Κ -> Κ <-

κ -> κ <-

Λ -> Λ <-

λ -> λ <-

Μ -> Μ <-

μ -> μ <-

Ν -> Ν <-

ν -> ν <-

Ξ -> Ξ <-

ξ -> ξ <-

Ο -> Ο <-

&Omicronacute; -> Ό <-

&Omicrongrave; -> Ὸ <-

&Omicroncbm; -> Ὀ <-

&Omicroncbmacute; -> Ὄ <-

&Omicronobm; -> Ὁ <-

&Omicronobmacute; -> Ὅ <-

&Omicronobmgrave; -> Ὃ <-

ο -> ο <-

&omicronacute; -> ό <-

&omicrongrave; -> ὸ <-

&omicroncbm; -> ὀ <-

&omicroncbmacute; -> ὄ <-

&omicronobm; -> ὁ <-

&omicronobmacute; -> ὅ <-

&omicronobmgrave; -> ὃ <-

&omicrontilde; -> ο <-

Π -> Π <-

π -> π <-

Ρ -> Ρ <-

ρ -> ρ <-

Σ -> Σ <-

σ -> σ <-

&Endsigma; -> Σ <-

&endsigma; -> ς <-

Τ -> Τ <-

τ -> τ <-

Υ -> Υ <-

&Upsilonacute; -> Ύ <-

&Upsilongrave; -> Ὺ <-

&Upsilonobm; -> Ὑ <-

υ -> υ <-

&upsilonacute; -> ύ <-

&upsilongrave; -> ὺ <-

&upsiloncbm; -> ὐ <-

&upsiloncbmacute; -> ὔ <-

&upsiloncbmtilde; -> ὖ <-

&upsilonobm; -> ὑ <-

&upsilonobmacute; -> ὕ <-

&upsilonobmtilde; -> ὗ <-

&upsilontilde; -> ῦ <-

&upsilonumlaut; -> ϋ <-

Φ -> Φ <-

φ -> φ <-

Χ -> Χ <-

χ -> χ <-

Ψ -> Ψ <-

ψ -> ψ <-

Ω -> Ω <-

&Omegaacute; -> Ώ <-

&Omegacbm; -> Ὠ <-

&Omegacbmtilde; -> Ὦ <-

&Omegagrave; -> Ὼ <-

&Omegaobm; -> Ὡ <-

&Omegaobmacute; -> Ὥ <-

&Omegaobmtilde; -> Ὧ <-

ω -> ω <-

&omegaacute; -> ώ <-

&omegacbm; -> ὠ <-

&omegacbmacute; -> ὤ <-

&omegacbmgrave; -> ὢ <-

&omegacbmtilde; -> ὦ <-

&omegagrave; -> ὼ <-

&omegaiotasubscrpt; -> ῳ <-

&omegaobm; -> ὡ <-

&omegaobmacute; -> ὥ <-

&omegaobmtilde; -> ὧ <-

&omegatilde; -> ῶ <-

&omegatildeiotasubscrpt; -> ῷ <-

&cbm; -> ᾽ <-

&obm; -> ῾ <-

Hebrew Characters

&alef; -> א <-

&finalalef; -> א <-

&bet; -> ב <-

&finalbet; -> ב <-

&gimel; -> ג <-

&finalgimel; -> ג <-

&dalet; -> ד <-

&finaldalet; -> ד <-

&he; -> ה <-

&finalhe; -> ה <-

&vav; -> ו <-

&finalvav; -> ו <-

&zayin; -> ז <-

&finalzayin; -> ז <-

&het; -> ח <-

&finalhet; -> ח <-

&tet; -> ט <-

&finaltet; -> ט <-

&yod; -> י <-

&finalyod; -> י <-

&kaf; -> כ <-

&finalkaf; -> ך <-

&lamed; -> ל <-

&finallamed; -> ל <-

&finalmem; -> ם <-

&mem; -> מ <-

&finalnun; -> ן <-

&nun; -> נ <-

&samekh; -> ס <-

&finalsamekh; -> ס <-

&ayin; -> ע <-

&finalayin; -> ע <-

&pe; -> פ <-

&finalpe; -> ף <-

&tsadi; -> צ <-

&finaltsadi; -> ץ <-

&qof; -> ק <-

&finalqof; -> ק <-

&resh; -> ר <-

&finalresh; -> ר <-

&shin; -> ש <-

&finalshin; -> ש <-

&tav; -> ת <-

&finaltav; -> ת <-

Punctuation, Currency and Syntactic Characters

  -> <-

– -> – <-

— -> — <-

&dash; -> – <-

&section; -> § <-

&paraMark; -> ¶ <-

£ -> £ <-

’ -> ’ <-

‘ -> ‘ <-

” -> ” <-

“ -> “ <-

… -> … <-

&slash; -> / <-

&threeDotStop; -> ⸫ <-

&punctusInterrogativus; -> ? <-

Units of Measure

&min; -> ′ <-

&foot; -> ′ <-

&sec; -> ″ <-

&inch; -> ″ <-

&drachms; -> ʒ <-

&scruples; -> ℈ <-

&minims; -> ♏ <-

&unciae; -> ℥ <-

&lb; -> ℔ <-

Brevigraphs

&crossedp; -> ꝑ <-

&crossedP; -> Ꝑ <-

&sup9; -> ꝰ <-

&tail; -> ꝫ <-

&qtail; ->  <-

&que; -> <choice><orig></orig><reg>que</reg></choice> <-

&que2; -> <choice><orig>q;</orig><reg>que</reg></choice> <-

&hookedq; ->  <-

&hookedt; -> tꝫ <-

&uiaSymbol; -> ꝛ <-

&quia; ->  <-

&ssquiggle; -> s<hi rend="superscript">s</hi> <-

&crossedq; -> ꝗ <-

&crossedb; -> ƀ <-

&loopedr; -> <hi rend="superscript">r</hi> <-

&crosseds; -> ẜ <-

&crossedr; -> ꝝ <-

&crossedv; -> ꝟ <-

&flourish; -> ’ <-

&loop; -> ꝭ <-

&pluralLoop; -> <choice><orig>ꝭ</orig><reg>es</reg></choice> <-

&semis; -> <choice><orig>ß</orig><reg>semis</reg><reg type="gloss">half</reg></choice> <-

&uscon; -> ꝯ <-

&etTironian; -> ⁊ <-

&eCaudata; -> <choice><orig>ę</orig><reg>æ</reg></choice> <-

Misc Glyphs

&pointingHand; -> ☞ <-

&dagger; -> † <-

&doublebar; -> ‖ <-

&cross; -> ✝ <-

&aqua; ->  <-

&asterisk; -> * <-

&hash; -> # <-

&HSymbol; -> H <-

Alchemical/Astrological/Medical Characters

&Rx; -> ℞ <-

&SulphurSymbol; ->  <-

&malteseCross; -> ✠ <-

&AntimonySymbol; -> ♁ <-

&AquariusSymbol; -> ♒ <-

&AriesSymbol; -> ♈ <-

&LeoSymbol; -> ♌ <-

&descendantLeoSymbol; ->  <-

&PiscesSymbol; -> ♓ <-

&ScorpioSymbol; -> ♏ <-

&TaurusSymbol; -> ♉ <-

&VirgoSymbol; -> ♍ <-

&GeminiSymbol; -> ♊ <-

&CancerSymbol; -> ♋ <-

&LibraSymbol; -> ♎ <-

&SagittariusSymbol; -> ♐ <-

&CapricornSymbol; -> ♑ <-

&EarthSymbol; ->  <-

&MoonSymbol; -> ☾ <-

&SilverSymbol; -> ☾ <-

&VenusSymbol; -> ♀ <-

&CopperSymbol; -> ♀ <-

&MercurySymbol; -> ☿ <-

&SunSymbol; -> ☉ <-

&GoldSymbol; -> ☉ <-

&StarSymbol; ->  <-

&MarsSymbol; -> ♂ <-

&IronSymbol; -> ♂ <-

&JupiterSymbol; -> ♃ <-

&TinSymbol; -> ♃ <-

&SaturnSymbol; -> ♄ <-

&LeadSymbol; -> ♄ <-

&WaterSymbol; ->  <-

&trineSymbol; -> △ <-

&conjunctionSymbol; -> ☌ <-

&oppositionSymbol; -> ☍ <-

&sextileSymbol; -> ⚹ <-

&squareSymbol; -> □ <-

&LotOfFortune; -> ⊗ <-

&CaputMortuumSymbol; ->  <-

&astroCaputDraconisSymbol; -> ☊ <-

&astroCaudaDraconisSymbol; -> ☋ <-

Mathematical Characters

+ -> + <-

× -> × <-

− -> − <-

= -> = <-

&invisibleTimes; -> <-
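The tables above pair each entity name with the glyph it stands for. As a rough illustration of how such a mapping could be applied to a string of transcribed text, the sketch below expands a handful of entities taken from the tables and leaves any name it does not know untouched. The dictionary is a small hand-picked subset, and `expand_entities` is a hypothetical helper, not the project's own processing.

```python
import re

# A few entity/glyph pairs copied from the tables above (subset only).
GLYPHS = {
    "Rx": "℞",
    "SunSymbol": "☉",
    "etTironian": "⁊",
    "crossedp": "ꝑ",
}
ENTITY = re.compile(r"&([A-Za-z0-9]+);")

def expand_entities(text):
    """Replace known &name; references with glyphs; keep unknown ones as-is."""
    return ENTITY.sub(lambda m: GLYPHS.get(m.group(1), m.group(0)), text)

print(expand_entities("&Rx; &SunSymbol; &unknown;"))
```

Leaving unrecognised references in place makes them easy to spot when proofing.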

Cite as: Transcription Guidelines, https://www.cambridge-platonism.divinity.cam.ac.uk/view/texts/normalised/our-methodology/transcription-guidelines, accessed 2025-06-30.
