to represent "on" at the end of a word; it's not a character of any alphabet, it's a specialized printer's space-saving symbol, and there is no Unicode code point for it. It is close enough to ♁ (U+2641) that I've decided to use that in its stead, with a note explaining the substitution. I'm also using the combining character U+0361 to join ct to produce c͡t; ē (U+0113) for “em” and “en” and ō (U+014D) for “on” appear to be exact matches.
After I've finished the first version, the character-by-character transposition into a modern typeface, and verified that it's accurate, I save that as a master copy and create a copy from it to use for the next step, which is producing a document formatted to match the original document. This has its own tricky bits. The fancy woodblock/engraving/illuminated initial letters are beyond my ability to reproduce except by creating an image from the scan of the original and inserting it into my document. This holds true for other illustrations/artwork. LibreOffice is not the best program for doing this formatting, but it's what I have to work with; I have time I can devote to this activity, but I can't invest very much money.
Again, once I've finished this document, I save a master copy of it and move on to the next step, which is producing a document with regularized period spellings. For this I return to a copy of the first master document, from before the formatting and the introduction of images. The trick here is to determine what the standard period spellings are. Where possible I consult contemporaneous dictionaries, to see what was the opinion of the time; the more contemporary dictionaries I can consult, the more confident I am in the spelling I settle on. I temper this by checking whether there are any authoritative modern works covering the contemporary spellings; I know there are modern Anglo-Saxon dictionaries, and I suspect that there are modern Elizabethan English dictionaries.
I'm not going to go against what modern scholarship has determined unless I think they are all way off base, and that's not very likely. The intent in producing this regularized spelling document is to present what they would have produced if they had computers with spell checkers in the language of their time. Alternatively, software is available which can determine the frequency of words within a document; running the original transcription through such software would enable me to determine which spellings the author of the document most favored, and change the other spellings to match. This may not jibe with what modern scholarship has determined to be the societal consensus, but it would produce a normalized spelling closer to the intent of the author. As part of the normalization process the printer's special symbols are transformed back to the text they represent.
As a bonus, I'm producing glossaries of the words, individuals, places and events mentioned in the documents; what was common knowledge amongst the intended audience may be unknown to the modern reader. If I had to look it up, it goes in the glossary; if I think I knew about it only due to specialized knowledge, it goes in the glossary. These glossaries are appended to the end of the document. Depending upon the margins, and whether the original text already did this, I might insert text boxes in the margins adjacent to the first appearance of archaic words or word meanings to present their current meanings, as an alternative to replacing them with a modern equivalent. If the original text contains notes presented this way, I'll need to find a way of clearly differentiating my notations from the author's, to prevent confusion as to who is providing the information; using a radically different font springs to mind, and clearly there would need to be a note concerning this.
The idea of glossing word meanings adjacent to the first occurrence of the word could be used in the modern spelling document, as a means of avoiding changing the text of the document via the replacement of archaic words with their modern equivalents. I'm not the only one doing this. Not by far!
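The frequency-counting approach to regularizing spellings described earlier can be sketched in a few lines of Python. This is only a minimal illustration, not the software alluded to above: the function names, the sample sentence, and the hand-compiled variant group are all hypothetical, and deciding which spellings belong to the same word remains a job for the transcriber.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased word forms, keeping internal apostrophes (fashion'd)."""
    return Counter(re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower()))


def preferred_spellings(counts, variant_groups):
    """Map every variant in a group to the form the author used most often."""
    mapping = {}
    for group in variant_groups:
        # Counter returns 0 for forms that never occur in the text.
        best = max(group, key=lambda form: counts[form])
        for form in group:
            mapping[form] = best
    return mapping


# Hypothetical variant group, compiled by hand for this example.
groups = [("every", "everie", "everye")]
sample = "Everie man must defend himselfe, and every man must strike; every blow tells."

counts = word_counts(sample)
mapping = preferred_spellings(counts, groups)
print(mapping["everie"])  # every
```

Ties go to whichever form is listed first in its group, so the listing order is itself an editorial decision.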
There are currently a number of transcription projects ongoing in academia.
The Text Creation Partnership has transcribed a ton of documents from ProQuest's Early English Books Online, Gale Cengage's Eighteenth Century Collections Online, and Readex's Evans Early American Imprints, all of which are restricted access services. The ECCO-TCP (Eighteenth Century Collections Online) texts are available to anyone. EEBO-TCP (Early English Books Online) has two parts: the first contains approximately 25,000 books, available to anyone, while the second part, consisting of 35,000 books, is only available to TCP partner organizations. Evans-TCP (Evans Early American Imprint Collection) is available to anyone. While TCP's main page doesn't go into detail, they do say these are normalized texts, and a quick scan of the word index for EEBO-TCP and browsing the titles for ECCO-TCP and Evans-TCP seem to confirm this; the frequency of variant spellings is nowhere near as great in EEBO-TCP as the two Elizabethan fencing manuals that I have examined in depth would suggest. The counts everie 3811, everye 118, every 419924 just scream that the spelling has been normalized, and publique 19417 against public 3171 confirms normalizing to period practice. Given the large number of individuals doing the transcription and creating metadata over a long period of time, the metadata is not standardized; they weren't working from a standardized thesaurus of terms with clear definitions, so you have to try a variety of terms when searching the metadata to ensure you find all the texts related to your subject. It is clear they didn't get the library cataloger community involved. I'm not really in a position to throw stones, as I haven't been referring to either Sears's or LC's subject heading works; I have a copy of Sears, but I don't own a copy of LC.
Visualizing English Print is a project that is taking the TCP and similar files and making them more amenable to textual analysis using specialized software. Certain sacrifices had to be made to enable this, which makes their output of no use to those researching period printing practices. All text is stripped to bare ASCII; no umlauts, apostrophes, italics, etc. No attempt is made to preserve document formatting, other than maintaining the same line breaks as their source files. As part of removing punctuation, words were standardized; to wit, fashiond and fashion'd were both changed to fashioned. So some, not all, spelling variants have been removed from their SimpleText output. It will be interesting to see what people do with the result of their efforts.
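To make the kind of simplification described above concrete, here is a minimal Python sketch. It is not Visualizing English Print's actual pipeline; the lookup table, function names, and sample text are assumptions made for illustration only.

```python
import re
import unicodedata

# Hypothetical lookup of variant -> standardized form (not VEP's real dictionary).
STANDARD_FORMS = {"fashiond": "fashioned", "fashion'd": "fashioned"}


def to_ascii(text):
    """Split accented letters into base + mark, then drop everything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def simplify_line(line):
    """Standardize known variants and strip punctuation from a single line."""
    words = []
    for token in to_ascii(line).split():
        # Trim punctuation at the edges; keep inner apostrophes for the lookup.
        core = token.strip(".,;:!?\"'()[]")
        # Known variants are replaced by their (lower-case) standard form.
        core = STANDARD_FORMS.get(core.lower(), core)
        # Remove any punctuation still left inside the word.
        core = re.sub(r"[^A-Za-z0-9]", "", core)
        if core:
            words.append(core)
    return " ".join(words)


def simplify(text):
    """Process line by line so the source line breaks are preserved."""
    return "\n".join(simplify_line(line) for line in text.split("\n"))


print(simplify("A fashion'd coat,\nnaïve and fashiond."))
# A fashioned coat
# naive and fashioned
```

Even this toy version shows why such output is of no use for studying printing practice: the diaeresis, the apostrophe, and both original spellings are gone, and only the line break survives.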
Smithsonian Digital Volunteers is a project of the Smithsonian Institution to coordinate the digital transcription of a whole slew of documents either in their possession or in the possession of institutions that have joined with them in this project. As they are constantly creating new images of text items in their collections, this is a very long-term project. It started in June 2013, and according to their page, currently has 9085 volunteers.
Citizen Archivist is a similar project of the National Archives and Records Administration.
Manuscript Transcription Projects is a list of projects similar to Early Modern Manuscripts Online (EMMO); EMMO is a Folger Library project, and the Manuscript Transcription Projects link page is maintained by the Folger Library.
FromThePage appears to be a transcription crowdsourcing service provider, where individuals and institutions pay them monthly fees to host their projects, and volunteer transcriptionists log in to do the actual transcription. Their fees for hosting projects seem reasonable, and this allows individuals/institutions to have crowdsourced transcription projects without having to set up all the software/hardware interfaces themselves. Clearly, since they charge for this, once a given transcription project is completed the project owner may choose to remove the project from their site and store it elsewhere, which may or may not include making it accessible through the web.
Papers of the War Department, 1784-1800 is a crowdsourced transcription project of the Roy Rosenzweig Center for History and New Media (RRCHNM), which in turn is a project of the Department of History and Art History at George Mason University. There are a number of projects that the RRCHNM has been involved with, which they provide links to. They have also developed some useful Open Source software for use in this type of activity.
There are many more such projects out there; these are merely those from the first page of a Google search on document transcription projects.
If getting involved in this activity intrigues you, determine what your preferred subject matter is. If you want to work with established collections, start looking for relevant projects; otherwise do as I'm doing, which is tracking down PDFs or other format scans of relevant documents, transcribing them, and placing them on my Academia web page, the Internet Archive, and the files sections of pertinent Facebook groups that I belong to. Of course, given the source material being out of copyright (which it had better be if you don't have the permission of the copyright holder), you could always attempt to make some extra money by selling your completed transcription project via the various marketplaces. I'm making the results of my labours freely available, because so much of what I'm able to do these days is a result of others making materials freely available; turnabout is fair play.
Since there were a number of professional transcription sites included in the results of my Google search, there is the option of branching out as a Transcriptionist For Hire once you have developed your skills via volunteering with a crowdsourced transcription project. That works just fine by me; it's in the spirit of the Works Progress Administration projects during the Great Depression, where the US Government put people to work on various projects to give them income and teach them practical skills which they could then put to use in the private sector. Mini rant: They should never have shut down the Works Progress Administration; it was successful in all of its goals. The American Association of Electronic Reporters and Transcribers can provide you with information on learning how to do this and getting certified.
I could go on (and on and on) but I think this is enough for now on this topic.
Post this Puppy!
