2017-09-17

The Son of More Thoughts on Transcription

In my first post I discussed a philosophy of transcription. In my second, I covered the creation of a master font document to assist in character recognition, and the idea of initially transcribing into a font that matches the source document, so that you can compare your initial transcription with the source and see whether the individual characters match. In this post I'm going to talk about getting access to your source document.

There is one assumption I'm making, and that is that you are using a desktop computer for this purpose. I can envision using a laptop, but anything without a physical keyboard distinct from the display is right out.

Your source document will come in one of three basic forms: 1) digitized images of the original; 2) a hard copy of the original, which may be a physical book, a photocopy of the document, or, if you are fortunate enough to be working with the owner of the original document, the original document itself (in that last case, odds are very good that you will be working where they store it, and unless they provide you with a workstation, you will be using a laptop); 3) sound recordings. Sound recordings that aren't already sound files are a whole nother kettle of fish, because you will need equipment that can play back the media they were recorded on. Even if they are sound files, they may be stored on outdated media, such as floppy disks, and in an outdated file format, in which case you would need access to a computer of the appropriate vintage, with the appropriate audio software. As time passes, this is going to become harder and harder to do; I no longer own a working computer with floppy drives of any kind, and it's been quite some time since I had access to anything capable of running a pre-Windows 95 program. Anyway, if you are dealing with sound recordings that aren't digitized audio files, you will need the appropriate equipment to play them. I'm not going to go into what all this might entail, at least not in this post; just take my word for it that finding the equipment to play back non-digitized audio recordings may be quite the adventure, if it isn't provided along with access to the recordings themselves.
However, you would be surprised what equipment is still available if you hunt around a bit. The online marketplace has made obtaining obsolescent equipment much easier: individuals who couldn't quite bear to throw their old equipment away now have a way to find it a new home, and businesses that acquire obsolescent equipment from people wanting to get rid of it (heck, sometimes they even got paid to take it away!) and resell it to those who need it to read obsolescent media now find it much easier to reach their prospective customers. There are also businesses that convert audio between storage media; for a price, you send them your outdated media, and they send you back the contents on current media. This holds true for all data types, not just audio. If you are willing to let them keep a copy of the converted data and distribute it as they wish (including selling copies), they might be willing to arrange a lower price, but it would need to be something marketable that isn't under someone else's copyright.

Digitized images of the original: In short, a computer data file. Hopefully, it will have been created recently enough that it is in a current file format, on current storage media. If it is in a non-current file format, you will need either conversion software to convert it to a modern file type, or software capable of displaying that file type. If it's not on current storage media, we're back to the problem outlined for audio recordings: obtaining the equipment needed to read the storage media and the file type. For the purposes of this post, I'm going to assume that your source image is in a current file format, stored on modern equipment, such that you can view it on your main computer's monitor. In some cases you may be allowed to download the images to your own storage media; in other cases the source site may not allow downloading (and may have installed scripts that stop a right-click from bringing up a context menu), and you will need to keep a browser window open to their site. Of course, their not allowing you to download a copy of the image should raise the question of whether you have their permission to create a transcript of the document. If it is a unique document, you really need to contact them to seek their permission to create a transcript from it; while the original document may be out of copyright, odds are very good that the image they won't let you download is in copyright, and modifying the image, which includes transcribing its contents, requires their permission. In writing. One can argue fair use for transcribing a small portion of the information contained in the image, enough for a quote in another document, but a complete transcription is right out without their permission. If they allow you to download the image but require permission to use it in a publication, you will still need to contact them about distributing your transcription in any form.
If it is not a unique document, things get a little bit iffy. But only a little bit. Sure, the original is not unique, but do you have physical access to any of the other copies? Has anyone else made images of one of those copies available without constraints on their use? If the answer to both questions is No, then you still need to get permission. If the answer to either is Yes, then that is the route to take to the document if you don't want to contact the image producer about making a transcript from their image.

[Note: A bit tardy, but I've just emailed the Lord Collection to request permission to make transcriptions from their .pdfs. As with my article on Link Rot, I must practice what I preach.][2017 09 18: Got an email back, it's cool with them. Yay!]

There are online repositories of digitized documents that make their holdings available without constraint, other than not selling what you obtain from them. Derivative works are your call, but there need to be substantive changes made, such as transcribing the text into a modern typeface, annotating it, or translating it into another language: things that take considerable time and effort, such that you have a real claim on the resulting document. Examples include Google Books, the Internet Archive, any agency of the United States Government, in general any State Government agency, and Project Gutenberg, to name a few.

Accessing the original document in hard copy.

If it is a published work, now out of copyright, and you own a copy of it in hard copy, you are set, good to go. I would recommend investing in a good document holder, appropriate to the hard copy format, to hold the document open and well displayed while you work from it.

If you do not own a copy of the work, you may be able to borrow one through your local library; even if they don't have a copy themselves, they can try to borrow it from another library that does, through InterLibrary Loan (ILL). There is a caveat: the less common the item, the less likely that anyone who still has it will lend it out. I worked in the Bibliographic and Interlibrary Loan Center of the Chicago Public Library for three years; I know whereof I speak.

If you don't own a copy, and can't borrow a copy, you will have to go to where a copy is kept. First, you have to find out where a copy is held. For published works, OCLC WorldCat is the best place to start for holdings within the USA, as it is drawn from the cataloging database OCLC maintains, and OCLC is the major, although not the only, cataloging database service provider in North America. Outside of North America their coverage is not very good. OCLC has been in operation since 1967, and by now most libraries in North America have substantially completed their retrospective conversion projects; retrospective conversion is a fancy term for taking the information from your physical card catalog and converting it into records in an electronic database, typically available via the library's online catalog. Pretty much the only things that haven't been converted are items unique to a given collection, where the library hasn't been able to afford the time of an original item cataloger to create the bibliographic record. Original cataloging is a lot harder than copy cataloging. Copy catalogers have to be very careful, but what they are doing is searching the existing cataloging records for one that matches the physical description of the item in their collection; if they find one, they attach their holdings code to the record, download the record for use in their online catalog, and proceed to the next item. If they can't find a matching record, they note that fact in a local record of some kind, and move on. That list of items without a matching bibliographic record will later be worked through by an original item cataloger, when the library can afford to hire one; note that point, when they can afford to hire one. Pretty much all libraries of any size have a copy cataloger on staff to handle their ongoing acquisitions.
It may not be a dedicated copy cataloger, but someone who does it as part of their duties; my sister, when she was the Children's Librarian in Klamath Falls, Oregon, did the copy cataloging for the Children's Library as part of hers. But original cataloging is much more time-consuming, and requires a very analytical, detail-oriented mindset: the cataloger has to create a bibliographic record that describes the item in hand so accurately that it is clear what they have, and how their edition of the document differs from all other editions of it. Having worked in ILL for three years in one of the largest public library systems in North America, I understand just how important that is much better than I did before. Different editions are just that, different. The formatting of the information can differ between editions, and so can the information itself; like, duh, why else would they call it a different edition? Even different printings of the same edition can vary in appearance. There are all sorts of reasons why a researcher will need access to not just a specific work, but a specific printing of a specific edition. If you are looking at travelling thousands of miles to do your research, you want to be certain before you pack your bags that the copy held by the repository you are going to visit matches the item you are seeking to research. So good, detailed, anal-retentive original cataloging is not a luxury, it is mandatory, and people capable of that quality of work cost money. Collections above a certain size, with funding adequate to their needs, can afford original catalogers. Smaller collections, and specialized collections, may not be able to afford an original item cataloger on staff permanently.
What they do is 1) hope their item isn't as unique as they fear, and that another institution will input a cataloging record matching the item in their collection, and 2) seek outside funding, beyond their normal budget, to hire a project cataloger, someone who will focus all their efforts on cataloging the items unique to their collection for the duration of the funding. They don't always call these individuals catalogers; sometimes they are called archivists. Archivists focus on unpublished items such as personal and corporate papers and records, but the basic concept is the same: creating entry points to the holdings of the library or archive, so that researchers can learn what is unique to that collection, and so that people will use the materials and justify the expense of preserving them. Researchers are also a revenue source. While publicly funded repositories are usually free to access in person, privately funded collections frequently charge for admission to their collections as a means of supplementing their usually inadequate funding; they are also more likely to charge publication fees for the use of information unique to their collection, generally on a sliding scale based on the expected readership of the publication.

And with that last, I've advanced to unique items: items that are unique to a given collection, because few if any copies were made. While WorldCat's coverage in this area is improving, that's damning with faint praise. This is where you need some reason to think that a given collection has resources relating to your research before you can search their holdings information. Thankfully, as these collections obtain funding for inventorying their unique holdings, more and more information about them is becoming available via web searches. There are also a growing number of organizations such as Archives West, which acts as a portal to the specialized collections of a great many institutions in the Greater Pacific Northwest, letting you use its search front end to search a number of specialized collections at once. Caveat: due to the variety of materials in these collections, they don't all use the same terminology in their collection descriptions, so you need to try a number of searches using related terms to maximize your chances of finding materials related to your research.

Hm. Shifted from transcription to research. Well, looking for a collection that holds a copy of the fairly unique item you want to transcribe is research. And I have to admit, that's how I've tracked down the items I've been transcribing: searching the web for items related to my area of interest. I didn't start out looking for Vincentio Saviolo his Practise in Two Bookes; I was looking for historical fencing manuals, and stumbled across the Raymond J. Lord Collection by purest chance. It was only afterwards that I found the various HEMA link repositories that point to it. I mean, the University of Massachusetts does not immediately spring to mind as an institution that would have a collection of historical European fighting manuals. Once you find out about their academic programs, it's not so surprising.

Well, I did, and didn't, cover what I intended to in this post. It certainly isn't what I'd been thinking about earlier today, which was the physical layout of your transcription area. But it did cover something important; before you can transcribe, you need to have something to transcribe.

It's past time for lunch.

Post this Puppy!

Edit 2017 09 18: Permission received from the Lord Collection to make the transcriptions from their .pdfs.

More thoughts on transcription of documents in odd fonts

What I've done so far is transcribe directly to a modern font. While trying to make out the letters in a German Blackletter volume, I just realized that what you should really do is this: go through your font library to find the font that most closely matches the font you are transcribing from, and use that for your first transcription. If the fonts are a good match, you will be able to tell by comparing your transcription to the original document whether you have correctly identified each letter, because if you haven't, they won't look the same.

For this, you need an easy way to look at all of your fonts. That doesn't come with Windows, but there is a software solution. High Logic produces a couple of font-related programs; the one you want is called MainType. MainType has only one download, so that's the one you want. There are three license levels available for MainType: 1) Free, which limits the number of fonts it will manage to 2500; 2) Standard, which raises the limit to 10000; and 3) Professional, which has no limit on the total number of fonts, but will only display 50000 fonts at a time; if you have more than that, organize them into families and assign tags, and you can then pull up just the ones you want to look at. The Standard and Professional licenses provide some other nice features, but nothing you need at this time, so when you start the program, select the Free version; it will ask you every time you start the program, but hey, they are trying to sell this software to make a living. They really aren't asking much for the Standard and Professional versions. If you have that many fonts, you are doing this professionally.

MainType will merrily go through and index all of the fonts on your computer. If any have been corrupted, it will let you know and offer to fix the situation; to do that, you would need the Professional license. Not needed. It will list the fonts that have gone bad. Bring up your favorite file search utility (I use Everything, available from voidtools; it's free, and does a very good job of locating files on your computer), and enter the file name of each affected font; not the name of the font, but the name of the file, which appears at the far right of the information on bad fonts. Once you have located the font file, delete it. Do this for all the corrupt font files. You might think you can avoid searching for them this way, since they generally reside in the Windows Fonts directory, but if you use File Explorer to go to that directory, it brings up the Windows Font Manager, which will only display active fonts, and the files you are looking for are not active because they have been corrupted. The Windows Font Manager will not show you any files in that directory except active fonts, and you can't bypass it when accessing that directory with File Explorer. So you have to use an alternative file search utility, and delete the files from its listing. Once that is done, MainType will not bother you about them again. After MainType finishes indexing all your fonts, it will list them in alphabetical order in a scrollable list, with each font name written in its own font. Select a font by clicking on it. On the right of the MainType main window is a pane that shows all the characters supported by that font, arranged in Unicode order within Unicode groups. You can scroll down this display and see which characters the font supports, and what they look like.
Using this display, you can go through the fonts installed on your computer and see which is the closest match to the font used in the document you are considering transcribing. If none of them seem close enough, it's time to go on a font hunt online. Now that I know about it, the first place I'd start is Typewolf's site. Typewolf is into fonts, big time; he does it for a living. His site has reviews of an incredible number of fonts, and many recommendations for free fonts if you cannot afford, or don't need, the commercial ones. He also has a lot to say about the various font sites: which are worth your time, and which aren't. I'll assume that, working with his advice, you succeed in tracking down a font that acceptably matches the one used in your document.
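If you'd rather not install a separate search utility just to track down the corrupt font files described above, a short script can do the same lookup. This is a minimal sketch in Python, assuming you have Python installed; it reads the directory straight from the file system, which I'd expect to list every file in the Fonts folder rather than only the active fonts File Explorer shows, though I haven't checked that on every version of Windows. The function name `find_font_files` is my own; the file names you pass in are the ones MainType lists at the far right of its bad-font report.

```python
from pathlib import Path
from typing import List


def find_font_files(font_dir: str, file_names: List[str]) -> List[Path]:
    """Return the files in font_dir whose names match any of file_names.

    Matching ignores case, because Windows file names are case-insensitive.
    The directory is listed through the file system itself, not through
    File Explorer's font view.
    """
    wanted = {name.lower() for name in file_names}
    return sorted(
        entry
        for entry in Path(font_dir).iterdir()
        if entry.is_file() and entry.name.lower() in wanted
    )
```

For example, `find_font_files(r"C:\Windows\Fonts", ["BrokenFont.ttf"])` (a made-up file name) returns the matching paths, which you can then check and delete by hand. The sketch doesn't delete anything itself, so a typo in a file name can't cost you a good font.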

Now do your transcription using that font. I know, the end goal is to have the text in something easier to read. That's the end goal; right now your goal is to be certain you have chosen the correct character to match each one in the original document. When you are all done with the transcription, and have gone through the verification process to ensure that you have, indeed, chosen the correct character in each case, then and only then... but wait, first save your document, and open a copy of it; you don't want to lose your hard work (this should become instinctive after a while). Once you have opened the copy, select all the text and apply the font you want the document to be in; well, first verify, using MainType, that it supports all the characters your document needs. There. Done. You have your transcribed document in an easy-to-read modern font. Save the document; this is the basis for all of your future text manipulations.
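MainType shows you which characters a font supports, but for a long transcription it can be easier to let the computer make the comparison against the target font. Below is a minimal sketch in Python, assuming you have exported the transcription as a plain UTF-8 text file and installed the third-party fontTools library; `font_code_points` and `missing_characters` are illustrative names, not part of any existing tool. It lists every character in the text that the font has no mapping for, including combining marks.

```python
from typing import List, Set


def font_code_points(font_path: str) -> Set[int]:
    """Return the Unicode code points a single .ttf/.otf font maps to glyphs.

    Assumes the third-party fontTools library is installed (pip install fonttools).
    """
    # Imported here so missing_characters() still works without fontTools.
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    cmap = font.getBestCmap()  # {code point: glyph name}, or None if no Unicode cmap
    font.close()
    return set(cmap) if cmap else set()


def missing_characters(text: str, supported: Set[int]) -> List[str]:
    """Return, in code point order, each distinct non-whitespace character
    of text whose code point is not in supported."""
    return sorted({ch for ch in text if not ch.isspace() and ord(ch) not in supported})
```

Something like `missing_characters(Path("transcription.txt").read_text(encoding="utf-8"), font_code_points(r"C:\Windows\Fonts\georgia.ttf"))` (both paths hypothetical) returning an empty list would mean the font covers every character in the document. This sketch only handles single-font files, not font collections.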

Now you can proceed in the process described in my previous post.

Post this Puppy!

Edit: 2017 10 12: Removed lengthy description of how to create a master sheet of font characters, replacing it with how to get MainType, and why. Added info on Typewolf.

2017-09-15

Link Rot

Link Rot: The condition of HTML links going bad due to changes in the destination site's url tree, or the disappearance of the site altogether.

Link Rot happens. Link Rot Deniers lie through their teeth. OK, I don't really think there are Link Rot Deniers, but there are definitely those who don't check their posted links for Link Rot as often as they should.

I spent six hours yesterday preparing an errata sheet for a web site I stumbled across; don't ask why I did this, I'm not totally sure why myself. And that covered only a quarter of the categories on one page of their site. Most of their link pages hadn't been updated since 2011. I had 24 corrections for them; in a couple of cases I couldn't find a current site to replace the dead link, but for most of them I was able to provide the current url.

In the process of tracking down current urls, I found outdated links on two other sites pointing to the url I was trying to update, so when I found the current url, I informed them as well. I'm not going to name names here, but one of the sites is run by a chap who sends out notices about its existence to the major mailing lists of that interest group every month. He had links to GeoCities in his list. GeoCities shut down all operations outside of Japan in 2009, for crying out loud!

As for the url that led me to his site, the one I was looking for a replacement for: the other site knew it was bad, so they had provided a link to an Internet Archive backup of it. That was a good temporary fix, except that, as they noted, it was a music lyric/sound file site, and the .midi files hadn't been grabbed by the Wayback Machine. However, their description of the site's purpose, which the first site (the one that started all of this) hadn't provided, gave me enough to find the current site of the organization that all three sites had bad urls for. So there are three sites which will, hopefully, shortly have active links to that organization again, and one site that will, hopefully, have 24 links corrected shortly.
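When a link has died, the Wayback Machine may still have a copy, as it did for the lyric site. The Internet Archive offers a public availability API that returns the closest archived snapshot of a given address. The Python sketch below queries it; the endpoint and the response layout are as I understand them from the Internet Archive's documentation, so treat both as assumptions to verify, and the function names are my own. As the lyric site shows, a snapshot of a page doesn't guarantee that files linked from it, such as .midi files, were archived too.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the query asking for the closest archived snapshot of url."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response_json: str) -> Optional[str]:
    """Pull the archived copy's address out of an availability response, if any."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the live service (needs network access)."""
    with urlopen(availability_query(url), timeout=timeout) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling `lookup` with a dead url returns the address of the archived copy, or `None` if nothing was captured.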

Now, I will admit that I got a bit snarky in one of my emails, pointing out that GeoCities had shut down operations in 2009, which was pretty common knowledge, so there was no excuse for still having a link to a GeoCities site on his link list.

This morning, getting up somewhat later than usual (I finished all of that activity at nearly 2:00 AM), I decided that if I was going to get snarky about other people's Link Rot, maybe I should look at the links in my blog posts. So I did. I got sidetracked a couple of times, but all of my blog posts are now up to date with regard to their linked urls. I updated product availability and price information as well, as annotations, leaving the original information intact except for turning off invalid product links. With only a couple of exceptions, I changed from linking directly to a sub page to linking just to the home page, and then providing the information needed to use the home page's search engine to find the proper sub page. Those exceptions were for sites where, as far as I could tell, they hadn't changed their directory tree schema; they might have added and removed pages, but they hadn't changed the url of an existing, retained page. It's sad just how few such stable web sites I found. Some url changes are perfectly understandable, such as when your domain owner goes out of business; others reflected the realization that they hadn't put proper effort into their initial site structure. Still others reflected organizational changes that required site structure changes to be able to function in a reasonable manner. Anyway, when I found a stable web site, one that hadn't changed its url structure since I posted links to it in 2008, I sent them a message letting them know how much this was appreciated, and commending their initial web site design for being so successful.
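Checking links by hand is thorough but slow. A script can do a first pass over a saved copy of a post and flag the links worth a closer look. This is a minimal sketch in Python using only the standard library; it assumes the post has been saved as an HTML file, and the function names are my own. Because `urlopen` follows redirects automatically, a link whose final address differs from the original is reported as moved, which is often the first sign of a reorganized url tree.

```python
from html.parser import HTMLParser
from typing import List
from urllib.error import HTTPError
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points to an http(s) address."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return the external links in a saved post, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def check_link(url: str, timeout: float = 10.0) -> str:
    """Report whether url still answers (needs network access)."""
    for method in ("HEAD", "GET"):
        request = Request(url, method=method, headers={"User-Agent": "link-rot-check"})
        try:
            with urlopen(request, timeout=timeout) as response:
                final = response.geturl()
                moved = f" (now at {final})" if final != url else ""
                return f"OK {response.status}{moved}"
        except HTTPError as err:
            if err.code == 405 and method == "HEAD":
                continue  # some servers refuse HEAD requests; retry with GET
            return f"BROKEN {err.code}"
        except OSError as err:  # DNS failures, refused connections, timeouts
            return f"UNREACHABLE {getattr(err, 'reason', err)}"
    return "UNKNOWN"
```

An OK is not proof the link is good: some sites answer a dead url with their home page and a 200 status, so the broken and moved links need a human look, and the rest are still worth spot-checking.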

I'm not staying up as late as yesterday.

Post this Puppy!

2017-09-07

Thoughts on transcribing historical documents.

When transcribing historical documents, there are a number of potential end goals. 1) A strict transcription: the goal is to maintain all the vagaries of the original document, just making it more readable by using modern typefaces. Next to accessing the original document, this is the most accurate presentation; it's also the hardest to produce, as you have to look at each character of the original document carefully, fight the urge to say, "Oh, it's that word," and make sure that what you enter is what was actually there. You can't trust the results of your first pass through the document; you have to let it sit a while and then make a second, and maybe even a third, pass. Then there's formatting the results. You can produce just a text document, or you can try to make it look as much like the original in formatting as possible; this is harder to do, but better for those who are able to access the original, or images of it, as it makes it easier to place the transcription side by side with the original and look back and forth between them. 2) A transcription with regularized spellings for the language at that time: if you are not concerned with the spelling variations in the original, but do want to read it in the original language, this is best suited to your purpose. Again, you can produce a straight text document, or you can attempt to make the transcription match the original in layout. 3) Transcribing/translating to the modern version of the original language. You have to be very careful here to ensure that you capture the meaning in context of each word. Word meanings change over time, and the word used in the original may no longer have that meaning in the modern language, so where meanings have changed you have to replace it with the modern word that most closely matches the original intent, or else provide a gloss of the word's meaning at the time the document was written.
The first two methods presume a researcher who is familiar with the original language, and with word meanings at the time the original document was created. The third is for those interested in the intellectual content of the original without having to understand the changes in the language. The first two involve no interpretation, no need to really grasp the intended message of the author; it's just typesetting. Well, somewhat more than typesetting if you are working with handwritten documents: you have to be able to read the original script, and sometimes that's very difficult, and it isn't made any easier if all you have to work with is a scan of the original. With the third, you have to understand what the author was trying to say, so you can translate it into modern language, reflecting the changes in word meanings. This is much more intellectually stimulating; at this point you are becoming an editor, as you try to change the text as little as possible while creating a modern language version. Punctuation changes. Changes in word meanings require substituting the closest modern word that conveys the original meaning in the context of the surrounding words; you're not aiming at a total recasting, and as much as possible you want to maintain the original phrasing. You need to be a scholar of the subject the author was writing about, so you can comprehend what he was trying to say and change the language while altering his phrasing and meaning as little as possible. You need to understand the subject both as it was understood when the author wrote the document, and as it is practiced now, so that the changes in vocabulary remain true to the original intent while becoming more accessible to the modern practitioner.
Not all that much recognition is given to the individual who performs the first two types of transcription; it's strictly character recognition in the first, plus spelling regularization in the second. With the third, interpretation is involved, and that interpretation will be debated. But the goal is still a document that reflects the original author's style and phrasing, following the conventions of the author's time, with as little change as possible while remaining true to the intent of the original words. While some words are changed, it should still read as a period piece, not a modern document; it should read as if only the language had changed, not the writing conventions, and remain faithful in style as well as meaning. This produces a document of use to those interested in period practices who are not interested in the language of the time, but are interested in how the information was presented when the original document was created. It retains the intellectual property of the author. Anything beyond the third is a modern interpretation, a retelling rather than a reformatting. You are creating a derivative work, a modern work based upon the intellectual content of a historical document but written using modern conventions. This is not what I do; I don't understand the subject matter well enough to recast it in modern form. What I am now trying to do is create distinct documents reflecting the goals outlined above. To start with, I'm working only from documents that use modern European alphabets; while I have access to fonts for Futhark, etc., that's not my primary area of interest, and I have to be interested in the subject matter, or there's no way I'd put up with the drudgery and monotony of this process.
In theory, I start by producing a character-by-character transliteration from the historical typeface to a modern typeface. I generally use Times New Roman, since it's the typeface we are most used to reading, although I'm considering switching to Georgia. The really tricky bit is attempting to retain the original specialized non-alphanumeric symbols. This really comes into play with Elizabethan printed materials, where printers will use a special symbol to represent "on" at the end of a word; it's not a character of any alphabet, it's a specialized printer's space-saving symbol, and there is no Unicode code point for it. It is close enough to ♁ (U+2641) that I've decided to use that in its stead, with a note explaining the substitution. I'm also using the combining double inverted breve (U+0361) to join ct to produce c͡t, while ē (U+0113) for “em” and “en” and ō (U+014D) for “on” appear to be exact matches. After I've finished the first version, the character-by-character transposition into a modern typeface, and verified that it's accurate, I save that as a master copy and make a copy of it to use for the next step, which is producing a document formatted to match the original. This has its own tricky bits. The fancy woodblock/engraved/illuminated initial letters are beyond my ability to reproduce except by creating an image from the scan of the original and inserting it into my document. This holds true for other illustrations/artwork as well. LibreOffice is not the best program for this formatting, but it's what I have to work with; I have time I can devote to this activity, but I can't invest very much money. Again, once I've finished this document, I save a master copy of it and move on to the next step, which is producing a document with regularized period spellings. For this I return to a copy of the first master document, before formatting and image insertion. The trick here is to determine what the standard period spellings are. Where possible I consult contemporary dictionaries, to see what the opinion of the time was; the more contemporary dictionaries I can consult, the more confident I am in the spelling I choose. I temper this by checking whether there are any authoritative modern works covering the contemporary spellings; I know there are modern Anglo-Saxon dictionaries, and I suspect there are modern Elizabethan English dictionaries.
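For anyone following the same substitutions, the Python sketch below spells them out as escape sequences, which avoids copy-and-paste accidents with look-alike characters, and asks Python's standard `unicodedata` module to confirm each code point's official name. The `expand_printer_symbols` function is my own illustration of the later normalization step, where the symbols are turned back into the text they stand for; since ē stands for either "em" or "en", it leaves those in place and reports their positions for a human decision rather than guessing.

```python
import unicodedata
from typing import List, Tuple

ON_SYMBOL = "\u2641"  # ♁, stand-in for the printer's "on" symbol
TIE = "\u0361"        # combining double inverted breve, used to make c͡t
E_MACRON = "\u0113"   # ē, for "em" or "en"
O_MACRON = "\u014d"   # ō, for "on"

# Confirm each code point is the character intended.
for ch in (ON_SYMBOL, TIE, E_MACRON, O_MACRON):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2641 EARTH
# U+0361 COMBINING DOUBLE INVERTED BREVE
# U+0113 LATIN SMALL LETTER E WITH MACRON
# U+014D LATIN SMALL LETTER O WITH MACRON


def expand_printer_symbols(text: str) -> Tuple[str, List[int]]:
    """Turn the stand-in symbols back into plain text.

    ♁ and ō become "on", and the tie is dropped so c͡t becomes ct.
    ē may mean "em" or "en", so it is left in place and its positions
    are returned for the transcriber to resolve by hand.
    """
    text = text.replace(ON_SYMBOL, "on").replace(O_MACRON, "on").replace(TIE, "")
    ambiguous = [i for i, ch in enumerate(text) if ch == E_MACRON]
    return text, ambiguous
```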
I'm not going to go against what modern scholarship has determined unless I think they are all way off base, and that's not very likely. The intent of the regularized spelling document is to present what people of the period would have produced if they had had computers with spell checkers in the language of their time. Alternatively, software is available that can count the frequency of words within a document; running the original transcription through such software would let me determine which spellings the author most favored and change the other spellings to match. This may not jibe with what modern scholarship has determined to be the societal consensus, but it would produce a normalized spelling closer to the author's intent. As part of the normalization process, the printer's special symbols are transformed back into the text they represent. As a bonus, I'm producing glossaries of words, individuals, places, and events mentioned in the documents; what was common knowledge amongst the intended audience may be unknown to the modern reader. If I had to look it up, it goes in the glossary; if I suspect I knew about it only because of specialized knowledge, it goes in the glossary too. These glossaries are appended to the end of the document. Depending upon the margins, and on whether the original text already did this, I might insert text boxes in the margins adjacent to the first appearance of archaic words or word meanings to give their current meanings, as an alternative to replacing them with a modern equivalent. If the original text contains notes presented this way, I'll need a way of clearly differentiating my notations from the author's, to prevent confusion as to who is providing the information; using a radically different font springs to mind, and there would clearly need to be a note explaining this.
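The frequency-based alternative, together with turning the printer's symbols back into letters, can be sketched in a few dozen lines of Python. This is a sketch of the idea under stated assumptions, not the software I use: the groups of variant spellings (`VARIANT_GROUPS`) are assumed to be compiled by hand, all function names are hypothetical, and ē is deliberately left alone because it can stand for either “em” or “en” and needs a human decision.

```python
import re
from collections import Counter

# Stand-ins expanded back to the letters they represent. ē (U+0113) is NOT
# expanded: it can mean "em" or "en", so each occurrence needs a human decision.
EXPANSIONS = {
    "\u2641": "on",    # ♁, the stand-in for the end-of-word "on" symbol
    "\u014d": "on",    # ō
    "c\u0361t": "ct",  # c͡t, c and t joined with U+0361
}

# Hypothetical, hand-compiled groups of spellings to be treated as one word.
VARIANT_GROUPS = [
    {"every", "everie", "everye"},
    {"public", "publique"},
]

WORD = re.compile(r"[^\W\d_]+")  # a run of letters

def expand_symbols(text: str) -> str:
    """Replace each printer's-symbol stand-in with the letters it represents."""
    for symbol, letters in EXPANSIONS.items():
        text = text.replace(symbol, letters)
    return text

def preferred_spellings(text: str, groups: list[set[str]]) -> dict[str, str]:
    """Map every variant in each group to the spelling used most often in text."""
    counts = Counter(word.lower() for word in WORD.findall(text))
    mapping = {}
    for group in groups:
        # Iterate in alphabetical order so ties always resolve the same way.
        favourite = max(sorted(group), key=lambda w: counts[w])
        for variant in group:
            mapping[variant] = favourite
    return mapping

def normalize(text: str, groups: list[set[str]] = VARIANT_GROUPS) -> str:
    """Expand symbols, then rewrite each variant as the author's favourite."""
    text = expand_symbols(text)
    mapping = preferred_spellings(text, groups)

    def fix(match: re.Match) -> str:
        word = match.group(0)
        target = mapping.get(word.lower())
        if target is None:
            return word
        # Keep an initial capital if the original word had one.
        return target.capitalize() if word[0].isupper() else target

    return WORD.sub(fix, text)

if __name__ == "__main__":
    sample = ("Everie man, every day, every houre: the publique good "
              "and the publique weal before the public purse.")
    print(normalize(sample))
```

The symbols are expanded before counting so that a word written with ō and the same word spelled out in full are counted together. In the sample, every (two uses) wins over everie, and publique (two uses) wins over public, even though modern spelling would choose otherwise.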
The idea of glossing word meanings adjacent to the first occurrence of the word could also be used in the modern spelling document, as a way of avoiding changes to the text itself through the replacement of archaic words with their modern equivalents.

I'm not the only one doing this. Not by far!

There are currently a number of transcription projects under way in academia.

The Text Creation Partnership (TCP) has transcribed a ton of documents from ProQuest's Early English Books Online, Gale Cengage's Eighteenth Century Collections Online, and Readex's Evans Early American Imprints, all of which are restricted-access services. Most of the resulting TCP texts, however, are freely available. ECCO-TCP (Eighteenth Century Collections Online) is available to anyone. EEBO-TCP (Early English Books Online) has two parts: the first, approximately 25,000 books, is available to anyone, while the second, consisting of 35,000 books, is only available to TCP partner organizations. Evans-TCP (Evans Early American Imprint Collection) is available to anyone. While TCP's main page doesn't go into detail, they do say these are normalized texts, and a quick scan of the word index for EEBO-TCP, along with browsing the titles for ECCO-TCP and Evans-TCP, seems to confirm this; the frequency of variant spellings in EEBO-TCP is nowhere near as great as the two Elizabethan fencing manuals I have examined in depth would lead me to expect. Index counts of everie 3,811, everye 118, and every 419,924 just scream that the spelling has been normalized, and publique 19,417 against public 3,171 confirms that it was normalized to period practice rather than to modern spelling. Given the large number of individuals doing the transcription and creating metadata over a long period of time, the metadata is not standardized; when searching it, you have to try a variety of terms to ensure you find all the texts related to your subject, because they weren't working from a standardized thesaurus of terms with clear definitions. It is clear they didn't get the library cataloging community involved. I'm not really in a position to throw stones, as I haven't been referring to either the Sears List of Subject Headings or the Library of Congress Subject Headings; I have a copy of Sears, but I don't own a copy of LCSH.
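To put the index figures in proportion, here is the arithmetic as a small Python sketch. The counts are the ones quoted above; the `shares` helper is my own hypothetical name. Bear in mind these are raw occurrence counts, so they say nothing about how many separate books use each spelling.

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of all occurrences accounted for by each spelling."""
    total = sum(counts.values())
    return {spelling: n / total for spelling, n in counts.items()}

# Occurrence counts quoted above from the EEBO-TCP word index.
EVERY = {"every": 419_924, "everie": 3_811, "everye": 118}
PUBLIC = {"publique": 19_417, "public": 3_171}

if __name__ == "__main__":
    for group in (EVERY, PUBLIC):
        for spelling, share in shares(group).items():
            print(f"{spelling:9} {group[spelling]:>8,} {share:7.2%}")
```

Every comes out at about 99.07% of the three forms, while publique accounts for about 86% of its pair: the modern form dominates in one case and the period form in the other.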

Visualizing English Print is a project that is taking the TCP and similar files and making them more amenable to textual analysis with specialized software. Certain sacrifices had to be made to enable this, which makes their output of no use to those researching period printing practices. All text is stripped to bare ASCII: no umlauts, apostrophes, italics, etc. No attempt is made to preserve document formatting, other than maintaining the same line breaks as their source files. As part of removing punctuation, words were standardized; to wit, fashiond and fashion'd were both changed to fashioned. So some, but not all, spelling variants have been removed from their SimpleText output. It will be interesting to see what people do with the results of their efforts.
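To see why stripping to bare ASCII is a problem for anyone studying printing practice, here is a rough approximation of the general technique using Python's standard `unicodedata` module. This is an assumption about how such stripping is commonly done, not Visualizing English Print's actual code, and `to_bare_ascii` is a hypothetical name.

```python
import unicodedata

def to_bare_ascii(text: str) -> str:
    """Approximate 'stripped to bare ASCII': split accented letters into base
    letter + combining mark, drop everything outside ASCII, remove apostrophes."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.replace("'", "")

if __name__ == "__main__":
    for word in ["fashion'd", "c\u014ddition", "reas\u2641", "perfec\u0361t"]:
        print(word, "->", to_bare_ascii(word))
```

Applied to a few of the substitutions discussed earlier, fashion'd becomes fashiond rather than fashioned, so turning it into fashioned has to be a separate, deliberate step. Worse, cōdition silently loses its nasal and becomes codition, and ♁ vanishes altogether, leaving reas for reason. Only c͡t survives intact, as ct.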

Smithsonian Digital Volunteers is a project of the Smithsonian Institution to coordinate the digital transcription of a whole slew of documents either in their possession or in the possession of institutions who have joined with them in this project. As they are constantly creating new images of text items in their collections, this is a very long term project. It started in June 2013, and according to their page, currently has 9085 volunteers.

Citizen Archivist is a similar project of the National Archives and Records Administration.

Manuscript Transcription Projects is a list of projects similar to Early Modern Manuscripts Online (EMMO); EMMO is a Folger Library project, and the Manuscript Transcription Projects link page is maintained by the Folger Library.

FromThePage appears to be a crowdsourced transcription service provider: individuals and institutions pay them monthly fees to host their projects, and volunteer transcriptionists log in to do the actual transcription. Their hosting fees seem reasonable, and this allows individuals and institutions to run crowdsourced transcription projects without having to set up all the software and hardware themselves. Since they charge for hosting, once a given transcription project is completed the project owner may choose to remove it from their site and store it elsewhere, which may or may not include making it accessible through the web.

Papers of the War Department, 1784-1800 is a crowdsourced transcription project of the Roy Rosenzweig Center for History and New Media (RRCHNM), which in turn is a project of the Department of History and Art History at George Mason University. There are a number of projects that the RRCHNM has been involved with, which they provide links to. They have also developed some useful Open Source software for use in this type of activity.

There are many more such projects out there; these are merely those from the first page of a Google search on document transcription projects.

If getting involved in this activity intrigues you, determine what your preferred subject matter is. Then, if you want to work with established collections, start looking for relevant projects; or do as I'm doing, which is tracking down .pdfs or other scans of relevant documents, transcribing them, and placing them on my Academia web page, on the Internet Archive, and in the files sections of pertinent Facebook groups that I belong to. Of course, since the source material is out of copyright (which it had better be if you don't have the permission of the copyright holder), you could always attempt to make some extra money by selling your completed transcriptions through the various marketplaces. I'm making the results of my labours freely available, because so much of what I'm able to do these days is a result of others making materials freely available; turnabout is fair play.

Since a number of professional transcription sites turned up in the results of my Google search, there is also the option of branching out as a transcriptionist for hire once you have developed your skills by volunteering with a crowdsourced transcription project. That works just fine by me; it's in the spirit of the Works Progress Administration projects during the Great Depression, when the US Government put people to work on various projects to give them income and teach them practical skills they could then put to use in the private sector. Mini rant: they should never have shut down the Works Progress Administration; it was successful in all of its goals. The American Association of Electronic Reporters and Transcribers can provide you with information on learning how to do this and getting certified.

I could go on (and on and on) but I think this is enough for now on this topic.

Post this Puppy!