2017-09-17

The Son of More Thoughts on Transcription

In my first post I discussed a philosophy of Transcription. In my second, I covered the creation of a master font document to assist in character recognition, and the concept of initially transcribing into a font that matches the source document, so that you can easily compare your initial transcription with the source and see whether the individual characters match. In this post I'm going to talk about getting access to your source document.

There is one assumption I'm making, and that is that you are using a desktop computer for this purpose. I can envision using a laptop, but anything without a physical keyboard distinct from the display is right out.

Your source document will come in one of three basic forms: 1) digitized images of the original; 2) a hard copy of the original; 3) sound recordings.

A hard copy may be a physical book, a photocopy of the document, or, if you are fortunate enough to be working with the owner of the original document, the original document itself. In the case of working with the original document itself, odds are very good that you will be doing this where they store it, and unless they are providing you with access to a workstation, you will be using a laptop.

Sound recordings are a whole nother kettle of fish if they aren't a sound file, because you will need to have the equipment to play back the media they are recorded on. Well, even if they are a sound file, they may be on an outdated storage medium, such as floppy disks, and in an outdated file format, in which case you would need access to a computer of the appropriate vintage, with the appropriate audio software. As time passes, this is going to become harder and harder to do; I no longer possess a working computer with floppy drives of any kind, and it's been quite some time since I had access to anything capable of running a pre-Windows 95 program. Anyway, I'm not going to go into what all this might entail, at least not in this post; just take my word for it that finding the equipment to play back non-digitized audio recordings may be quite the adventure, if it doesn't come provided with access to the sound recordings themselves.
However, you would be surprised what equipment is still available if you hunt around a bit. The online marketplace has made obtaining obsolescent equipment much easier: individuals who couldn't quite bear to just throw their old equipment away now have a means of finding it a new home, and those who made a business out of taking obsolescent equipment off the hands of people wanting to get rid of it (heck, sometimes they even got paid to take it away!), for resale to those who need it to access obsolescent media, now have a much easier time reaching their prospective customers. There are also those who make a business out of converting audio between different storage media; for a price, you send them your outdated media and they'll send you back the contents on current media. This holds true for all data types, not just audio. If you are willing to let them retain a copy of the converted data and distribute it as they wish (including selling copies), they might be willing to arrange a lower price, but it would need to be something marketable that isn't under someone else's copyright.

Digitized images of the original: In short, a computer data file. Hopefully, this will have been created recently enough that it is in a current file format, on current storage media. If it is in a non-current file format, you will need to obtain either conversion software, so you can convert it to a modern file type, or software capable of displaying the contents of that file type. If it's not on current storage media, we're back to the problem outlined with audio recordings, of needing to obtain the equipment necessary to read the storage media and file type. For my purposes in this post, I'm going to pretend that your source image is in a current file format, stored on modern equipment, such that you can view it on your main computer's monitor.

In some cases you may be allowed to download the images to your own storage media; in other cases the source site may not allow downloading (and may have installed scripts that keep a right-click from pulling up the context menu), and you will need to keep an active browser window open to their site. Of course, their not allowing you to download a copy of the image should raise the question of whether you have their permission to create a transcript of the document. If it is a unique document, you really need to contact them to seek their permission to create a transcript from it; while the original document may be out of copyright, odds are real good that the image they won't let you download is in copyright, and modifying the image, which includes transcribing the contents, requires their permission. In writing. One can argue fair use for transcribing a small portion of the information contained in the image, enough for a quote in another document, but a complete transcription is right out without their permission. If they allow you to download the image, but require permission to use the image in a publication, you will still need to contact them about distributing your transcription in any form.
If it is not a unique document, things get a little bit iffy. But only a little bit. Sure, the original is not unique, but do you have physical access to any of the other copies? Has anyone else made images of one of those copies available without constraints placed upon their use? If the answer to both questions is No, then you still need to get their permission. If the answer to either question is Yes, then working from that other copy is what you need to do if you don't want to contact the image producer about producing a transcript from their image.

[Note: A bit tardy, but I've just emailed the Lord Collection to request permission to make transcriptions from their .pdfs. As with my article on Link Rot, I must practice what I preach.][2017 09 18: Got an email back, it's cool with them. Yay!]

There are online repositories of digitized documents that make their holdings available without constraint, other than not selling what you obtain from them. Derivative works are your call, but there need to be substantive changes made, such as transcribing them into a modern typeface, annotating them, or translating them into another language: things that take considerable time and effort, such that you have a real claim on the resulting document. Such repositories include Google Books, the Internet Archive, Project Gutenberg, any agency of the United States Government, and in general any State Government agency, to name a few.

Accessing the original document in hard copy.

If it is a published work, now out of copyright, and you own a copy of it in hard copy, you are set, good to go. I would recommend investing in a good document holder, appropriate to the hard copy format, to hold the document open and well displayed while you work from it.

If you do not own a copy of the work, you may be able to borrow a copy via your local library; while they may not have a copy themselves, they could try to borrow it from another library that does, through InterLibrary Loan (ILL). There is a caveat to this: the less common the item, the less likely that anyone who still has it will lend it out. I worked in the Bibliographic and Interlibrary Loan Center of the Chicago Public Library for three years; I know whereof I speak.

If you don't own a copy, and can't borrow a copy, you will have to go to where a copy is kept. First, you have to find out where a copy is held. For published works, OCLC WorldCat is the best place to start for holdings within the USA, as it is drawn from the cataloging database that OCLC maintains of materials for which they have bibliographic records, and they are the major, although not the only, cataloging database service provider in North America. Outside of North America their coverage is not very good. OCLC has been in operation since 1967, and by now most libraries in North America have substantially completed their retrospective conversion projects. Retrospective conversion is a fancy term for taking the information from your physical card catalog and converting it into records in an electronic database, typically available via the library's online catalog. Pretty much the only things that haven't been converted are items unique to a given collection, where they haven't been able to afford the time of an original item cataloger to create the bibliographic record.

Original cataloging is a lot harder than copy cataloging. Copy catalogers have to be very careful, but what they are doing is searching the existing cataloging records for one which matches the physical description of the item in their collection. If they find one, they attach their holdings code to the record, download the record for use in their online catalog, and proceed to the next item. If they can't find a matching record, they note that fact in a local record of some kind, and move on to the next item. That list of items without a matching bibliographic record will then be worked through by an original item cataloger, when they can afford to hire one; note that point, when they can afford to hire one. Pretty much all libraries of any size have a copy cataloger on staff, to handle their ongoing acquisitions.
It may not be a dedicated copy cataloger, but someone who does it as part of their duties; my sister, when she was the Children's Librarian in Klamath Falls, Oregon, did the copy cataloging for the Children's Library as part of her duties. But original cataloging is much more time-consuming, and requires a very analytical, detail-oriented mindset; they have to create a bibliographic record that accurately describes the item in their possession, such that it is clear what they have and how the edition in their possession differs from all other editions of that document. Having worked in ILL for three years in one of the largest public library systems in North America, I have a much better understanding of just how important that is than I did previously.

Different editions are just that, different. They differ in the formatting of the information contained, and the actual information contained in the work can differ between editions; like, duh, why else would they call it a different edition? Different printings of the same edition can vary in appearance. There are all sorts of reasons why a researcher will need access to not just a specific work, but a specific printing of a specific edition. If you are looking at travelling thousands of miles to do your research, you want to be certain before you pack your bags that the copy held by the repository you are going to visit matches the item you are seeking to research. So good, detailed, anal-retentive original cataloging is not a luxury, it is mandatory, and people capable of that quality of work cost money. Collections greater than a certain size, with funding adequate to their needs, can afford original catalogers. Smaller collections, and specialized collections, may not be able to afford to have an original item cataloger on staff permanently.
What they do is 1) hope their item isn't as unique as they fear, and that another institution will input a cataloging record that matches the item in their collection, and 2) seek outside funding in addition to their normal funding to hire a project cataloger, someone who will focus all their efforts on cataloging the items unique to their collection for the duration of the funding. They don't always call these individuals catalogers; sometimes they are called archivists. Archivists focus on non-published items such as personal and corporate papers and records, but the basic concept is the same: the creation of entry points to the holdings of the library/archive, such that researchers can become aware of what they have that is unique to that collection, so people will use the materials and justify the expense of preserving them. Researchers are also a revenue source. While publicly funded repositories are usually free to access in person, privately funded collections frequently charge for admission, as a means of supplementing their usually inadequate funding. They are also more likely to charge publication fees for use of the information unique to their collection, said fees generally on a sliding scale based upon the expected number of individuals who will access that publication.

And with that last, I've advanced to unique items: items that are unique to a given collection, because few if any copies were made. While WorldCat's coverage in this area is improving, that's damning with faint praise. This is where you need to have some reason to think that a given collection would have resources relating to your research before you can search their holdings information. Thankfully, as these collections are able to obtain funding for inventorying their unique holdings, more and more information about them is becoming available via web searches. Also, there is a growing number of organizations such as Archives West, which acts as a portal to the specialized collections of a great many institutions in the Greater Pacific Northwest, allowing you to use their front-end search software to search a number of specialized collections at once. One caveat: due to the variety of materials in these collections, they don't all use the same terminology in their collection descriptions, so you need to try a number of searches using terms tangential to each other to maximize your chances of finding materials related to your research.

Hm. Shifted from transcription to research. Well, looking for a collection that holds a copy of the fairly unique item you want to transcribe is research. And, I have to admit, that's how I've tracked down the items I've been transcribing: searching on the web for items related to my area of interest. I didn't start out looking for Vincentio Saviolo his Practise in Two Bookes; I was looking for historical fencing manuals, and stumbled across the Raymond J. Lord Collection by purest chance. It was only afterwards that I located the various HEMA link repositories that point there. I mean, the University of Massachusetts does not immediately spring to mind as an institution which would have a collection of historical European fighting manuals. Once you find out about their academic programs, it's not so surprising.

Well, I did, and didn't, cover what I intended to in this post. It certainly isn't what I'd been thinking about earlier today, which was the physical layout of your transcription area. But it did cover something important; before you can transcribe, you need to have something to transcribe.

It's past time for lunch.

Post this Puppy!

Edit 2017 09 18: Permission received from the Lord Collection to make the transcriptions from their .pdfs.
