2018-01-01

More thoughts on transcription.

A major problem when transcribing from copies is the quality of the copy. How easy is it to make out the letters? Sometimes quality control wasn't very good during the imaging process; other times the physical condition of the item being imaged was poor.

If there is only one copy available, you live with it. You do the best you can, make a list of the words you have low confidence in, and find out if it is possible to have someone look at the original to see if they can clear things up.

But there may be other copies out there. Search around a bit, and see if you can find alternate images for the sections causing you problems. For this, you don't need a complete copy, just a clearer one of the problem area.

Case in point.

I started working on Saviolo again today; it's been at least a month since I last did, maybe closer to two. There was a word split between two lines where I just couldn't make out the first two letters on the second line, and I was totally unable to guess what the word should be. For the longest time I'd thought the only scan of Saviolo on the web was the one in the Raymond Lord Collection. But it had recently been forcibly brought to my attention that Wiktenauer and HROARR had copies of manuals I hadn't been aware of, so I thought I'd check. Wiktenauer only had a link to the Lord Collection copy, but HROARR had that and two additional scans.

They were from the same printing of 1595, but not from the same copy. Clearly not the same copy. The Lord copy was scanned from a bound volume, and text in the gutter was sometimes hard to make out, which was the problem I was having. These two scans, which were new to me, appear to have been made from unbound copies; at least, that's the only explanation I can see for adjacent pages in a two-page scan being at such odd angles in relation to each other. In all likelihood you wouldn't be able to OCR these scans, because they are so galiwonky. OK, if you separated the adjacent pages and then did incremental image rotations on each page, you might get something oriented well enough to drop into an OCR program, but you'd have your work cut out for you.
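For what it's worth, the incremental-rotation idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not something I've run against the HROARR scans: it assumes each page has already been split from the facing page and reduced to a list of dark-pixel (x, y) coordinates, and it scores each candidate angle with a horizontal projection profile (the variance of ink counts per row). That scoring heuristic is my own choice for illustration, and the function names are hypothetical.

```python
import math


def rotate_points(points, degrees):
    """Rotate (x, y) ink-pixel coordinates about the origin by the given angle."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def profile_score(points):
    """Score how level the lines of text are.

    Counts ink pixels per (rounded) row and returns the variance of those
    counts. Level lines give tall peaks separated by empty gaps, so a
    higher score means better-aligned text.
    """
    rows = {}
    for _, y in points:
        r = round(y)
        rows[r] = rows.get(r, 0) + 1
    lo, hi = min(rows), max(rows)
    counts = [rows.get(r, 0) for r in range(lo, hi + 1)]
    mean = sum(counts) / len(counts)
    return sum((n - mean) ** 2 for n in counts) / len(counts)


def best_rotation(points, max_angle=10.0, step=0.5):
    """Try rotations from -max_angle to +max_angle in fixed increments
    and return the angle whose projection profile scores highest."""
    steps = int(round(max_angle / step))
    angles = [i * step for i in range(-steps, steps + 1)]
    return max(angles, key=lambda a: profile_score(rotate_points(points, a)))


if __name__ == "__main__":
    # Synthetic "page": three level lines of ink, then tilted by 3 degrees.
    page = [(x, y) for y in (0, 20, 40) for x in range(200)]
    tilted = rotate_points(page, 3.0)
    print(best_rotation(tilted))  # -3.0, the rotation that levels the lines again
```

Rotating about the origin rather than the middle of the page shifts the text but doesn't change how level the lines are, which is all the score measures. A real tool would then rotate the actual image by the winning angle before handing it to the OCR program.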

When I looked at the passage that had been unreadable in the Lord scan, the letters were easy to make out. One of them was a printer's contraction, which explained why I couldn't figure out what two letters needed to be added to make a recognizable word that fit the context; it needed three letters inserted, not two.

Lesson learned.

Always check to see if additional images are available. Even if only one original exists and they no longer allow imaging of it, someone may have made a photocopy or taken a photograph of the section you are having problems with before they clamped down on imaging, and it might be a cleaner copy of that section. If it's a published work, the odds go up that other copies have been imaged at other locations. Or someone else might have attempted a transcription before the original deteriorated to its current state, and consulting it might help clear things up.

And always try and find web sites dedicated to your subject matter; they may be aware of copies that you didn't know about, or have other useful information, such as glossaries of terms used in works like the one you are working on, so that you clue in to valid spellings you weren't aware of. While it's possible to do an accurate transcription without understanding the subject matter, if sections of the original are hard to make out, knowledge of the subject may help, provided you have access to sources which use contemporaneous terminology. Terminology changes; the terms used four hundred years ago may well differ from those in current use. That, after all, is one of the reasons for attempting a transcription rather than a retelling: to preserve the period terminology and practice.

Don't try and operate in a vacuum.
