2017-07-28

Yandex Image Search and Google Image Search

Been a long time since I last posted. Can't say it'll happen more frequently, but this is a start.

https://yandex.com/images/ https://www.google.com/imghp

Both allow you to do a standard text description search. Both allow you to search for an image based upon one for which you have a known URL. Both allow you to upload an image to search. It's when you get to the results of the search that things differ.

I'm going to use the following test image which I uploaded from my computer; in this case I know the precise URL where I found it, although Calibre renamed it when using it to add a cover to the .rtf version of the book in my possession. Incidentally, the book is well worth reading.

I know, a pretty plebeian image, but since I don't have this blog set up behind a 21+ firewall, my choices are limited. Otherwise I would have used an image that I downloaded at least ten years ago. I didn't record where I found it, and I didn't know the name of the model, who took the photo, or where it was initially published; I didn't know a thing about it other than that I thought the model was good looking. Google couldn't find anything like it, but Yandex found the precise image on a bunch of sites, including a site with an entry on the model containing six photo shoot collections of around 90 images each. I now know the model's first name and went from three images to way too many, but I still know nothing much about her, since the site Yandex found didn't provide that information. The site is natively in Russian, but has a drop-down list of other languages to display in, including English.

Moving right along...

First, the Google search. Rocinante cover, Google Image Search results 
Second, the Yandex search. Rocinante cover, Yandex Image Search results 

This was, perhaps, too easy an item to find. I may have to try this again with something more obscure.

Google didn't find the same resolution image, so it didn't declare a winner. Its best guess as to the identity of the image was spot on. The first site it listed was the actual source site. The related images were alternate cover images for the book, which indicates Google searched for related images by its best-guess title, rather than for items that looked like the submitted image. The first four sites listed as having matching images did, indeed, have matching images, while the final two sites were completely bogus. Only one of the sites which had a matching image was not owned by Wes Boyd; that site looked to be of interest to me, and I've now subscribed to their mailing list.

Yandex was much more confident about saying it had a match. It immediately offered a list of different resolutions for the image, with links to those images; this is something Yandex does for every image you select from those displayed in their search results, and I find this very useful. The related images section didn't come up with the alternate covers of the book, but instead images of aircraft similar to the one on the cover. This indicates their related image search is based upon an analysis of the submitted image to determine the main topic of the image, rather than the item the submitted image had been linked to. This is an important difference, and should be borne in mind when deciding which search engine to use. All six of the sites listed as having matching images did. All six sites are owned by Wes Boyd. Google didn't find as many sites owned by Wes Boyd, but did find the image at someone else's site.

Neither found the entry at LibraryThing. Goodreads had the book listed, but showed one of the alternate covers; it was the only Wes Boyd book they listed where that was the case. FictionDB had the alternate cover. The Google Books entry didn't show. A whole bunch of others didn't show, including the Nook and Kindle eBook stores.

Now to try again.
This is an image of the map of Middle Earth included with one of the hardcover editions of The Lord of the Rings published by Allen & Unwin lo these many years agone. 


Google, again, wasn't sure about its identification, but its best guess was pretty good. The two sites it listed before showing related images were sites I already knew about as primo Middle Earth fan projects. The related images were spot on, all being similar maps of Middle Earth. Google then goes on with a bazillion hits for sites with matching images, I mean pages upon pages upon pages, leading off with five articles on the find of a copy of the map hand annotated by J.R.R. Tolkien himself. And where possible, a small thumbnail of the image at that site appears to the left of the listing.

Yandex, again, was sure of its identification, and offered a variety of resolutions for the image. The related images were spot on. The sites listed as having matching images aren't organized the way Google's are, which may be good or bad; after all, the first five sites Google listed had basically the same information, while Yandex leads off with a Korean language Middle Earth fan site rich with maps. Of course, I didn't know it was Korean. The translating software used by Chrome doesn't tell you what language is being translated from, which is a grievous lack, and the translated site didn't have anything saying it was based in Korea. The About page did, however, list a problem at one time with the Palgong Port interface, and a search on Palgong determined that it was in South Korea. Yandex also includes a thumbnail of the image as part of each site listing and, continuing its focus on resolution, lists the resolution of the image at the bottom of the thumbnail.

Google is very good at finding information in your language, and geographically close by. This is because of all the information they collect about you, as the Internet Conspiracy Theorists rant about all the time; I think it's cool, since I generally get better results because of it. But there are times when that isn't what you want. It wasn't until the end of the thirteenth page of results that Google listed a non-English language site; Yandex led off with one. However, Google did have those pages upon pages of sites, while Yandex only lists forty-three sites. And both of my example search objects were non-obscure; as I related at the beginning, I had an obscure Adult Model image in my collection that Google didn't have a clue about, and that Yandex, given their far more aggressive delving into former Soviet countries' resources, found.

If your interest lies in finding different resolution images, foreign language resources, or obscure Adult Model image information, Yandex is definitely the search engine to use. If you want localized information, stick with Google; that's where they put their focus. There are other image search providers out there, but I haven't tried them out; it could be well worth your time checking them out, as I suspect each has differing strengths and weaknesses, and with proper investigation you would be able to select the best search engine for your specific research project. I know I'll be switching back and forth between Google and Yandex, just as I search Antikvariat.net rather than AbeBooks when I'm looking for used books that were published in Scandinavia. You choose the proper tool for the task at hand.

Update: 2017-10-24

Google really has a problem due to its localization process. It won't forget about your most recent searches, and where you found useful information. So if you start a new search, which has nothing to do with your previous search, it hits all the wrong web sites first. At least that seems to be what happens when trying to ID images of Europeans found on Asian web sites; since you had just been visiting Asian web sites, that's where it starts looking, and since that's where you found the images, why, there you go, success! Except that there isn't any identification information there; if there had been, I wouldn't be doing an image search in the first place, I'd be doing a text search based upon the ID. And if Google doesn't make a solid ID the first time through, it bases its broader search for similar images upon the text found in those first web pages. Fine and dandy if the page is for a narrow subject, but if it's a general page, then only general terms will be provided. Such as the image search where the term Google insisted upon adding as its text criterion was the word "girl"; not standing, sitting, lying down, in a chair, leaning against a wall, wearing a business suit, wearing a bikini, wearing nothing at all, just "girl", so that's what was brought up as "similar" images: images of lots of very different girls. Or where it insisted upon a Portuguese search term, since it was a Portuguese site where it found a matching image. Funny thing, images don't group based upon the language of their originating country, not if they have been around for any length of time, and if the first sites where a match is found aren't in the same language as the one in which the image was created and first posted, the foreign language search terms will actually decrease the likelihood of finding a proper ID. You use search terms in the language which has the most information concerning your subject of research.
This is precisely why I'm creating a glossary of search terms in various languages, attempting to find equivalent phrases so I can use the descriptive phrase appropriate to the language which has the most information on the subject; when researching the Venice-Ottoman Wars of the mid-1500s, English is _not_ the most productive language to search in. Determining the foreign language term used for a location in the mid-1500s is not an easy task, especially when starting with the temporally congruent English term, which doesn't match the Modern English term.
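
For what it's worth, a glossary like that can be as simple as a nested lookup table. The following is only a rough sketch of one way to lay it out, not the glossary I'm actually building: the `GLOSSARY` layout, the `search_term` helper, and the `en-1500s` key for period English names are all made up for illustration, and the handful of place names are just sample entries.

```python
from typing import Optional

# Hypothetical layout: concept key -> language code -> search term.
# "en-1500s" is an invented key for the period English name.
GLOSSARY = {
    "venice": {"en": "Venice", "it": "Venezia", "de": "Venedig", "tr": "Venedik"},
    "heraklion": {"en": "Heraklion", "en-1500s": "Candia", "it": "Candia"},
}


def search_term(concept: str, language: str, fallback: str = "en") -> Optional[str]:
    """Return the term for `concept` in `language`, else the fallback language's term."""
    entry = GLOSSARY.get(concept.lower())
    if entry is None:
        return None  # concept not in the glossary at all
    return entry.get(language, entry.get(fallback))


print(search_term("Venice", "it"))
print(search_term("heraklion", "en-1500s"))
```

The fallback to the English term is a design choice for the sketch; returning nothing at all when the language is missing might be the better reminder that a term still needs to be researched.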

Yandex, on the other hand, doesn't localize the search, so if the model is European and you found the image on a Japanese site, Yandex will start with European sites, because that's where it will find the greatest number of hits. If Yandex doesn't ID the image right off, it analyzes the image itself, not the text on the websites where the image is found, and pulls up images that look like the image submitted; the images will have the same stance, similar background, similar clothes (or lack thereof), and even tend toward the same hair color. In other words, images that really do look like the image submitted. Google insists upon adding text search terms, based upon the websites where it found matches with the submitted image, to the image search after its first try; Yandex analyzes the image itself, and tries to match the image, without adding search terms. Guess which is more successful if you are trying to find more images of a specific item, and didn't stumble upon a site dedicated to that item the first time around? Yandex, hands down, since it really does pull up similar images that aren't exact matches, regardless of any accompanying text.
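
To make the difference concrete, here is a toy sketch of the two approaches as I understand them from the outside. It is emphatically not how either search engine is actually implemented: the catalog, the three-number "feature" vectors, and both function names are invented for illustration, and real systems use far richer image descriptors.

```python
import math

# Hypothetical catalog: each image has the terms found on its page
# and a made-up feature vector standing in for what the image looks like.
CATALOG = [
    {"id": "alternate_cover", "page_terms": {"rocinante", "novel"}, "features": (0.1, 0.9, 0.3)},
    {"id": "aircraft_photo", "page_terms": {"airshow", "aircraft"}, "features": (0.8, 0.2, 0.1)},
    {"id": "unrelated_map", "page_terms": {"map", "tolkien"}, "features": (0.2, 0.1, 0.9)},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similar_by_text(terms, catalog):
    """Text-driven: keep images whose page shares a term, most overlap first."""
    scored = [(len(terms & img["page_terms"]), img["id"]) for img in catalog]
    return [i for score, i in sorted(scored, key=lambda p: (-p[0], p[1])) if score > 0]


def similar_by_features(query, catalog):
    """Image-driven: rank every image by how close its features are to the query's."""
    scored = [(cosine(query, img["features"]), img["id"]) for img in catalog]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# A cover showing an aircraft, found on a page that mentions the title.
print(similar_by_text({"rocinante"}, CATALOG))
print(similar_by_features((0.9, 0.1, 0.2), CATALOG))
```

With this made-up data the text-driven version returns only the alternate cover, while the image-driven version ranks the aircraft photo first, which is the same pattern I saw with the Rocinante cover.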

Now, when Google does make a correct ID, it tends to put a lot more information onto the screen than Yandex does; with Yandex, you have to actually visit the relevant websites, while if Google can find a matching Wikipedia page, it will abstract the basic information and show it in the upper right quadrant of the screen.

But Google's pulling text search terms from the pages where the image was found helps to explain why the first seven or so hits for the Tolkien map image were all near-identical articles in the popular press about the British museum obtaining a copy of that map hand annotated by J.R.R. Tolkien himself; it grabbed the text from the first hit it found, and that got the best match from the other papers getting their articles from the same news service. So after the first hit, the next six were useless duplicates, and they weren't actually the results of an image search, but of the image search converted to a text search.
