Google is now indexing scanned documents in its search results. In other words, if you scan a page of text, save the scan as a PDF and post it to the web, it can be treated as an actual page of text rather than as an image. In a post on the Official Google Blog, Product Manager Erin Levey reveals a little about what Google is doing:

“In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content. We had occasional clues from references to the document, so you might get a search result with a title but no snippet mentioning your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words, words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important advancement in our mission of making all the world’s information accessible and useful.

While we’ve indexed documents saved as PDFs for a long time now, scanned documents are a lot more difficult for a computer to read. Scanning is the reverse of printing. Printing turns digital words into text on paper, while scanning makes a digital picture of the physical paper (and text) so you can store and view it on a computer. The scanned picture of the text is not quite the same as the original digital words, however; it is a picture of the printed words. Often you can see telltale signs: the ring of a coffee cup, printer smudges, or even fold creases in the pages.”

This could save a lot of time spent retyping documents for web pages. A scanned document on your website can now be optimised for the search engines just as any other website text would be.
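
To make the idea of "words that can be searched and indexed" a little more concrete, here is a rough sketch in Python of what happens once OCR has turned a scan into plain text. It is not Google's method, only a toy inverted index; the file names, the sample text and the `build_index` and `search` helpers are all made up for illustration, and the OCR step itself is assumed to have already happened.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical input: text an OCR step might recover from a scanned PDF,
# alongside text from an ordinary web page.
pages = {
    "scanned-report.pdf": "Annual rainfall report for the coastal region",
    "about.html": "About our weather station",
}

index = build_index(pages)
print(search(index, "rainfall report"))
```

Once the scan has been converted to text, the index treats it exactly like the ordinary page, which is why the same optimisation advice applies to both.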
