Question about pdf book scan that can be edited |
| jimdagys posted 2009 Nov 02 18:10 |
| Someone sent me part of a book in pdf format. (See book.pdf file at end of this post)
You can clearly see that it is a photo (image) of the book: there is shading in the book fold and the writing is slightly slanted. However, I was very surprised that I can highlight the text and copy/paste it into Microsoft Word. How can this be? When I scan a book to a PDF in image format, I cannot highlight or copy/paste the text. If I OCR it with another application, the shading and slanted writing are gone, but OCR errors are introduced. The above file looks like an image, yet the text can be highlighted and copy/pasted. Can someone please explain how this can be? book.pdf |
| rallynavvie posted 2009 Nov 02 18:33 |
| Good OCR software can do that just fine. Acrobat should be able to import and OCR that type of scan. |
| jagabo posted 2009 Nov 02 18:44 |
| The text isn't graphics. It's slanted text overlaid onto images that look like a scanned book. |
| minidv2dvd posted 2009 Nov 02 18:54 |
| When you make your own scans, make sure to use "custom" and check the box for Adobe Acrobat Pro to OCR the image. Then the text is highlightable and copyable. |
| jagabo posted 2009 Nov 02 19:08 |
| It looks like it contains invisible text overlaid onto graphics of the text. If you zoom way in and select words and/or letters, you'll see that the selection doesn't always line up perfectly with the visible text. |
| hunter99 posted 2009 Nov 02 19:31 |
| In Acrobat try this:
Tools -> Comment & Markup -> Text Edits
Highlight any word and type in a different word. You won't see the new word, but if you copy/paste to Notepad (or such), the word you typed will be there. So, as minidv2dvd stated, it was scanned and OCR'd. It looks like the scan (image) and the OCR data are kept separate in the PDF, and you see the scanned image only. Also, some of the scan was not OCR'd correctly, and a copy/paste shows garbage. |
| jimdagys posted 2009 Nov 02 22:47 |
| OK, so when I look at the pdf, I'm looking at an image, but hidden behind the image is the text that can be highlighted/copied/pasted. Very interesting. Can you tell me what part was OCRed incorrectly? I can't find it. |
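The hidden text layer described above can be illustrated with a small sketch. PDF content streams have a text rendering mode operator, `Tr`, and mode 3 means the text is neither filled nor stroked, so it is invisible but still selectable. The Python below is only a rough illustration under several assumptions: it takes an already decoded content stream as a plain string (real streams are usually compressed), it only understands simple `(...) Tj` text operations, and it ignores graphics-state save/restore. The function name `invisible_text` and the sample stream are made up for this example and are not taken from book.pdf.

```python
import re
from typing import List

# One pattern, two alternatives:
#   group 1: the digit of a text rendering mode operator, e.g. "3 Tr"
#   group 2: the literal string of a simple show-text operation, e.g. "(abc) Tj"
OPERATORS = re.compile(r"\b([0-7])\s+Tr\b|\(((?:\\.|[^\\()])*)\)\s*Tj")


def invisible_text(content_stream: str) -> List[str]:
    """Return the strings drawn while text rendering mode 3 (invisible) is active."""
    found = []
    mode = 0  # PDF default rendering mode: fill the glyphs (visible text)
    for match in OPERATORS.finditer(content_stream):
        if match.group(1) is not None:
            # A "Tr" operator changes the mode for the text that follows.
            mode = int(match.group(1))
        elif mode == 3:
            # A "Tj" operator drawn in mode 3 is hidden, selectable text.
            found.append(match.group(2))
    return found


# Hypothetical page: a full-page image, then invisible OCR text, then visible text.
sample = (
    "q 612 0 0 792 0 0 cm /Im0 Do Q "
    "BT 3 Tr /F1 12 Tf 72 700 Td (I had left three cases in) Tj ET "
    "BT 0 Tr (visible caption) Tj ET"
)
print(invisible_text(sample))
```

In this sample the page image is painted first and the OCR text is drawn in mode 3, which is why a viewer shows only the scan while select/copy picks up the text.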
| hunter99 posted 2009 Nov 03 00:53 |
| jimdagys:
Had to really search this time; the first time it was the 2nd line/word I checked. S--t luck I guess. Try page #160, where the cursor highlights between lines. Also, in the third line the "I" is the #1 --- "1 had left three cases in". Page #169, Par 2, Line 5 - "I .ncy's food cans were". jagabo, please forgive me. Twice I've retraced your steps. I'm 70 and it takes a while for this old mind to comprehend what people write, say, or do. george |
| jimdagys posted 2009 Nov 03 02:27 |
| OK, thanks. I just highlighted the whole page (instead of a few lines) and copied and pasted it into Microsoft Word, and the few OCR problems are obvious. Learned something new. |
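Once a whole page has been copied out as plain text, the kind of OCR slips quoted earlier in the thread ("1 had left three cases in", "I .ncy's food cans were") can be flagged with a crude check. The sketch below is a hypothetical heuristic, not a real spell checker: the function name `suspicious_tokens` and its two rules are assumptions made for illustration, and it will both miss errors and flag legitimate text such as "1 apple".

```python
import re


def suspicious_tokens(text):
    """Return (index, token) pairs that look like common OCR misreads."""
    flagged = []
    words = text.split()
    for i, word in enumerate(words):
        # Look ahead one token; empty string at the end of the text.
        nxt = words[i + 1] if i + 1 < len(words) else ""
        if word == "1" and nxt[:1].islower():
            # A lone "1" before a lowercase word is often a misread "I".
            flagged.append((i, word))
        elif re.match(r"^[.,;:]+[A-Za-z]", word):
            # Punctuation glued to the front of letters, e.g. ".ncy's".
            flagged.append((i, word))
    return flagged


print(suspicious_tokens("1 had left three cases in"))
print(suspicious_tokens("I .ncy's food cans were"))
```

A dictionary-based spell check, like the one built into Microsoft Word, would catch far more than these two rules; the point is only that errors in the hidden layer show up once the text is extracted.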

