Cleaning up scanned PDFs of old books

I have a PDF of a Victorian novel that I need to clean up, i.e. remove blotches and spotting from the pages. I also need to 'whiten' the pages to boost contrast. The problem is that I need to do this without affecting the text. In this instance I don't have access to the original book, so I'm working with just the PDF. Any suggestions?

Restoring text in GIMP

As a first step, I would convert the PDF to TIFF or some other image format; this also protects the original. Then use wavelet decomposition to separate the image into its different scales. The script for this is available in the plugin registry. By switching off individual wavelet levels, you can remove or reduce artifacts that are larger or smaller than the text.

Next, you can use auto contrast to make the text black and the background white. You may have to play with brightness/contrast/gamma to get it just right. I can help further if you post an image of a section of the page.

A really cool solution would be to resynthesise the image from a font or text sample; that way, small bits of letters missing due to printing errors could be filled in.

Enjoy,
Riaan
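To show what the wavelet-plus-contrast idea does outside GIMP, here is a minimal sketch in Python with NumPy. It is not the plugin's code. It assumes the "à trous" (starlet) wavelet transform with a B3-spline kernel, and it assumes the page has already been converted (e.g. from the TIFF) into a 2-D greyscale float array where 0 is black and 1 is white. The function names, the `levels`/`keep` values and the 1%/99% clipping percentiles are illustrative guesses you would tune to your scan resolution and type size. Dropping the finest level removes specks smaller than the strokes, and replacing the coarse residual with a flat tone removes large blotches and uneven paper colour; the final stretch is the auto-contrast step.

```python
import numpy as np

# B3-spline smoothing kernel used by the "a trous" (starlet) wavelet transform.
B3_KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def _smooth(img, step):
    """Separable B3-spline blur whose taps are `step` pixels apart."""
    offsets = np.arange(-2, 3) * step
    pad = 2 * step
    out = img
    for axis in (0, 1):
        widths = [(pad, pad) if a == axis else (0, 0) for a in (0, 1)]
        padded = np.pad(out, widths, mode="edge")  # repeat border pixels
        n = out.shape[axis]
        acc = np.zeros_like(out)
        for weight, off in zip(B3_KERNEL, offsets):
            acc += weight * np.take(padded, np.arange(n) + pad + off, axis=axis)
        out = acc
    return out


def wavelet_decompose(img, levels):
    """Split `img` into `levels` detail layers (fine to coarse) plus a residual.

    sum(details) + residual reproduces the input.
    """
    current = np.asarray(img, dtype=float)
    details = []
    for j in range(levels):
        smoother = _smooth(current, 2 ** j)
        details.append(current - smoother)  # what this scale adds
        current = smoother
    return details, current


def auto_contrast(img, low_pct=1.0, high_pct=99.0):
    """Stretch so the darkest low_pct% go black and the brightest go white."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    return np.clip((img - lo) / max(hi - lo, 1e-12), 0.0, 1.0)


def clean_page(img, levels=5, keep=(1, 2, 3)):
    """Rebuild the page from text-sized levels only, on a flat background.

    `levels` and `keep` are illustrative: which levels hold the text depends
    on scan resolution and type size.
    """
    details, residual = wavelet_decompose(img, levels)
    out = np.full_like(residual, residual.mean())  # flat paper tone replaces stains
    for j in keep:
        # Levels not listed in `keep` (by default the finest specks and
        # the coarsest blotches) are simply left out.
        out += details[j]
    return auto_contrast(out)
```

Each entry in `details` plays roughly the role of one wavelet layer in GIMP, so leaving an index out of `keep` is the counterpart of switching that layer off.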

Restoring text in Gimp

Thanks, Riaan. I've selected a section of a page and popped it into a Word document. Where (and how) should I post it?

Rob

restoring text

Hi Rob,

First, please don't put it in a Word doc; it just inflates the size of the file. Have you got a DeviantArt, Flickr or similar account? If so, you can store the image there and post a link. How big is the PDF? If it's less than 5 MB, you can mail it to: riaan (at) wet test dot co dot za

Riaan