Posted Feb 22, 2006 07:33 am CST
The scene was not a high-tech nuclear laboratory or a Pentagon think tank. It was a district court clerk’s office. But all eyes were on Beckman as he snapped away. No one had ever photocopied documents in the clerk’s office using a digital camera.
Not long ago, the local clerk doubled the cost of using the copy machine to 50 cents a sheet. Half a buck a page seems outrageous, especially since actual copying costs have gone down over time.
Abbyy Software House, the maker of document recognition and language processing software, has added a new feature to its FineReader 8.0 application that is designed to perform optical character recognition (OCR) on documents captured with a digital camera.
A file from the clerk’s office seemed like a good test of digital camera photocopying. The goal was twofold:
• To find out how difficult it would be to combine single-page shots (multiple files) into multipage graphic files.
• To test how well the photographed document could be OCR-converted into a document you can edit or search.
The digital camera created JPG files. Abbyy FineReader will save selected pages or images, including JPGs, in several formats, including Adobe Acrobat PDF, creating multipage or single-page files as desired.
As an alternative, Adobe Acrobat Professional easily and quickly combined selected JPG files into a single Portable Document Format (PDF) file. We then used FineReader to OCR that Acrobat-created PDF of the clerk’s file. The results were excellent.
Next we tried to OCR the same PDF within Acrobat itself. Acrobat failed miserably at the very task at which FineReader excelled.
The lighting in the clerk’s office wasn’t great. A few pages were photographed at a slight angle, and the sheets were not lying perfectly flat. Even so, we are done paying 50 cents a page for copies of clerk’s files. In the future we will send a secretary with a digital camera.
We used the same camera to photograph pages of the North Western Reporter. We opened the volume to the middle so that the curvature of the page would be at its greatest, making the OCR more difficult. Even when a photo was taken at a slight angle, the OCR was good.
The camera we used is a two-year-old Canon PowerShot S1 IS with an image stabilizer. All the images were shot handheld at 3.2 megapixels, with the flash turned off and the camera set to automatic for the highest-quality picture it could take. We have even had good luck with glossy pages, as long as we pay attention to overhead lighting.
A camera without an image stabilizer needs a tripod, but lugging a cumbersome tripod around is probably enough to kill the efficiency of photocopying with a camera.
Five years ago we tried running OCR on camera "photocopies" taken with what was then a high-quality, 2.1-megapixel digital camera with a built-in stabilizer.
We also tried using a tripod. Either way, the OCR results were worthless. The difference today is a testament to how far Abbyy has come, along with the improvement in digital cameras.
A neat utility in Abbyy FineReader 8.0 is Screenshot Reader. It lets you select a portion of your computer screen and either OCR it or save it as an image.
FineReader Professional lists for $400. FineReader Corporate lists for $600. The corporate edition permits several users to share one license.
FineReader 8.0 is phenomenal if you want OCR. Your digital camera becomes a tiny photocopier and scanner that feeds directly into optical character recognition. If you can see it, you can copy it, and you can OCR it. James Bond never had it so good.