Do You Use OCR Software on Scanned Documents?


Scanned documents are a pain point for eCTD publishers and FDA reviewers alike. No one likes them, but, unfortunately, we all have to deal with them. In its “Portable Document Format (PDF) Specifications,” FDA attempts to minimize the inconvenience of scanned documents (for its reviewers, at least) by saying:

“If scanned files must be submitted, they should be made text searchable where possible. If optical character recognition software is used, verify that imaged text is converted completely and accurately.”

It sounds simple enough, doesn’t it? In reality, using Optical Character Recognition (OCR) software has the potential to create more problems than it solves. It can be a real challenge to verify that imaged text has been converted completely and accurately. Many organizations struggle to balance the desire to comply with FDA’s wishes against the time and effort (and cost!) required to do so.
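
To give a sense of what "verifying" OCR output could look like in practice, here is a minimal sketch of one possible spot-check: compare the OCR text of a sample page against a hand-keyed transcript of that same page and flag the page for manual review if the two differ too much. This is only an illustration, not an FDA-prescribed method. The function names and the 0.98 threshold are assumptions made up for this example, and it uses only Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse whitespace and case so layout differences are not counted as errors."""
    return " ".join(text.split()).lower()


def ocr_similarity(ocr_text: str, reference_text: str) -> float:
    """Return a 0.0-1.0 similarity ratio between OCR output and a trusted transcript."""
    matcher = SequenceMatcher(
        None, normalize(ocr_text), normalize(reference_text), autojunk=False
    )
    return matcher.ratio()


def needs_review(ocr_text: str, reference_text: str, threshold: float = 0.98) -> bool:
    """Flag a page for manual review when similarity falls below the threshold.

    The default threshold is an arbitrary assumption for illustration only.
    """
    return ocr_similarity(ocr_text, reference_text) < threshold


if __name__ == "__main__":
    # Hypothetical sample: the OCR engine confused "1" with "l" and "o" with "0".
    reference = "Take 10 mg once daily with food."
    ocr_output = "Take l0 mg once daily with f00d."
    print(f"similarity: {ocr_similarity(ocr_output, reference):.3f}")
    print(f"needs review: {needs_review(ocr_output, reference)}")
```

Even a check like this requires someone to key in a trusted transcript for each sampled page, which is exactly the kind of time and effort that makes verification costly.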

I’d like to know how organizations are dealing with this issue. Are most organizations using OCR software on scanned documents? Do they use it on all scanned documents, or just certain ones? Do they verify the output of the OCR software?

I’ve set up a brief survey on this topic, and I would very much appreciate it if you’d take just a few minutes to complete it. No personally identifiable information is required, but if you choose to leave your name and email address at the end of the survey, I’ll email the final results to you. Please click the link below to take the survey.

Begin Survey!

I’ll release the results in a future post.