OCR

Scanned pages can be converted to searchable text that is stored invisibly within the PDF when it is saved. This allows readers of the PDF to search for the text and to copy and paste it. pdfMachine uses optical character recognition (OCR) technology to convert the scanned pages into text. Conversion to text by OCR is not 100% accurate. pdfMachine allows you to select the language of the text on which you are performing OCR.

OCR can be performed:

Automatically from pdfScanMachine after a page has been scanned.
On images/scanned pages in a PDF from pdfMachine.
On images/scanned pages in existing PDFs from the command line, with no user input. This allows OCR to be performed in batches.
Automatically after printing to create a PDF, for PDFs with no searchable text in them.

To convert a scanned page into text, first open it in pdfMachine. You can use pdfScanMachine to scan a page into pdfMachine, and OCR can even be done during the scanning process by checking the OCR checkbox on the scan dialog. You can also OCR any page from the Tools menu in pdfMachine: in either Viewer mode or Edit mode, select "Run OCR (all pages)" to perform OCR on all pages.

The very first time you run OCR from pdfMachine, you will need to select the language of the text you are converting. For example, if your scanned page has English text on it, select "English". pdfMachine will then download and install the language files needed to perform the conversion.

Note: The language selection is remembered for future conversions, and the language files do not need to be downloaded each time. To change the language, use "Change OCR language" from the Tools menu.

pdfMachine will then convert the scanned pages.

Perform a Save or Save As to save the invisible text with the PDF.

During OCR, pdfMachine inserts the invisible text into the PDF. Once the PDF is saved, the text is searchable from PDF readers with search capability, and it can also be copied and pasted.

OCR Methods

OCR during the scan process

Use pdfScanMachine to scan a page into pdfMachine. Check the OCR checkbox on the scan dialog.

OCR from the pdfMachine Tools menu

You can OCR any page from the Tools menu in pdfMachine. Select "Run OCR (all pages)" to perform OCR on all pages.

OCR while converting an image to PDF

When using the pdfMachine right-click context menu in Windows Explorer to convert an image to PDF, you can check the OCR box to also perform OCR on the converted image.

OCR from the command line

OCR can be performed using the pdfMachine command line tool pdfMachineOCR. This allows OCR to be performed in batches.
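As an illustration of batch use, the following Python sketch runs the tool once for each PDF in a folder. It rests on assumptions that are not confirmed by this page: that the executable is named pdfMachineOCR.exe and is on the PATH, that it accepts the PDF path as its only argument, and that a non-zero exit code means failure. Check the pdfMachineOCR documentation for the actual syntax before relying on it.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: executable name and location; adjust for your installation.
OCR_TOOL = "pdfMachineOCR.exe"


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_command(pdf_path, tool=OCR_TOOL):
    """Build the argument list for one file.

    Assumption: the tool takes the PDF path as its only argument.
    The real command line may need different or additional arguments.
    """
    return [tool, str(pdf_path)]


def run_batch(folder, tool=OCR_TOOL, runner=subprocess.run):
    """Run the tool once per PDF and return the files that failed.

    Assumption: a non-zero exit code signals a failed conversion.
    """
    failed = []
    for pdf in find_pdfs(folder):
        result = runner(build_command(pdf, tool))
        if result.returncode != 0:
            failed.append(pdf)
    return failed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python batch_ocr.py <folder containing PDFs>
    for pdf in run_batch(sys.argv[1]):
        print(f"OCR failed: {pdf}")
```

Running one process per file keeps a failure in one PDF from stopping the rest of the batch, and the returned list shows which files need attention.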

OCR after every print to create PDF

In the Next Action area of the pdfMachine options, you can configure pdfMachine to perform OCR after printing to create a PDF. OCR is then run on PDFs that have no searchable text in them.

Language Selection

The very first time you run OCR from pdfMachine, you will need to select the language of the text you are converting. pdfMachine will then download and install the language files needed to perform the conversion. The language selection is remembered for future conversions. You can change it at any time using "Change OCR language" from the Tools menu.