TIFF Image Printer

Use OCR to Extract Text When Creating TIFF Images

OCR (Optical Character Recognition) finds text on scanned pages or images and extracts it as digital text. Outside factors such as image quality, the font used, and any background image on the pages all affect the accuracy of the OCR results.

Using TIFF Image Printer, you can create TIFF images and OCR the pages during creation, saving the extracted text as hOCR, Text, or ALTO files. Choose which types of extracted text file to create from the options in the OCR tab; you can generate one type or all of them at once. The OCR process saves the extracted text files with the same base name and in the same location as the new image.
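As a rough illustration of the "same base name and location" rule, the sketch below computes where the OCR files would be expected next to a given image. The file extensions in `ASSUMED_EXTENSIONS` and the example path are assumptions made for this sketch; the text above does not state them, so check the files actually created on your system.

```python
from pathlib import Path

# Assumed extensions for each OCR output type. These are NOT taken from the
# product documentation; adjust them to match the files you actually get.
ASSUMED_EXTENSIONS = {"hocr": ".hocr", "text": ".txt", "alto": ".xml"}


def expected_ocr_files(image_path, kinds=("text",)):
    """Return a dict mapping each OCR output kind to its expected path.

    The path keeps the image's folder and base name and swaps the extension.
    """
    image = Path(image_path)
    return {kind: image.with_suffix(ASSUMED_EXTENSIONS[kind]) for kind in kinds}


if __name__ == "__main__":
    # Hypothetical image path, used only for demonstration.
    files = expected_ocr_files("Documents/scan.tif", ("hocr", "text", "alto"))
    for kind, path in files.items():
        status = "found" if path.exists() else "missing"
        print(f"{kind}: {path} ({status})")
```

A helper like this can be handy in a script that checks whether every printed image has its matching text file.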

You need to be able to open and print your original document to create a TIFF image and the OCR-extracted text files. For example, to create a TIFF image from a PDF file, you would need Adobe Reader or another PDF viewer to print the file to TIFF Image Printer.

TIFF Image Printer 12 uses profiles. A profile is a group of settings used to create your output image. TIFF Image Printer comes with the following system profiles: Color Optimized TIFF, Monochrome TIFF, Fax TIFF, and Serialized Color Optimized TIFF.

To create multipage color TIFF images and extract the text to separate files, we will first make a copy of the system profile Color Optimized TIFF. Next, we will enable OCR in the new profile and save it. Lastly, we will set it as the default profile that TIFF Image Printer uses when we print. This tutorial uses TIFF images, but the steps are similar when creating PNG, JPEG, and other image formats.

Step by Step Instructions

1. Launch the TIFF Image Printer Dashboard.

LaunchDashboard

 

2. Select "Edit & Create Profiles" to open Profile Manager.

3. Find the system profile named Color Optimized TIFF. Create a copy of it using the copy icon in the lower left. The same steps apply to any system profile. You can also create custom profiles through the Add a profile button in the upper left corner of the Profile Manager.

OCRTIFF-CreateProfileCopy

 

4. Give your new profile a name and a description, and click Save.

5. Go to the OCR tab to turn on OCR (Optical Character Recognition). Because running OCR on each page can be time-consuming, it is disabled by default.

OCRTIFF-EnableOCR

 

6. Next, choose which types of OCR text files to create. There are three to choose from:

a. hOCR is an XHTML file containing the text extracted from the page. It also stores format and layout information, and a score for how confident the OCR engine is in each match.

b. Text creates a UTF-8 text file containing only the extracted text.

c. ALTO is similar to hOCR but stores the information as XML following the Analyzed Layout and Text Object specification.
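To make the difference between the layout-aware formats concrete, here is a minimal sketch that pulls words and confidence scores out of hOCR and ALTO markup using only the Python standard library. It assumes the common conventions of the two formats: hOCR words as elements with class `ocrx_word` and an `x_wconf` value (0 to 100) in the `title` attribute, and ALTO words as `String` elements with `CONTENT` and optional `WC` (0 to 1) attributes. Whether the files written by TIFF Image Printer use exactly these names is an assumption, and the sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, turning '{http://...}String' into 'String'."""
    return tag.rsplit("}", 1)[-1]


def hocr_words(xhtml_text):
    """Yield (word, confidence) pairs from hOCR markup.

    Assumes word elements carry class 'ocrx_word' and a title attribute
    such as 'bbox 10 10 60 30; x_wconf 96'.
    """
    root = ET.fromstring(xhtml_text)
    for el in root.iter():
        if "ocrx_word" not in el.get("class", "").split():
            continue
        confidence = None
        # The title holds ';'-separated properties; pick out x_wconf.
        for prop in el.get("title", "").split(";"):
            parts = prop.split()
            if len(parts) == 2 and parts[0] == "x_wconf":
                confidence = float(parts[1])
        yield "".join(el.itertext()).strip(), confidence


def alto_words(xml_text):
    """Yield (word, confidence) pairs from ALTO markup.

    Assumes String elements with CONTENT and an optional WC attribute.
    """
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if local_name(el.tag) == "String":
            wc = el.get("WC")
            yield el.get("CONTENT", ""), (float(wc) if wc is not None else None)


if __name__ == "__main__":
    # Made-up sample markup, not output from TIFF Image Printer.
    sample_hocr = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<span class="ocrx_word" title="bbox 10 10 60 30; x_wconf 96">Invoice</span>'
        "</body></html>"
    )
    sample_alto = (
        '<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#"><Layout>'
        '<String CONTENT="Invoice" WC="0.96"/>'
        "</Layout></alto>"
    )
    print(list(hocr_words(sample_hocr)))
    print(list(alto_words(sample_alto)))
```

For a file on disk, read its contents first and pass the text to these functions. If you only need the plain text and none of the positions or scores, the Text output avoids this parsing entirely.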

For our example, we chose Text OCR as we only want to extract the text from the page and don't care about the layout or positioning on the page.

OCRTIFF-ExtractTEXTOutput

 

7. Lastly, choose which languages to look for on the page. The OCR engine works by looking for and matching the characters' curves and shapes. For this to happen, it has to know what languages to look for. You must select at least one language. The more languages to match against, the longer the OCR process will take. If you have documents with mixed languages, select all languages used.
 
TIFF Image Printer can recognize Arabic, English, French, German, Hebrew, Hindi, Italian, and Spanish languages. See the OCR tab for instructions to download additional languages.
 
OCRTIFF-ChooseLanguage

 

8. With your OCR settings configured, Save the changes to your new profile, then click the Back arrow to return to the main screen of the Profile Manager.

9. We now have our new profile. Let's set it as the default profile TIFF Image Printer uses when printing. To do this, open the TIFF Image Printer Dashboard again and select "Manage Printers" to open Printer Management.

OpenPrinterManagement

 

10. Next to TIFF Image Printer, use the drop-down list to set your new OCR profile as the default profile. Here, we select OCR Color Optimized TIFF, the profile we created. This profile creates multipage TIFF images, runs OCR on the pages, and saves the text as a separate text file alongside the TIFF image.

OCRTIFF-ChooseOCRProfile

 

11. Select the Save icon to save your changes to the printer settings.

OCRTIFF-SaveProfileChange

 

12. Close Printer Management and the Dashboard.

OCRTIFF-ExitPrintManagement

 

13. Open the document you want to convert to TIFF. Printing it will also save the extracted text. Here, we have opened a scanned PDF in Adobe Reader. Each page in our document is an image, and we want to recognize and save the text in this file as we create the TIFF images. You can tell that a page is a scanned image because you cannot select any text on it, only an area of the page, as shown below.

OpenScannedPDFDocument

 

14. Select File - Print from your application, and select TIFF Image Printer 12 from the list of printers.

PrintScannedPDF

 

15. Printing your document will prompt you to choose the name and location of your new TIFF image and OCR text file.

OCRTIFF-SaveAsOCRColorOptimizedTIFF

 

Navigate to the folder where you wish to store your TIFF image and OCR text file. Your Documents folder will be selected for you by default.

In the File name field, enter a name for your TIFF image. A default name for your file has been filled in based on the file name of the document you printed to TIFF Image Printer.

In the Save as type field, the profile OCR Color Optimized TIFF is already selected for you. Changing the profile option selected here will change the type of file created.

Click Save to create your new TIFF image and OCR text file in the chosen folder with the name specified.

 

16. When you open your new TIFF image and the OCR text file, you can see that the text file contains the text from the scanned document.

OCRTIFF_TIFFResults
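If you want to check the result from a script instead of opening the file by hand, a small sketch along these lines reads the OCR text file as UTF-8 and reports a few counts. The file path shown is hypothetical, and the `.txt` extension is an assumption; an empty or nearly empty result can hint that OCR found little text on the pages.

```python
from pathlib import Path


def summarize_text(text):
    """Return simple counts for a block of extracted text."""
    non_empty_lines = [line for line in text.splitlines() if line.strip()]
    return {"non_empty_lines": len(non_empty_lines), "words": len(text.split())}


def load_ocr_text(text_path):
    """Read an OCR text file as UTF-8.

    'utf-8-sig' also accepts files that begin with a byte order mark.
    """
    return Path(text_path).read_text(encoding="utf-8-sig")


if __name__ == "__main__":
    # Hypothetical path; replace it with the file created next to your image.
    path = Path("Documents/scan.txt")
    if path.exists():
        print(summarize_text(load_ocr_text(path)))
    else:
        print(f"{path} not found")
```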