When converting insurance forms, medical records, invoices, and other types of documents to image formats for digital archiving or document discovery systems, the images are often paired with a text file containing the textual elements of the printed document. This editable text file is used to search for and retrieve documents from the storage system.
The Text Extraction feature in the TIFF Image Printer, Raster Image Printer and PDF Image Printer creates a separate text file containing all of the text that can be extracted from the printed document. The text files are stored in the same location and with the same base name as the output files you are creating. When creating multi-paged output such as TIFF or PDF, a single text file containing all the text for all the pages in the file is created. With serialized output such as JPG files, a single text file is created for each page.
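Because the text file shares its location and base name with the output file, a script can work out where to look for it. The following is a minimal Python sketch of that pairing, not part of the product. The `.txt` extension and the function name `companion_text_path` are assumptions made for illustration; check the extension your own output actually uses.

```python
from pathlib import Path

# Assumption: the extracted text is saved with a ".txt" extension.
TEXT_SUFFIX = ".txt"


def companion_text_path(output_file: str) -> Path:
    """Return the expected text file path for an image printer output file.

    The text file is stored in the same folder and has the same base name
    as the output file, so only the extension changes.
    """
    return Path(output_file).with_suffix(TEXT_SUFFIX)
```

For multi-paged output such as a TIFF or PDF, this gives the one text file that holds every page. For serialized output, call it once per page image.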
Text extraction works by capturing all the text it can from the information sent to the printer when the original document is printed, and saving it to the text file. The layout of the text in the file is approximate and may not match your original document. This feature is not the same as Optical Character Recognition (OCR): if the document is a scanned image, or contains no textual content, there is no text to extract and save.
Step by Step Instructions
All text extraction settings for text layout, encoding and page breaks are controlled in the profile.
- Launch your Image Printer Dashboard.
- Select “Edit & Create Profiles” to open Profile Manager.
- Select “Add a profile” to create a personal profile, or create a copy of one of our system profiles.
- Name the profile, add a description, and click Save.
- On the Text Extractions tab, enable “Create a text extraction file with each output file”.
- Under Text Layout, select how to lay out the text in the file. The default layout, Physical, tries to match the layout of the text in the original file as closely as possible. Raw saves the text in the order in which it is sent to the printer, which may not match the original file. None saves the text content with no formatting applied.
- Under Text Encoding, select your desired text file encoding format.
- The “Emit page breaks between each page” option, enabled by default, inserts a page break (form feed) in the text file for every page in the original document.
- Click Save-Back, and close Profile Manager.
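When page breaks are enabled, the form feed characters make it possible to split a multi-page text file back into individual pages. The sketch below shows one way to do that in Python. It assumes the file was saved in an encoding Python can decode (UTF-8 is used here only as a placeholder for whatever you selected under Text Encoding), and it does not assume whether a form feed also follows the last page, so a trailing empty page is dropped. The function names are hypothetical.

```python
from pathlib import Path

FORM_FEED = "\f"  # the page break character emitted between pages


def split_pages(text: str) -> list[str]:
    """Split extracted text into pages on form feed characters."""
    pages = text.split(FORM_FEED)
    # If the text ends with a form feed, the last piece is empty: drop it.
    if len(pages) > 1 and pages[-1].strip() == "":
        pages.pop()
    return pages


def read_pages(text_file: str, encoding: str = "utf-8") -> list[str]:
    """Read a text extraction file and return its pages.

    Assumption: `encoding` matches the Text Encoding chosen in the profile.
    """
    return split_pages(Path(text_file).read_text(encoding=encoding))
```

Splitting on the form feed directly, rather than with `str.splitlines()`, keeps the line breaks within each page intact, since `splitlines()` also treats a form feed as a line boundary.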
If you plan to use these settings regularly, you may wish to make this personal profile the default profile used by your image printer.
- Select “Manage Printers” to open Printer Management.
- Select the printer you wish to edit and use the Profile drop-down list to select your desired default profile.
- Select the Save icon to save changes.
- Select the Home icon to return to the Dashboard.
- Close the Dashboard.
Now when you print any document containing text to your image printer, a searchable text file is also created with the textual contents of the original file.
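As an illustration of how the text files support document retrieval, here is a small Python sketch that finds which documents mention a search term. It is a simplified stand-in for a real archiving or discovery system, and it assumes the text files use a `.txt` extension and UTF-8 encoding. Both function names are hypothetical.

```python
from pathlib import Path


def matching_documents(text_by_name: dict[str, str], term: str) -> list[str]:
    """Return the base names whose extracted text contains `term`.

    The comparison is case-insensitive; results are sorted by name.
    """
    needle = term.casefold()
    return sorted(
        name for name, text in text_by_name.items() if needle in text.casefold()
    )


def load_text_files(folder: str, encoding: str = "utf-8") -> dict[str, str]:
    """Read every text file in `folder`, keyed by base name.

    Assumptions: ".txt" extension, and `encoding` matches the profile setting.
    """
    return {
        path.stem: path.read_text(encoding=encoding)
        for path in Path(folder).glob("*.txt")
    }
```

Each base name returned by `matching_documents(load_text_files(folder), "some term")` identifies the output file stored alongside the matching text file.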