Please enable JavaScript to view this site.

PDF Image Printer

Navigation: Profile Manager

Text Extraction

Scroll Prev Top Next More

The Text Extraction tab will create a separate text file containing all of the textual elements of your printed document. These text files are often paired with the files when stored in archival systems to allow searching and retrieval of the files using textual data.

The text extraction feature was not designed to be Optical Character Recognition (OCR) software and does not extract text from images. It only recognizes and extracts straight text from printed documents and saves it to a text file. Formatting of the text file may not be exact. To OCR scanned pages and images, see the OCR settings tab.

 

ProfileManagerTextExtractionTab-PDF

Enable create a text extraction file with each output file

By default this setting is disabled. Enabling this setting means PDF Image Printer will extract text and save the created text file in the same directory and with the same name as the output file. Note that text extraction only works for printed documents; it cannot extract text from images. To extract text from images use our OCR feature instead of text extraction..

 

If you have text output enabled for OCR, enabling this option will display an error. To perform both text extraction and OCR to Text output, you will need to change the default extension of .txt used in the OCR options to .text or similar.

Text Layout

Layout

Choose the layout for the text in your file.

Physical - Attempts to match the format of the text in the original file.

Raw - Saves the text in the order in which it is was sent to the driver. This may not be the same order as the text in the original file.

None - No formatting is attempted. All text is written to the file in the order in which it is received from the printing application.

Text Encoding

Format

Choose the encoding format for your text file.

UTF-16 - uses 16-bit Unicode encoding

UTF-8 - uses 8-bit Unicode encoding

ANSI - uses the current ANSI code page

End-of-Line

Choose the end-of-line encoding for your text file. Depending on the operating system the text file will be used on, you may need to choose the appropriate line return code.

Windows - lines end with the carriage return line feed (CRLF, \r\n) used by Windows

Mac - lines end with the carriage return (CR, \r) used by Macintosh

Unix - lines end with the line feed (LF, \n) used by UNIX.

Emit page breaks between each page

By default this setting is enabled. Enabling this setting means PDF Image Printer will insert a page break, or form feed (\f) in your text file for every page in your original document.

See Also: Extracting Text From the Created File