Optical Character Recognition (OCR) with PEERNET

New feature: optical character recognition to create searchable PDF files and extract text from images.

We here at PEERNET are thrilled to announce the addition of Optical Character Recognition (OCR) to our family of virtual printer drivers: TIFF Image Printer, Raster Image Printer, and PDF Image Printer.

Say hello to dynamic, searchable PDF files and editable text files from images.

Images traditionally presented a challenge when you needed to extract the text. With optical character recognition, our printers can transform images into searchable documents. Your printed and scanned documents can effortlessly become searchable PDF files or editable hOCR, text, and ALTO files you can search, organize, and integrate into your business workflow.

What is Optical Character Recognition (OCR)?

First, what is Optical Character Recognition, or OCR? Also, what exactly happens when you OCR a PDF or an image?

OCR examines the image using pattern recognition and artificial intelligence. It looks for patterns, shapes, and spacing that resemble text characters, lines, and paragraphs, then matches those shapes against predefined patterns for each language you chose to recognize. Each candidate match is assigned a confidence score, and the highest score determines which character, in which language, is the match.
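To make the scoring step concrete, here is a minimal Python sketch of picking the highest-scoring candidate for a single shape. It is an illustration only, not how our OCR engine is implemented: the `Candidate` type, the `best_match` function, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible reading of a shape (hypothetical structure)."""
    character: str
    language: str
    confidence: float  # 0.0 (no match) to 1.0 (perfect match)


def best_match(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest confidence score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: c.confidence)


# Made-up scores for one shape tested against two language datasets.
shape_candidates = [
    Candidate("e", "English", 0.91),
    Candidate("é", "French", 0.87),
    Candidate("c", "English", 0.42),
]

winner = best_match(shape_candidates)
print(winner.character, winner.language, winner.confidence)
```

Every extra language adds more candidates to score for each shape, which is one way to picture why selecting more languages makes recognition take longer.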

As part of the recognition process, optical character recognition also identifies and stores information about the layout and formatting of the text on the image.

Using this information, we can create searchable PDF files. Or, if we are creating TIFF, JPEG, or other images, we can bundle the text and layout information with the corresponding image for document management systems to make images searchable.

Other uses for this information are in archives to convert printed material into a digital format, by researchers for data mining and text analysis, and by accessibility tools.

Searchable PDF Files Using Optical Character Recognition

A searchable PDF file is one where you can search for and locate text on the page. You can select the text on the page and, if allowed, copy and paste it into other documents.

Scanned PDF documents are often just images of each page wrapped as a PDF file. You cannot search or copy the text in these files. The easiest way to tell if a page is a scanned image is to try to select the text. If it is an image, you cannot select any text on the page, only an area of the page, as shown here.

When you print a scanned PDF file to our printers, OCR will add the text on the pages as an invisible layer you can search and copy.

Optical character recognition is not just for PDF files. Print and append images together to convert images into a searchable PDF. You can make the images in a PDF searchable, just like the text. Printing a PDF containing text and pictures creates a new PDF where the original text and text in pictures are now searchable.

Do you already have a scanned PDF you want to make searchable? Follow our quick steps in Convert PDF to Searchable PDF with OCR to see how easy it is.

Optical Character Recognition Can Extract Text From Images

When creating TIFF, PNG, JPEG, and other images using the TIFF Image Printer and Raster Image Printer, you use the optical character recognition feature to extract the text to digital hOCR, text, and ALTO files. There is no way to embed the text information into an image the way you can with a PDF.

hOCR is a specially formatted XHTML file containing the text extracted from the page. It stores format and layout information and a score for how confident the OCR engine is on its match.
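If you want to read an hOCR file in your own scripts, the sketch below shows one possible approach using only the Python standard library. It assumes the common hOCR convention of `ocrx_word` spans whose `title` attribute carries `bbox` and `x_wconf` properties. The sample markup, the `HocrWordParser` class, and the `parse_title` helper are illustrative, and the exact markup in a given file may differ.

```python
from html.parser import HTMLParser

# Tiny hand-written sample in the usual hOCR style (not real printer output).
SAMPLE_HOCR = """
<html xmlns="http://www.w3.org/1999/xhtml">
 <body>
  <div class="ocr_page" title="bbox 0 0 800 600">
   <span class="ocr_line" title="bbox 36 92 190 116">
    <span class="ocrx_word" title="bbox 36 92 96 116; x_wconf 95">Hello</span>
    <span class="ocrx_word" title="bbox 110 92 190 116; x_wconf 88">world</span>
   </span>
  </div>
 </body>
</html>
"""


def parse_title(title):
    """Split a title such as 'bbox 36 92 96 116; x_wconf 95' into properties."""
    props = {"bbox": None, "confidence": None}
    for part in title.split(";"):
        fields = part.split()
        if not fields:
            continue
        if fields[0] == "bbox" and len(fields) == 5:
            props["bbox"] = tuple(int(value) for value in fields[1:])
        elif fields[0] == "x_wconf" and len(fields) == 2:
            props["confidence"] = float(fields[1])
    return props


class HocrWordParser(HTMLParser):
    """Collect the text, bounding box, and confidence of each word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._current = parse_title(attributes.get("title") or "")
            self._current["text"] = ""

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Assumes word spans do not contain nested spans.
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


parser = HocrWordParser()
parser.feed(SAMPLE_HOCR)
for word in parser.words:
    print(word["text"], word["bbox"], word["confidence"])
```

A filter on the confidence value is a simple way to flag words that may need a manual check.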

An ALTO file is similar to hOCR but stores the information using a different structure and specification.

Both formats contain human-readable text extracted from the image and work with different OCR tools and applications.

A text file, however, contains only the text extracted from the image. It does not attempt to mimic the layout of the text on the image. This file contains only plain, human-readable, and editable text that you can edit in any text editor.

This format is perfect when you need to copy the text from the image into another document. You can also use it to perform text recognition on the image as part of a document archiving step.

Our tutorial, Create TIFF and Extract Text From Images Using OCR, shows how easy it is to extract text from images in a single step.

Recognizing Different Languages

Our optical character recognition uses one or more language datasets to match the shapes and curves on the page to text characters.

We recognize and match shapes against the English language dataset to start. The installation also includes the French, Italian, German, Spanish, Arabic, Hebrew, and Hindi language datasets.

You control which languages to look for when performing optical character recognition. The best practice is to check only the languages you know are in your document. The more languages to test, the longer the process takes.

Need other languages? No problem. There are over 100 language datasets you can download and add. You’re sure to find the language you need.

Ready to Try It?

Adding optical character recognition to our virtual printers marks a significant milestone in our commitment to providing solutions to our users. We’re excited to be able to offer this feature to you. We hope you will join us as we enter into the world of OCR and explore the world of possibilities it unlocks.

Download a trial of PDF Image Printer, Raster Image Printer, or TIFF Image Printer and try it out today! Already own one of our image printers? Log in to your online account to download the latest version and see what OCR can do.

Convert PDF to Searchable PDF with OCR

Did you know you can convert PDF to searchable PDF just by printing? Convert any scanned PDF document or image into a searchable PDF file using OCR and our PDF Image Printer or Raster Image Printer software.

What is OCR?

OCR stands for Optical Character Recognition. This process searches for and recognizes text or characters on scanned pages or images and extracts them as digital text. OCR works by analyzing the patterns, shapes, and curves of the text characters on the page and matching them to predefined information for different characters in each language. 

When you convert PDF to searchable PDF, the digital text found by the OCR engine gets added as an invisible text layer to each page in the new PDF file, making the file’s content searchable. 

The same thing happens when you convert an image to a searchable PDF. The new PDF contains the original image and an invisible layer of text. It is this layer of text that makes the PDF searchable.

Set up the Printer to Create Searchable PDFs using OCR

We’re using the PDF Image Printer below, but the steps are the same for Raster Image Printer. To start, open the Dashboard by double-clicking on the desktop shortcut for your printer.

Launch PDF Image Printer Dashboard

The Dashboard gives you access to license information, profiles, printers, tools, and resources. Select the Manage Printers tile to open Printer Management.

Open the Printer Management screen.

The Printer Management screen lists all copies of your PDF Image Printer and which profile to use when creating files using that printer.

Next to the printer name, use the printer profile drop box to set the profile to Adobe PDF with OCR. This profile recognizes English text and creates searchable PDF files. If any incoming pages are images, we will use OCR to scan the page and add the invisible text layer to make the page searchable.

Set the printer profile to convert to searchable PDF.

Select the Save icon to save changes.

Save the changes to the printer settings

Close Printer Management and then close the Dashboard.

Close the Dashboard.

Convert a Scanned PDF to a Searchable PDF

Next, open the document you want to convert to a searchable PDF. You need to be able to open and print your original file to create a searchable PDF from it. For example, if you have a scanned PDF file, you need Adobe Reader or another PDF viewer to print the file. For images, any photo application that can print will work.

Here, we have opened a scanned PDF in Adobe Reader. Each page in our document is an image, and we want to be able to search the text in this file. You can tell if a page is a scanned image as you cannot select any text on the page, only an area of the page, as shown below.

Open a scanned PDF or a PDF with images.

Select File – Print from your application, and select PDF Image Printer 12 from the list of printers. Then click Print to send the document to the printer.

Print the PDF to PDF Image Printer

Printing your document will prompt you to choose the name and location of your new searchable PDF file. Leave the profile Adobe PDF with OCR selected in the Save as type field, then click Save to create your new searchable PDF file in the chosen folder with the name specified.

Choose the name and location of your new searchable PDF file.

When you open your new PDF file, you can now select the text on the page. That’s it! That’s how simple it is to create a searchable PDF file from images or scanned documents.

Text can be searched and selected in the new PDF File.

Create TIFF and Extract Text From Images Using OCR

You can use TIFF Image Printer and Raster Image Printer to effortlessly extract text from images by printing your images or scanned PDF documents. With just one step, you can create TIFF images and extract the text from pages into an editable text file, making it easy to modify the content as needed.

If you have a scanned PDF document and need to create searchable PDF files, see Convert PDF to Searchable PDF with OCR instead.

What is OCR?

OCR (Optical Character Recognition) searches for and recognizes text (characters) on scanned pages or images and extracts it as digital text. Outside factors such as image quality, the font used, and any image background on the pages will all affect the quality of the OCR results.

You can save the text output from the OCR process as hOCR, Text, or ALTO files. From the OCR tab, you can choose which type of extracted text file to create, and even generate all of them at once if you want.

We’re using the TIFF Image Printer below to extract text from images, but the steps are the same for the Raster Image Printer. For Raster Image Printer, this works for all output image formats, including TIFF, PNG, and JPEG.

Create the Extract Text From Images Profile

To start, open the Dashboard by double-clicking on the desktop shortcut for your printer.

Launch TIFF Image Printer Dashboard

The Dashboard gives you access to license information, printers, and resources, but most importantly, creating, copying, and editing profiles.

Select Edit & Create Profiles to open the Profile Manager to create a new profile.

Open the Profile Manager to create a new profile.

Find the system profile named Color Optimized TIFF. Create a copy of it using the copy icon in the lower left. The same steps we are doing here apply to any system profile. You can also create custom profiles through the Add a profile button.

Copy an existing profile to create our text extraction profile.

Configure OCR For The Extract Text From Images Profile

Give your new profile a name and a description. Next, go to the OCR tab and turn on OCR (Optical Character Recognition). Running OCR on each page can be a time-consuming step. For this reason, it is disabled to start.

Enable OCR for the new profile so we can extract text from images.

Next, choose which OCR text files to create. There are three to choose from, and you can select to create more than one type.

  • hOCR is an XHTML file containing the text extracted from the page. It also stores format and layout information and a score for how confident the OCR engine is on its match.
  • Text creates a UTF-8 text file containing only the extracted text.
  • ALTO is similar to hOCR but stores the information as XML following the Analyzed Layout and Text Object specification.
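For readers who plan to process the ALTO option from the list above in a script, here is a rough Python sketch that pulls the words out of each text line. It relies on the `TextLine` and `String` elements and the `CONTENT` and `WC` (word confidence) attributes defined by the ALTO schema. The sample XML is hand-written, and the code ignores the XML namespace on purpose because the schema version in a given file is an assumption we do not want to hard-code.

```python
import xml.etree.ElementTree as ET

# Small hand-written ALTO sample (not real printer output).
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page WIDTH="800" HEIGHT="600">
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello" HPOS="36" VPOS="92" WIDTH="60" HEIGHT="24" WC="0.95"/>
            <SP/>
            <String CONTENT="world" HPOS="110" VPOS="92" WIDTH="80" HEIGHT="24" WC="0.88"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Goodbye" HPOS="36" VPOS="130" WIDTH="90" HEIGHT="24" WC="0.90"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one string per TextLine, with its words joined by spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


def alto_word_confidences(xml_text):
    """Return (word, confidence) pairs; confidence is None when WC is absent."""
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.iter():
        if local_name(element.tag) == "String":
            wc = element.get("WC")
            pairs.append((element.get("CONTENT", ""), float(wc) if wc else None))
    return pairs


print(alto_lines(SAMPLE_ALTO))
print(alto_word_confidences(SAMPLE_ALTO))
```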

For our example, we chose Text OCR. We only want to extract the text from the page and don’t care about the layout or positioning on the page.

Select what type of OCR text file to create.

Lastly, choose which languages to look for on the page. You must select at least one language. The more languages to match against, the longer the OCR process will take. If you have documents with mixed languages, select all languages used.

PEERNET Image Printers can recognize Arabic, English, French, German, Hebrew, Hindi, Italian, and Spanish, with additional languages available to download.

Choose which languages the OCR process will recognize.

Saving and Using the New Printer Profile

With your OCR settings configured, you can now Save the changes to your new profile. Click the Back arrow to return to the main screen of the Profile Manager and then close it.

We have our new profile. Let’s set it as the default profile TIFF Image Printer uses when printing. To do this, return to the TIFF Image Printer Dashboard and select Manage Printers to open Printer Management.

Open the Printer Management screen.

The Printer Management screen lists all copies of your TIFF Image Printer and which profile to use when creating files using that printer.

Next to the printer name, use the drop box to set your new OCR profile as the default profile. Here, we created the new profile OCR Color Optimized TIFF and we will select that profile. This profile creates multipaged TIFF images. We will use OCR to scan the pages and save the text as a separate text file along with the TIFF image.

Set the printer to use our new extract text from images profile.

Select the Save icon to save your changes to the printer settings.

Save the changes to the printer settings.

Close Printer Management and the Dashboard.

Close the Dashboard.

Convert Scanned PDF to TIFF and Extract Text From Images

Open the document you want to convert to TIFF and extract text from images into an editable file. Here, we opened a scanned PDF in Adobe Reader. You can do the same with TIFF, PNG, and other image files.

Each page in our document is an image. We want to recognize and save the text in this file as we create the TIFF images. You can tell if a page is a scanned image as you cannot select any text on the page, only an area of the page, as shown below.

Open a scanned PDF or an image you want to extract the text from.

Select File – Print from your application, and select TIFF Image Printer 12 from the list of printers. Then click Print to send the document to the printer.

Print the PDF or image to TIFF Image Printer.

Printing your document will prompt you to choose the name and location of your new TIFF image and OCR text file. The OCR process saves the extracted text files with the same base name and location as the new image.

Leave the profile OCR Color Optimized TIFF selected in the Save as type field.

Click Save to create your TIFF image and OCR text file.

Choose the name and location of your TIFF and OCR extracted text file.

And we are done. That is all there is to extract text from images using the PEERNET Image Printers. Looking at our new TIFF image and OCR text file, we can see that the text file contains the extracted text from the image.

TIFF Image is saved with a matching text file with the extracted text.
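Because the extracted text is saved with the same base name and location as the image, a script can locate it from the image path alone. The Python sketch below shows one way to do that. The `.txt` extension and the helper names are assumptions made for illustration, so check the file your profile actually produces before relying on them.

```python
import tempfile
from pathlib import Path


def companion_text_path(image_path, extension=".txt"):
    """Return the path sharing the image's folder and base name.

    The '.txt' default is an assumption, not a documented file extension.
    """
    return Path(image_path).with_suffix(extension)


def read_extracted_text(image_path, extension=".txt"):
    """Read the text saved beside an image, or return None if it is missing."""
    text_path = companion_text_path(image_path, extension)
    if not text_path.is_file():
        return None
    # 'utf-8-sig' reads UTF-8 with or without a byte order mark.
    return text_path.read_text(encoding="utf-8-sig")


# Demonstration with stand-in files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    image = Path(folder) / "invoice.tif"
    image.write_bytes(b"")  # empty placeholder standing in for a real TIFF
    image.with_suffix(".txt").write_text(
        "Invoice 1001\nTotal due: $250.00\n", encoding="utf-8"
    )
    extracted = read_extracted_text(image)
    missing = read_extracted_text(Path(folder) / "other.tif")

print(extracted)
```

This kind of lookup is handy in an archiving step where each image and its text are indexed together.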

Convert Word to JPEG with Raster Image Printer

The need to convert Microsoft Word to JPEG is not something you would see every day. However, there are occasions when you may find you need to do so. You may need to share your Microsoft Word document with others but are concerned about security and protecting the content. What do you do?

You use a word converter to create JPEG images of your document pages. As an image, the reader cannot edit or modify the contents, and it is challenging, to say the least, to extract the text from the page.

Advantages of JPEG Images

The primary advantage of a JPEG image is compatibility. JPEG images are viewable almost everywhere without downloading special software, including Windows PCs, Macs, UNIX computers, and mobile devices like tablets and phones.

Additionally, when you convert your Microsoft Word document to JPEG, an added benefit can be smaller file sizes. Smaller files are easier and faster to store, upload, and email.

How Can I Convert Word to JPEG

Search Google for ways to convert Word to JPEG and you’ll see screen after screen of online converters. They have their uses but not when you have secure content that you can’t upload all willy-nilly.

Other articles involve copying and pasting text in and out of different applications or converting to an intermediate format first. This approach is time-consuming. What if I told you you can do this all in one step?

This is where our Word document to JPEG printer, Raster Image Printer, comes into play. As easy as printing, you can convert your Word files to JPEG images in a single step. And it’s secure too! You’re printing your document on your computer, keeping everything local and offline.

Watch our handy video tutorial below to see how easy it is to convert Word to JPEG, or follow along with our quick step-by-step instructions below.

Show Me How to Convert Word to JPEG

After installing, you will see the Raster Image Printer Dashboard icon on your desktop. The dashboard is your one-stop shop for help resources and video tutorials, creating profiles for the printer, and licensing information.

The desktop shortcut to the Raster Image Printer Dashboard links you to help and resources

Open the Microsoft Word document you want to convert to JPEG. To create JPEG images we first need to print the Word document to the Raster Image Printer. To do so, select File, then Print from the application menu.

To convert your Word document to JPEG, go to File, then choose Print

Choose Raster Image Printer 12 from the list of printers and then click the Print button.

On the print screen, select Raster Image Printer, then click Print to convert Word to JPG

The Save File dialog pops up so you can choose where to save your JPEG images. You can also change the file name for the JPEG images if you want.

Choose where to save your new JPEG images

The next step is to choose what type of file you want to create. Here, we want to convert our Word document to JPEG so we will select Color Optimized JPEG from the Save as type drop-down list.

Select Color Optimized JPEG to create a jpeg from your Word document

Lastly, click Save to create your JPEG images.

Click save and the Raster Image Printer word converter will create your JPEG file.

Convert Word to PDF, TIFF, and Other Images

You can do more with Raster Image Printer than just creating JPEG images. You can also create PDF files and other image formats like TIFF, PNG, BMP, and GIF.

As an added bonus, it is a printer, and as a printer, it works from any application that can print. That means not only can you create PDFs and images from Word, but also Excel, PowerPoint, Adobe Reader, AutoCAD, the list goes on. That’s a lot of bang for your buck.


Password Protect PDF With PDF Creator Plus

Protecting your PDF file with a password lets you control who can read, print, edit, or copy the contents of your file.

Password protection is mandatory in some industries, and secure PDF files are commonplace in fields such as healthcare, finance, and law.

When sharing sensitive information with a select group, adding a password ensures only those with access can view the file.

PDF Password Protection and Permissions

There are two levels of password protection in a PDF file – Open password and Permissions password.

The Open password is the first gatekeeper. You need to enter this password to open and view the PDF file. Permission options set in the PDF are enforced when opened using this password. Permission settings for PDF documents include options for printing, copying, commenting, filling in forms, and adding or removing pages.

The Permissions password, or change password, controls access to the permission settings within the PDF. Opening the PDF file with this password allows you to modify and add new permissions.

It is good practice to use strong and unique passwords and to keep a secure record of them. Losing the password can result in loss of access to your PDF.
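If you would rather not invent passwords by hand, a short script can generate them. The sketch below uses Python's standard `secrets` module, which is designed for security-sensitive random values. The `generate_password` name, the 16-character default, and the 12-character minimum are our own illustrative choices, not requirements of PDF Creator Plus or the PDF format.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=16):
    """Return a random password containing all four character classes."""
    if length < 12:
        # Illustrative minimum only; pick a policy that suits your organization.
        raise ValueError("use at least 12 characters")
    while True:
        password = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (
            any(c.islower() for c in password)
            and any(c.isupper() for c in password)
            and any(c.isdigit() for c in password)
            and any(c in string.punctuation for c in password)
        ):
            return password


print(generate_password())
```

Store the generated password in a password manager or another secure record, since it cannot be recovered from the PDF itself.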

PDF Security and Encryption

Adding password protection to your PDF with PDF Creator Plus also encrypts your file contents. Encryption uses your password and a complex algorithm to transform the contents into an unreadable format. Only users with access to the password can decrypt the contents back to a readable form.

Building PDF Files in PDF Creator Plus

Creating a new password-protected PDF or adding a password to an existing one is simple with PDF Creator Plus.

If you have an existing PDF, create a new PDF Creator Plus project by right-clicking the PDF file and selecting Open With, then choosing PDF Creator Plus. This action adds the pages in the PDF file to the project.

If this PDF file is password-protected, you need to know the password to open the document and add the pages to the project.

Quickly open PDF files from the File Explorer menu.

What if I need a PDF from a Word file, Presentation, or other document, you ask? That’s easy – just print your document to the PDF Creator Plus 8 printer. This action also creates a PDF Creator Plus project populated with the pages you printed.

Print any document to create a PDF.

It is worth noting here that you can also build PDF documents from multiple files to create your final PDF file. Print your first document or open a PDF to start your page collection. Then, with the PDF Creator Plus project open, continue printing documents, or adding PDF files, to add new pages to the project.

Once the pages are collected, the next step is to create your new PDF and to add, remove, or change the password and permissions as needed.

Password Protection Features in PDF Creator Plus

When you have collected the pages you want, it is time to create your secure PDF. Remember, your pages can be from a single PDF, a document you printed, or a collection of pages gathered from printing and adding PDF files directly to the project.

Select the Create PDF button at the top of the page preview window to open the Create File dialog. From here, multiple options for creating PDF files are available to you. First, select the Use Security checkbox to turn on the security features, then click Save to display the Security Options window.

Enable PDF security to choose permissions and set password protection.

The Security Options dialog allows you to password protect your PDF and manage PDF permissions. Here, we restricted the PDF to viewing and printing and required a password to open it. Viewers cannot copy text or images, or change the content in the PDF.

We also set the password needed to change the PDF permissions. Readers who know and use this password when opening the PDF can modify the permission settings and remove any security in the PDF.

Password protect your PDF with encryption, permissions and strong passwords.

After selecting your security options, click the OK button to create your new, secure PDF file.

Changing or Removing an Existing PDF Password

Changing or removing a password from an existing PDF file follows the same steps as adding a password. The only caveat is that you need to know the password to open the PDF.

As above, right-click and open your password-protected PDF in PDF Creator Plus using Open With – PDF Creator Plus. PDF Creator Plus will prompt you to enter the password before opening the PDF file and adding the pages.

To open a password protected PDF you need to know the open or change password.

From here, the steps are the same again. Select the Create PDF button at the top to open the Create File dialog. Check the Use Security checkbox, then click Save to bring up the Security Options window.

Here is where you change the security options for the new PDF file. To change the open or permissions password, select the matching password option, such as “Password required to open document”, and enter the new password. If you want to remove all password protection, uncheck both password options at the bottom.

You can also modify the permissions for the PDF here. For instance, below we added permission to copy text and images and to change the document.

If you know the password to open the PDF, collect the pages and change or remove the password and permissions

Lastly, click OK to create your new PDF file with the updated or removed passwords.

Creating Unsecure PDF Files

To completely remove any passwords, permissions, and PDF encryption in your PDF file, uncheck Use Security when creating your PDF file.

Create PDF files that are not password protected by turning off security.

Give it a try!

With PDF Creator Plus 8, it is easy to create password-protected PDF files and to remove passwords from PDF files. You also can choose what your readers can do with your PDF file and its content.

While password-protecting a PDF file offers added security, it is not foolproof. Choose your passwords carefully and keep them secure. Use strong and unique passwords, as weak ones are easy to crack.

See how easy it is for yourself – download the trial and start creating password-protected PDF files.

Reducing PDF File Size in PDF Creator Plus

If you’re researching how to reduce PDF file size, you’ve probably already encountered the dreaded “file size exceeded” message. Or perhaps your company has PDF size guidelines that you need to follow. PDF files can get surprisingly large for various reasons. You can quickly find yourself with a file too large to email, upload, or send through messaging apps. Let’s add a few tips and tricks to our arsenal to prevent this from happening.

How Do I Find My PDF File Size?

First, how do we find out how big our PDF file is? To do so, find and select the file in Windows Explorer. This makes the Properties button appear on the Home tab. Click this button to bring up information about the PDF file, such as file size and other properties.

Find the PDF file size in Windows.
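If you need to check many files, or want to enforce a size limit automatically, the same information is available from a script. This Python sketch reads a file's size and formats it with 1024-based units, similar to what Windows shows. The function names are our own, and `report.pdf` in the usage comment is a placeholder path.

```python
from pathlib import Path


def pdf_file_size(path):
    """Return the size of the file at 'path' in bytes."""
    return Path(path).stat().st_size


def format_size(size_bytes):
    """Format a byte count using 1024-based units."""
    units = ["bytes", "KB", "MB", "GB"]
    size = float(size_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{size:.0f} {unit}"
            return f"{size:.1f} {unit}"
        size /= 1024


# Usage with a placeholder path:
#   print(format_size(pdf_file_size("report.pdf")))
print(format_size(5 * 1024 * 1024))
```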

What Affects PDF File Size

The next step is determining what is causing your large PDF file size. Once you know this, you can use the tips below to help reduce the PDF file size. Some of the common reasons are the following:

  • Images in the PDF file
  • PDF is a scanned document
  • Image Compression
  • Downsampling
  • Embedded Fonts

Images

When looking to reduce PDF file size, first see if there are images in your file. Photos, graphics such as charts and graphs, and logos are a few examples of what is inserted as images.

Are these images color? Black and white? Greyscale? Both the type of images stored in the PDF file and the resolution, or quality, of the images impact the file size. PDF Creator Plus has tools and options that adjust the image content in your file to help reduce the PDF file size.

Scanned Documents

When a PDF file is created by scanning a document, each page becomes an image. Again, how this image is stored plays a part in how large or small the PDF file is. The same tools and options for image content apply when reducing PDF size for scanned documents.

Image Compression

Images in PDF files are compressed using different compression methods to save space. Each compression method uses its own rules to reduce the size of the image while trying to maintain as much visual quality as possible. Different types of images will compress better when using a particular compression method. When your document contains photos, then JPEG compression is your go-to. With diagrams or charts with solid colors, try LZW (Lempel-Ziv-Welch). Documents with gradient color backgrounds, such as brochures, can work best using ZIP (Deflate). Lastly, black and white images give the best results with CCITT Group 4.

Image Downsampling

An image is created from a grid of many tiny squares called pixels. A larger grid allows for more pixels and creates a more detailed image. It also takes more space to store. Downsampling an image refers to reducing the number of pixels by replacing every few squares in the grid with one larger square by combining the colors. This decreases the size of the image but can also make it appear blocky when reduced too far.
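The idea is easy to see in code. The Python sketch below downsamples a small greyscale grid by averaging each block of pixels into a single value. It is a simplified illustration of the technique, not the resampling method PDF Creator Plus uses, and the `downsample` function is our own.

```python
def downsample(pixels, factor):
    """Average each factor-by-factor block of a greyscale grid into one pixel."""
    if factor < 1:
        raise ValueError("factor must be 1 or greater")
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    result = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Gather the block, clipping at the right and bottom edges.
            block = [
                pixels[y][x]
                for y in range(top, min(top + factor, height))
                for x in range(left, min(left + factor, width))
            ]
            row.append(round(sum(block) / len(block)))
        result.append(row)
    return result


# A 4 x 4 greyscale image (0 is black, 255 is white).
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [100, 200, 50, 50],
    [100, 200, 50, 50],
]

print(downsample(image, 2))  # [[0, 255], [150, 50]]
```

The 16 original pixels become 4, so the image needs a quarter of the storage. The bottom-left block also shows the cost: two distinct shades (100 and 200) are merged into one (150), which is the loss of detail that shows up as blockiness when you downsample too far.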

Embedded Fonts

What is a font? It is a collection of information that describes the style and size of your letters, numbers, and symbols. For example, Arial is a font, as is Calibri, Verdana, Helvetica, and many more.

With PDF files, it is common to embed a copy of the font information for all fonts used in the document. This ensures that the PDF file will look and print the same everywhere. The downside is that the size of the collected font information can significantly add to your PDF file size. Keep this in mind when creating documents destined to be PDF files. The fonts and how many different ones you use all affect the size of the PDF.

PDF File Size vs. Quality

This is the ultimate balance point. We all want a file that is small and looks good. Downsampling and compressing the images in your PDF file helps create smaller PDF files. There can be too much of a good thing, however. Downsampling too low can cause blocky images. Choosing a compression method that creates the smallest file doesn’t guarantee that the images in your PDF file will still look acceptable.

When creating your PDF Files, experiment with both compression methods and downsampling options to find a balance that works for you.

What if the File Size Gets Larger?

This can happen! Remember, previously, we explained that different compression methods work best when compressing certain types of images. The opposite is also true. Using a compression method unsuited to the images in your PDF files can blow up the file size. An example would be using LZW in a PDF file with colorful photos; the resulting PDF would be much bigger than when using JPEG on the same file.
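You can see the effect of a mismatched method with a quick experiment. The Python sketch below uses the standard `zlib` module, which implements Deflate (the ZIP method), as a stand-in for lossless compression in general. It does not use LZW or JPEG, and the "images" are just byte strings: one is a single solid value, the other is random noise standing in for the fine detail of a photo.

```python
import random
import zlib


def compressed_ratio(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


SIZE = 64 * 1024

# A flat area of one colour, like the background of a chart.
flat = bytes([200]) * SIZE

# Random values as a rough stand-in for noisy, photo-like detail.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(SIZE))

print(f"flat:  {compressed_ratio(flat):.3f}")
print(f"noisy: {compressed_ratio(noisy):.3f}")
```

The flat data shrinks to a tiny fraction of its size, while the noisy data barely shrinks at all and can even come out slightly larger than it started. Lossy methods like JPEG are built for exactly that kind of content, which is why matching the method to the image matters.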

How to Reduce PDF File Size in PDF Creator Plus

Initially, PDF Creator Plus will use image compression and downsampling options tailored to create a PDF file optimized for the best quality. This means the PDF file will appear top-notch but might not be the most compact in terms of size.

This is a good option when you need PDF files suitable for printing or use in a presentation. This is also a good option when you need to maintain high-resolution images from the source document.

The best quality option creates a PDF file that favors appearance over size.

Use Custom Settings for Reducing PDF File Size

However, when you have limitations on the size of the PDF file, you may find you can customize the image compression and downsampling options to create a smaller PDF file that is still acceptable in terms of quality and readability.

To use custom settings when creating your PDF file, you first need to change either compression, downsampling, or both. You can find these settings under the Edit – Applications Preferences menu item.

Check your source document’s images, then adjust the compression methods to match their style and color. Here, we changed the compression for Indexed and Greyscale images from LZW to JPEG. JPEG compression offers excellent size reduction and maintains the image quality when compressing photo-like images. The downsampling options have not been changed.

Reducing PDF file size can be done using compression and downsampling options

Next, when creating your PDF file, choose User Settings instead of Best Quality to use your new compression and downsampling changes when creating the PDF.

Turn off Font Embedding

One of the last places to reduce your PDF file size is with font embedding. We don’t often recommend this. When the fonts are not embedded, PDF viewers like Adobe® Reader will attempt to match the fonts as closely as possible. This can affect document layout and appearance. If your document uses readily accessible fonts, you can save space by not embedding font information.

Not embedding fonts can reduce PDF file size but is not recommended

What If the File Size Doesn’t Change?

Not getting the size you want? Tried everything you can think of? No problem. Send a quick message to our friendly PEERNET Support Team – they’re always happy to help. They can assist you in reducing PDF file sizes and clarify why some files might already be at their minimum size.

Convert EML with Attachments to PDF or TIFF

New features for DCS - convert EML and extract and convert attachments from EML

We have exciting news to share with you! Our latest update for Document Conversion Service adds the ability to convert EML files with attachments. What does this mean for you? Well, it means you can now convert Windows Live Mail, Windows Mail, Outlook Express, and Gmail messages and attachments to different formats effortlessly. Want to convert an EML to PDF? No problem! Want to convert EML to TIFF instead? We’ve got you covered for that too.

What is an EML Message?

Emails are a crucial part of our daily lives. We use them for work, personal communication, staying connected with loved ones, and everything in between. Sometimes we need to go beyond the limitations of our email clients and use that valuable content in other ways. That’s where our software’s EML conversion feature comes in handy.

We use EML files, or messages, for email archiving, storage, and transfer. They contain a wealth of valuable information, including text, attachments, and metadata. However, their native format does not always make it easy to add email content to other applications. Additionally, extracting any attachments for further processing can be a painful manual process. With our software’s EML conversion feature, you can now overcome these limitations.

Convert EML to PDF and TIFF

In addition to new image formats for this release, we have expanded our software’s capabilities to include the conversion of EML files. EML (Email Message) files are commonly used for email archiving and storage purposes. With our software, you can now effortlessly convert EML files into various formats, unlocking new possibilities for organizing, sharing, and repurposing your email content.

Convert EML With Attachments

Are you a legal professional or forensic expert looking to transform EML to review and manage electronic evidence? EML files are like containers that hold all the important stuff in our emails—the message text, attachments such as pictures and documents, and other vital details.

With our software’s EML conversion feature, you can now convert these files into various formats that suit your needs. Whether it’s turning an email into a PDF file for easy sharing, or converting it and its attachments into PDF files for further use, these tasks become a breeze, even if you’re not a tech wizard.

Using Watch Folder Service to Convert EML with Attachments

The Watch Folder Service is a convenient feature of the Document Conversion Service, which allows you to use a drop or hot folder to easily convert your files.

Our Watch Folder service is capable of monitoring multiple folders and converting any files dropped into them into PDF files or various image formats, including TIFF and JPEG. You have the freedom to monitor as many folders as you wish. Each folder can have its own unique settings tailored to your needs. We provide several sample watch folder configurations to start you off.

When converting and extracting EML files, the service will process the original email and all attached files. This includes any embedded images in the email and signature. If an attachment is an EML or Outlook Message (MSG) file, we process and extract it as well.

After conversion, the message contents are saved in a folder with the original EML file name. Attachments with identical names are automatically numbered to avoid file name collisions.
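To picture how numbering avoids collisions, here is a minimal Python sketch. The `_1`, `_2` suffix scheme and the `unique_names` helper are assumptions made for illustration; the actual naming used by the Watch Folder Service may differ.

```python
from pathlib import PurePath


def unique_names(names):
    """Return the names in order, numbering any repeats (hypothetical scheme)."""
    used = set()
    result = []
    for name in names:
        path = PurePath(name)
        candidate = name
        counter = 1
        # Compare case-insensitively, as Windows file systems usually do.
        while candidate.lower() in used:
            candidate = f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        used.add(candidate.lower())
        result.append(candidate)
    return result


print(unique_names(["image.png", "image.png", "lorem.pdf", "image.png"]))
# ['image.png', 'image_1.png', 'lorem.pdf', 'image_2.png']
```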

The sample message below, Test Email With Attachments.eml, contains a single PDF file, lorem.pdf, as well as an image in the signature. Here, the EML and the two attachments, a PDF file and a PNG image, are converted to TIFF images and placed in a folder named Test Email With Attachments.eml.

Image showing converting an eml mail message with attachments to tiff images

Editing a Watch Folder

To add a watch folder or edit its settings, use the Configure Watch Folder Settings app.

Type “Configure Watch Folder” into the taskbar search field to quickly find it.

use the search bar to quickly find and edit the Watch folder settings

Open the Configure Watch Folder Settings app that shows up in the search results.

Enabling Message Extraction in a Watch Folder

Each watch folder definition has a separate section in the configuration file. A watch folder section starts with the text <WatchFolder>, with individual settings and options for that folder following, and ends with </WatchFolder> on a single line.

There are multiple watch folder sections, one for each folder you want to watch.

Each watch folder section contains settings for converting EML with attachments. Both EML and MSG use the same settings collection.

Look for the setting named PreprocessArchiveFormatsFilter. Commenting out this setting, or setting it to an empty string, turns off EML and MSG extraction. Remove the comment markers on the line to enable extraction. The default setting extracts both EML and MSG files. To extract only one or the other, change the setting value to “*.msg” or “*.eml”.

Each folder has its own individual settings; converting EML with attachments is one of them.

Filtering What Files to Extract

You can choose which attachments to process and skip based on file extension. Settings are available for file inclusion and exclusion. To extract and convert specific file types, use the file extensions for each type separated by the pipe (|) character. The include filter is applied first, followed by the exclude filter.

To include all attachments and exclude none, set both settings to an empty string.

Usually, only one of the filters needs to be set at a time: PreprocessArchive.MSG.AttachmentsIncludeFilter or PreprocessArchive.MSG.AttachmentsExcludeFilter. This will depend on how you need to filter. It is easier to edit the filter to exclude only “.jpg” attachments or include only “.pdf” attachments than to write long, specific lists of all of the file types.

When converting EML with attachments, you can include or exclude file types by extension

As an example of filtering, if you only want the original EML message and any attached Word and PDF documents, set the include filter to “*.doc|*.docx|*.pdf” and leave the exclude filter as an empty string.
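If it helps to see the filter logic spelled out, the short Python sketch below mimics the behavior described above: pipe-separated patterns, the include filter applied first, then the exclude filter, and empty strings meaning "no restriction". The function names and the case-insensitive matching are assumptions for illustration only; this is not the service's own code.

```python
from fnmatch import fnmatch


def matches_any(filename, pattern_list):
    """Return True if filename matches any pattern in a pipe-separated list."""
    patterns = [p.strip() for p in pattern_list.split("|") if p.strip()]
    # Assumption: extensions are compared without regard to case.
    return any(fnmatch(filename.lower(), p.lower()) for p in patterns)


def should_extract(filename, include="", exclude=""):
    """Apply the include filter first, then the exclude filter."""
    if include and not matches_any(filename, include):
        return False
    if exclude and matches_any(filename, exclude):
        return False
    return True


attachments = ["report.docx", "lorem.pdf", "logo.png", "notes.DOC"]
kept = [name for name in attachments
        if should_extract(name, include="*.doc|*.docx|*.pdf")]
print(kept)  # ['report.docx', 'lorem.pdf', 'notes.DOC']
```

Note how the include-only filter drops the logo image without having to list every image type in an exclude filter.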

Image Attachment Options

Email messages often have signatures that include images such as company logos or social icons. These images are often relatively small in size when embedded in the email but can become very large when saved as an image using the output type and resolution set for the watched folder.

When including images in your message processing, the following settings help keep the file size of the processed images to a minimum.

Keep the Original Image Resolution

The first way is to keep the converted images at the same resolution as the original. This keeps lower-resolution images from exploding in size when they are upscaled to the output resolution set in the watch folder’s image settings.

To do this, the EML and MSG setting PreprocessArchive.MSG.ImageAttachmentsKeepSourceResolution defaults to true. This keeps the original image resolution when converting to the new format.

This means that if the watch folder creates 300 dpi TIFF images and the email contains a 96dpi JPEG image, the TIFF image we create from the JPEG image will also be 96dpi, and not 300dpi. This results in a much smaller image.
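To put rough numbers on this, the Python sketch below works out how many pixels a small signature image would have after upscaling from 96 dpi to 300 dpi. The 200 x 80 pixel logo is a made-up example, and the helper name is ours.

```python
def upscaled_pixels(width_px, height_px, source_dpi, target_dpi):
    """Pixel dimensions needed to keep the same physical size at a new DPI."""
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


width, height = 200, 80                        # hypothetical 96 dpi logo
new_width, new_height = upscaled_pixels(width, height, 96, 300)

print(new_width, new_height)                   # 625 250
print(width * height)                          # 16000 pixels at 96 dpi
print(new_width * new_height)                  # 156250 pixels at 300 dpi
```

The upscaled image holds nearly ten times as many pixels without gaining any real detail, which is why keeping the source resolution gives a much smaller file.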

If you want to upscale all images from your email messages to the output image resolution set in the watch folder, change this setting to false.

Keeping the original image resolution can help keep your file size smaller.

Image Compression Settings

The compression settings of the output format also affect the final output file size. Below, we are using “High quality JPEG” compression when creating color, indexed, and greyscale TIFF to get a lower file size.

Remember, this setting applies to all documents and images from EML files converted through this watch folder. Applying JPEG compression to text-based documents like email messages can cause slight blurring of text. Always test your settings for file size and document clarity and readability. Depending on the style and type of documents, you might need to adjust the compression settings.

Compression options for the output file type also help reduce file size when converting EML with attachments.

Saving and Using Watch Folder Profile

Once you have made any necessary adjustments, remember to save the configuration file for the Watch Folder service.

To pick up your new settings, you need to Stop and Start the Watch Folder Service first. After the restart, when you drop EML files with attachments into the Input folder for the watch folder you edited, both the EML message and attachments will be converted.

HEIC, WebP, and AVIF Conversion – New Features for DCS

New features for DCS include HEIC, WebP, and AVIF conversion

We are happy to announce our latest release of Document Conversion Service. With this update, our software now supports HEIC, WebP, and AVIF conversion, as well as many other new image formats. Our dedicated team of developers has been hard at work, listening to your feedback and implementing the most requested enhancements. These new features are part of the Document Conversion Service and do not need any extra third-party applications installed.

HEIC, WebP, and AVIF Conversion

HEIC, WebP, and AVIF images have become popular in recent years. Their small file size and impressive visual quality have driven their popularity. However, their compatibility with various devices and software applications may sometimes pose challenges. Our new update removes these barriers by empowering you to convert these popular file formats into more widely supported alternatives.

Converting to PDF

The conversion of HEIC, WebP, and AVIF images to PDF files opens up a world of possibilities. PDF files are widely used for document management, archiving, and file sharing. PDF (Portable Document Format) is a versatile file format. It retains the integrity of your images while offering many additional benefits. Compatibility with multiple devices, secure data encryption, and a standardized layout that preserves your visual content exactly as intended are just a few of them.

Converting to TIFF

The ability to convert HEIC, WebP, and AVIF images to TIFF brings added flexibility and compatibility to your workflow. TIFF (Tagged Image File Format) is known for its lossless compression and high-quality image representation. It is the preferred choice in industries like printing, publishing, and graphic design. With TIFF images, you can seamlessly integrate your images into professional projects without loss of detail or color accuracy.

Updated Built-In Image Conversion

In today’s fast-paced digital world, the way we capture, edit, and share images has evolved significantly. As technology advances, so do our expectations for higher-quality visuals and more efficient file formats.

Our developers have enhanced the image conversion capabilities from the ground up for this release. The focus was on new image formats, conversion speed, and image fidelity. In addition to HEIC, WebP, and AVIF, support for many raw camera formats and image formats popular on Linux, Mac, and other platforms has been added.

We understand the importance of efficiency in your workflow where every second counts. That’s why we’ve made significant improvements to the speed of image conversion when rotating and manipulating images. We’ve also improved our detection of portrait and landscape orientation for images.

After speed, file size is always the next concern. Converting images from one format to another, and from one resolution to another can affect file size. For resolution-independent images, this sometimes led to very large files on disk. This latest update recognizes these images and ensures that they are created at the proper file size.

Until Next Time!

That’s all the news for now. We’re always happy to get feedback and suggestions for current and additional features so let us know what you think. We have many new features planned so stay tuned for more updates!


Ways to Reduce PDF File Size

Reduce your PDF file size for effortless sharing, downloading, emailing, and mobile viewing. Smaller PDF files will download faster and more readily meet limits for email attachments and mobile device capabilities.

If you are creating non-searchable PDF files where each page is an image instead, jump over to Reduce Non-Searchable PDF File Size for information on reducing raster PDF file size.

Most PDF files are vector PDF files and consist of a mix of text, lines, shapes, and images. When reducing the size of a PDF file, both the images in the file and the fonts embedded to display the text factor into the size of the file.

To reduce PDF file size we can adjust settings for image compression, downsampling, and font embedding.

Reduce PDF File Size Using Compression

When trying to reduce PDF file size, the first step is to see if the file contains images. High-resolution and large images can increase the file size considerably. Customizing the compression settings for the types of images in your document can have a huge effect.

For color images, use lossless compression methods such as LZW (Lempel-Ziv-Welch) and ZIP (Deflate). A lossless compression algorithm will compress the image with no loss of quality. LZW works best with images with large areas of solid color and repeating patterns. ZIP works well with gradients and varying colors.

JPEG is another popular compression method for color images. A lossy compression algorithm, JPEG creates smaller images by throwing away unneeded color in the image. This creates a smaller image with a trade-off in image quality and works best with photographs.

Black and white, or monochrome, images in a PDF file will use CCITT Group 4, CCITT Group 3, or Modified Huffman compression. These are all lossless compression algorithms, with CCITT Group 4 generally giving a higher compression rate.
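To see what lossless means in practice, here is a small Python sketch using the standard `zlib` module, which implements Deflate, the algorithm behind ZIP compression. The two "images" are just made-up byte buffers, one flat grey and one random noise; they are not output from our printers.

```python
import random
import zlib

width, height = 256, 256
rng = random.Random(0)                         # fixed seed so runs are repeatable

samples = {
    "solid": bytes([200]) * (width * height),  # one flat grey value
    "noisy": bytes(rng.randrange(256) for _ in range(width * height)),
}

sizes = {}
for label, pixels in samples.items():
    packed = zlib.compress(pixels, 9)
    # Lossless: decompressing returns exactly the original bytes.
    assert zlib.decompress(packed) == pixels
    sizes[label] = len(packed)
    print(label, len(pixels), "bytes ->", len(packed), "bytes")
```

The flat image shrinks to a tiny fraction of its size while the noisy one barely compresses at all. The same is true of real documents: how much a lossless method saves depends heavily on what is in the image.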

Downsample Images to Reduce PDF File Size

Image downsampling reduces the number of pixels in the image, effectively reducing the image size and also reducing the PDF file size. Images are essentially a grid of pixels, or squares, that create the picture. Downsampling takes every few squares in the image and replaces it with a single larger square by combining the colors. This reduces the amount of information needed to store the image but also reduces the image clarity.
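As a rough illustration of combining squares, the Python sketch below averages each 2 x 2 block of a tiny greyscale image into a single pixel. Averaging is only one possible downsampling method and is used here as an assumption; the function name and sample image are ours.

```python
def downsample(pixels, factor):
    """Average each factor x factor block of a greyscale image into one pixel."""
    height = len(pixels)
    width = len(pixels[0])
    result = []
    # Any rows or columns that do not fill a whole block are dropped.
    for top in range(0, height - height % factor, factor):
        row = []
        for left in range(0, width - width % factor, factor):
            block = [
                pixels[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            ]
            row.append(sum(block) // len(block))
        result.append(row)
    return result


image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [0,   255, 100, 100],
    [255, 0,   100, 100],
]
print(downsample(image, 2))  # [[0, 255], [127, 100]]
```

The result stores 4 values instead of 16. Notice that the black-and-white checker pattern in the lower left becomes a single mid-grey pixel: that is the loss of clarity in action.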

Font Embedding and How It Can Reduce PDF Size

Lastly, the fonts used in your document affect the PDF file size. Font embedding includes the font data for the text in the PDF so that it displays correctly on any device, even when the font is not installed on that system.

When using unique and unconventional fonts, embedding them becomes necessary. However, this can result in larger file sizes due to the size of the font file. A common technique to minimize this is to embed a subset of the font data that only includes the characters used in the text.
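The idea behind subsetting can be pictured with a few lines of Python. This is a simplified sketch: real font subsetting works on glyphs and font tables rather than plain characters, and the helper name is ours.

```python
def characters_to_embed(text):
    """Return the distinct non-space characters a font subset would need."""
    return sorted({ch for ch in text if not ch.isspace()})


needed = characters_to_embed("Hello, hello")
print(needed)       # [',', 'H', 'e', 'h', 'l', 'o']
print(len(needed))  # 6
```

A twelve-character string needs only six distinct characters, so a subset can be far smaller than a full font that may carry hundreds or thousands of glyphs.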

If your document uses common fonts that are available on most devices, you can save space by not embedding font information.

Adjusting Settings for PDF File Size in PDF and Raster Image Printer

In our image printers, a printer profile controls the type of file to create, the name of the file, and where to save it. A profile also includes advanced options for compression, downsampling, and font embedding. You can create multiple printer profiles for quick access to creating different flavors of PDF files.

To change compression, downsampling, and font embedding settings, you need to edit a personal profile you already have or create a brand new one.

Open the Dashboard using the desktop shortcut for the printer you have installed. We are using PDF Image Printer here; the steps are the same for Raster Image Printer.

Open the printer dashboard

A central hub, the Dashboard provides access to everything for your printer. From here you can see your license information, access help resources, and create and edit printers and printer profiles. Select the Edit & Create Profiles tile to open the Profile Manager editor.

Open profile editor to change settings to reduce PDF File size

Creating a Profile to Reduce PDF File Size

The Profile Manager is where you create new and edit existing profiles. Several system profiles are provided as part of the printer installation. Create a new, personal profile with the Add new profile button, or create a copy of an existing profile with the copy icon on the profile tile.

Use the profile manager to create a new profile or copy an existing one.

Either action opens the new profile in editing mode.

Once in edit mode, give your profile a memorable name and description in the edit fields at the top of the editor.

The left-hand side of the editor sorts the profile options into categories. Select a category to view its options on the right-hand side.

Changing PDF Compression

To change compression, select the Compression category on the left. Different options for Text and line art and Color, Indexed, Greyscale, and Monochrome images are available on the right. You can also disable PDF compression, but this is normally not recommended.

The original compression settings for images in PDF files favor image quality over size but still provide a good balance between file size and image quality. You can, however, change the compression methods to reflect the type of images in your PDF file to see if you can reduce the PDF file size further.

ZIP works well with gradients of color and LZW with blocks of solid color. JPEG works best for photograph-style images but is often chosen for its high compression rate. For images that contain text, such as charts and graphs, higher levels of JPEG compression can cause a halo effect that will reduce the overall image quality.

Changing the compression used to store the images can reduce PDF size

Using Downsampling to Reduce Image Resolution

Downsampling reduces the number of pixels in an image, effectively lowering the image resolution, or dots per inch (DPI). Fewer pixels mean a smaller image and file size. This is a lossy process and it discards some of the image’s details to create a smaller file.

You have the option to set various downsampling levels for color, indexed, greyscale, and monochrome images. Downsampling occurs only when the DPI of the image is higher than the requested DPI.
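The rule that downsampling only applies to images above the requested DPI can be sketched in Python as follows. The function name and sample image sizes are hypothetical; this is an illustration of the rule, not the printer's code.

```python
def downsampled_size(width_px, height_px, image_dpi, requested_dpi):
    """Return the pixel size after downsampling, or the original size
    when the image is already at or below the requested DPI."""
    if image_dpi <= requested_dpi:
        return width_px, height_px             # never upscale
    scale = requested_dpi / image_dpi
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


print(downsampled_size(2400, 1800, 600, 150))  # (600, 450)
print(downsampled_size(800, 600, 96, 150))     # (800, 600), left untouched
```

Going from 600 dpi to 150 dpi cuts each dimension to a quarter, so the image stores one sixteenth of the pixels.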

Downsampling reduces the resolution of images in a PDF

To Embed, or Not to Embed?

To ensure a PDF displays consistently, you need to embed the font data for all the fonts used in the PDF. When font information is missing, a PDF viewer will attempt to substitute the closest font available. This can cause unpredictable results when viewing the file.

As some fonts can be very large, this can contribute to file size. The default setting of Embed only the characters used (subset) of each font will include only the font information for the characters used in the document. This balances file size with the need to be able to always display the file correctly.

For smaller PDF file sizes, experiment with not embedding font information if using standard fonts like Arial, Times New Roman, and Verdana.

Embedded font data can be very large depending on the fonts and add to the pdf file size.

Saving and Using the Profile

When you are done editing the profile, the last step is to save it using the Save button in the upper left corner. To see how much your changes reduce the PDF file size, print your file using this new profile. If you keep the profile open in the editor, you can easily make adjustments and save them while testing to find the best options for your documents.

If you plan to use these settings regularly, you may wish to make this personal profile the default profile used by your printer.

It is important to realize that sometimes the images being placed in the PDF cannot be made any smaller. Adjusting compression options can result in larger file sizes. Downsampling can create a smaller file but has the side effect of a reduction in image quality.

Balance is key, and the size of your PDF file largely depends on the input documents you are converting. Reach out to our support team here at PEERNET and they will be happy to assist you.


Reduce Non-Searchable PDF File Size

A non-searchable PDF file contains images of each page instead of pages where you can select and search text. With each page stored as an image, there is often a need to control settings to reduce non-searchable PDF file size.

Non-searchable PDF files are created when documents are scanned or converted from image formats such as TIFF, PNG, and JPEG. You can view and print the file, but not edit or search the text. When you need to keep the original layout and appearance of a document, a non-searchable PDF is a good choice. This type of PDF is also a good choice when you need to protect sensitive information by preventing the selection and copying of text in the document.

If you are creating searchable PDF files, head over to Reduce Searchable PDF File Size for information on reducing searchable PDF file size instead.

In this blog, we will look at color reduction, antialiasing, compression options, and printer resolution in PDF Image Printer 12 and Raster Image Printer 12. We will explore how to use them to find a good balance between file size and image quality.

Reduce Non-Searchable PDF File Size Using Color Reduction

The first way to reduce non-searchable PDF file size is with color reduction. Each page in the PDF is an image. Images can support various color modes and bit depths. Changing the color mode and bit depth of the images in your non-searchable PDF can help reduce its file size.

Images stored in a PDF can be color, indexed color, greyscale, or monochrome (black and white). Greyscale images use only shades of grey, resulting in a smaller file size, while monochrome uses the fewest colors: just black and white. Consider converting to greyscale or monochrome when you do not have a lot of color in your document.

When you need to keep the color in the document, experiment with indexed color mode and the bit depth. Images can be created at different bit depths, such as 8-bit, 16-bit, or 24-bit. Lower bit depths take less space. Indexed color mode reduces the number of colors in the image palette, further reducing file size. Try dropping your color to 16-bit or even 8-bit, which can significantly reduce the file size.
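For a feel of why bit depth matters, this Python sketch estimates the uncompressed size of a 1000 x 1000 pixel image at several bit depths. The image size is made up, and the estimate ignores palettes, headers, and compression.

```python
def raw_size_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed pixel data size, rounded up to whole bytes."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


for bits in (24, 16, 8, 1):
    colors = 2 ** bits                         # distinct values per pixel
    print(f"{bits:>2}-bit: {colors:>10,} colors, "
          f"{raw_size_bytes(1000, 1000, bits):>9,} bytes")
```

Before any compression is applied, the 8-bit version needs a third of the space of the 24-bit version, and the 1-bit monochrome version only 125,000 bytes.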

Reduce Non-Searchable PDF File Size Using Image Compression

The second way to create smaller PDF files is to compress the images that make up each page. There are many different compression methods that can significantly reduce the image size. The amount of color in your documents will also come into play when choosing a compression method.

Compression methods, like LZW (Lempel-Ziv-Welch) and ZIP (Deflate) are lossless compression methods for color images. A lossless compression method compresses the image data with no loss of quality. Images with large areas of solid color and repetitive patterns compress best with LZW. On the other hand, ZIP compression can be more effective with complex images of varying colors and gradients.

Another popular compression, JPEG, is a lossy compression method. It creates a smaller file size by throwing away unnecessary data. Its smaller size comes with a trade-off in image quality. JPEG works best on documents that are mostly photos and minimal text.

Turn Off Antialiasing to Reduce Non-Searchable PDF File Size

The next action that can help reduce your PDF image size is turning off antialiasing.

Digital images consist of a grid of squares called pixels. Coloring in the pixels on the grid creates the lines, shapes, and text in the image. Straight lines look good because they align perfectly on the grid. Diagonal and curved lines and text with round shapes, however, do not line up or are shown as partial pixels. This causes jagged edges, or the staircase effect.

Adding extra pixels to make these lines and shapes appear smoother and visually appealing is called antialiasing. However, this technique comes at a cost. Adding extra pixels takes up space in the image and results in larger files.

To help reduce non-searchable PDF file size you can turn off antialiasing. You will lose some of the visual quality in the image with the trade-off of a smaller file size.
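In a rendered image, antialiasing shows up as in-between shades along edges. The Python sketch below draws the same slanted edge twice, once with hard edges and once smoothed by averaging sub-pixel samples, then compares the number of grey levels and the Deflate-compressed size. The edge, image size, and supersampling approach are assumptions chosen for illustration, not how our printers render pages.

```python
import zlib


def render_edge(size, samples):
    """Render a slanted black/white edge as rows of grey values.

    samples=1 gives hard edges; samples>1 averages sub-pixel samples,
    a simple form of antialiasing.
    """
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            covered = 0
            for sy in range(samples):
                for sx in range(samples):
                    px = x + (sx + 0.5) / samples
                    py = y + (sy + 0.5) / samples
                    if px > 0.37 * py + 10:    # right of the edge is white
                        covered += 1
            row.append(255 * covered // (samples * samples))
        rows.append(row)
    return rows


def flatten(rows):
    return bytes(value for row in rows for value in row)


hard = flatten(render_edge(64, 1))
smooth = flatten(render_edge(64, 4))

print("grey levels:", len(set(hard)), "vs", len(set(smooth)))
print("zlib bytes: ", len(zlib.compress(hard)), "vs", len(zlib.compress(smooth)))
```

The hard-edged version uses only black and white, while the smoothed version introduces extra grey levels along the edge. More distinct values typically means more data for the compressor to store.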

Printer Resolution and Non-Searchable PDF File Size

PDF Image Printer 12 and Raster Image Printer 12 are printers that create images instead of printing on a piece of paper. A printer always prints at a selected resolution, or dots per inch (DPI). The higher the DPI the more detailed the output.

Images also use this concept of dots per inch when creating the grid of pixels that is the image. The printer resolution determines what DPI, or number of pixels, to use when creating the image. The higher the DPI, the more pixels we need to store and the larger the file. An image with a lower DPI uses fewer pixels per inch and can result in less detail and a pixelated or blocky image.
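The arithmetic is easy to check. This Python sketch counts the pixels needed for a US Letter page (8.5 x 11 inches, our choice of example) at two resolutions; the helper name is ours.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page rendered at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (300, 150):
    width, height = page_pixels(8.5, 11, dpi)
    print(f"{dpi} dpi: {width} x {height} = {width * height:,} pixels")
# 300 dpi: 2550 x 3300 = 8,415,000 pixels
# 150 dpi: 1275 x 1650 = 2,103,750 pixels
```

Halving the resolution cuts the pixel count to a quarter, because both the width and the height shrink.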

Again, it all comes down to balancing file size to the desired level of detail and clarity.

Adjusting Color Reduction, Compression, Antialiasing, and Printer Resolution

Our image printers use printer profiles that control what type of file to create. A profile also includes options such as the name of the file and where to save it, as well as color reduction, antialiasing, and compression options. Printer profiles allow easy access to creating multiple different types of files.

To change the color reduction, compression, antialiasing, and resolution options in a profile, you first need to create a new profile. Alternatively, you can edit an existing personal profile you already have.

Double-click the Dashboard desktop shortcut to open the dashboard.

open printer dashboard from desktop shortcut

The Dashboard is your access to everything for your printer. License information, printer profile creation and editing, and other tools and resources are all here. Use the Edit & Create Profiles tile to open the Profile Manager tool.

open profile editor to create and edit profiles

Creating and Editing Profiles

You can edit and create new profiles for the printer using the Profile Manager. We provide system profiles created by PEERNET to start you off. You can create new, personal profiles using the Add new profile button, or copy and edit existing profiles using the copy icon on the profile tile.

create a new profile to reduce non-searchable PDF file size

When editing a new profile or a copy of a profile, first give it a name you will remember and a short description in the fields at the top of the editor.

Change Color Reduction to Reduce Non-Searchable PDF File Size

When editing profiles, all the profile categories are available on the left-hand side, with the options for the selected category on the right.

Color reduction and antialiasing settings are in the Save Options category. These settings apply when you are creating non-searchable PDF files. To ensure we are creating non-searchable PDF files, under Output Type and Color, toggle the option Create each page as an image to on.

Then, use the drop box next to the Reduce colors to label to select your desired color reduction.

reduce non-searchable pdf file size using color reduction options

The default setting of Optimal palette means the printer reduces the image for each page down to the smallest number of colors needed. The PDF file maintains all the colors from the original file. Some pages will be full color, some 16-bit or 8-bit color, and some monochrome. Reducing color on the pages that only have a few colors will help decrease your file size.

To further drop your file size, you can limit the colors on each page by creating 8-bit or 4-bit color images only with the 256 color palette or 16 color palette options.

To create greyscale or monochrome images instead, select Greyscale or Black and white.

You can learn more about the other color reduction options in our online user guide.

Disable Antialiasing to Reduce Non-Searchable PDF File Size

Antialiasing adds extra pixels and information to create the effect of smoother lines, curves, and shapes. These extra pixels add more data to be stored, causing the file to be larger. You can turn this option off to see how much space you will save.

Antialiasing is also in the Save Options category. It is on by default. Click on the toggle to turn it off.

Turn off antialiasing to also reduce non-searchable PDF file size

Change Compression to Reduce Non-Searchable PDF File Size

Changing the compression method used to save the images in non-searchable PDF files is another way to reduce file size. The Compression category will display the different compression options for Text and line art, Color, Indexed, Greyscale, and Monochrome images. It also allows you to create uncompressed PDF files but this is normally not recommended.

The color reduction you have chosen above determines which of the image compression options apply. It does not affect text and line art compression.

If you are using Optimal palette, the compression options for Color, Indexed, Greyscale, and Monochrome images can all apply to different pages as each image is reduced to its minimal number of colors.

When color reduction is Greyscale, the compression method chosen for Greyscale images is used to compress the images. Similarly, Black and white for color reduction causes the Monochrome image compression method to be used.
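One way to picture the relationship between color reduction and compression is as a simple lookup, sketched below in Python. The option names come from the profile editor as described in this post; the function itself is hypothetical, and mapping the 256 and 16 color palette options to the Indexed compression setting is our assumption.

```python
IMAGE_MODES = ("Color", "Indexed", "Greyscale", "Monochrome")


def compression_category(color_reduction, page_mode=None):
    """Return which image compression setting would apply to a page."""
    fixed = {
        "Greyscale": "Greyscale",
        "Black and white": "Monochrome",
        # Assumption: palette options produce indexed images.
        "256 color palette": "Indexed",
        "16 color palette": "Indexed",
    }
    if color_reduction in fixed:
        return fixed[color_reduction]
    if color_reduction == "Optimal palette":
        # Each page is reduced on its own, so the answer depends on the page.
        if page_mode not in IMAGE_MODES:
            raise ValueError("Optimal palette needs the page's reduced color mode")
        return page_mode
    raise ValueError(f"Unknown color reduction: {color_reduction}")


print(compression_category("Greyscale"))                      # Greyscale
print(compression_category("Black and white"))                # Monochrome
print(compression_category("Optimal palette", "Monochrome"))  # Monochrome
```

This is why, with Optimal palette, it is worth reviewing all four image compression settings rather than only one.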

reduce non-searchable PDF file size using compression

Lowering the Printer Resolution to Reduce Non-Searchable PDF File Size

Lowering the printer resolution works to reduce non-searchable PDF file size because each page is an image. A lower printer resolution means fewer dots per inch, or pixels, to store. Too low a resolution, however, causes blocky and pixelated images with hard-to-read text.

The resolution the printer will use can be set from the Manage Printers tile on the dashboard.


On the Print Management screen make sure your new profile is selected to be used when printing. Next, open the advanced settings using the down arrow in the row of icons to the right. It will change to an up arrow and additional settings, including Resolution, will appear.

Lower the resolution from the default of 300 dpi. When done, click the Save icon in the row of icons to commit your changes.

Saving and Using the Profile

Save your profile using the Save button in the upper left when your changes are complete. Print your document to the printer using this new profile to see how the changes will reduce your PDF image size.

Keep in mind that when reducing the file size of PDF images, you should always strike a balance between size and preserving document legibility and quality. Experiment with the different settings to see how they affect your file size and quality. And, as always, reach out to our support team with any questions. They’ll be happy to help you.

If you plan to use these settings regularly, you may wish to make this personal profile the default profile used by PDF Image Printer or Raster Image Printer.