Processing Outlook and EML Email Messages and Attachments

Starting with Document Conversion Service 3.0.009, the Watch Folder Service can extract and convert any attachments in Outlook Message files (*.msg), as well as convert the Outlook Message file itself.

In Document Conversion Service 3.0.029, support was added for Electronic Mail message files (*.eml) with attachments. EML is the standard file extension for email messages stored in the Internet Message Format, which is defined by the RFC 5322 standard. An EML file can contain a single message, with or without attachments.

When attachment processing is enabled, each message file is checked for attachments; if any are found, the original message file and all of its attachments are converted. With the initial settings, the resulting files are placed in the OutputFolder under a subfolder with the same name as the original file. Any attachment whose file type is not supported by Document Conversion Service is not converted and is placed in the Failed folder.

The original message and all attached files, including images embedded in the email body and signature, are processed. This includes recursively processing attached Outlook Message or EML files that have attachments of their own. All message content and file attachments are extracted into a subfolder with the same name as the original file. If a name collision is detected, the file name is made unique by adding a number in brackets after the base name. For example, an email message with an attached PDF document named lorem.pdf, and an attached message that also contains a PDF document named lorem.pdf, creates two files: lorem.pdf.tif and lorem(2).pdf.tif.

The sample message below, Test Email With Attachments.msg, contains a single attached PDF file and five small images from the signature. When the MSG is processed, the original message file and all attachments are converted. The attached PDF file keeps its name, and the inline signature images are named image001 through image005.

When the above message was processed, the option to keep the original filename's extension as part of the new filename was enabled. To remove the original extension, use the setting <add Name="Save;Remove filename extension" Value="1"/>. With this setting, the output from a file named lorem.pdf becomes lorem.tif instead of lorem.pdf.tif.
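A minimal sketch of the two variants, assuming the fragment sits inside the <Settings> element of a WatchFolder as in the samples in this article. Only one of the two lines should be active at a time; the lorem.pdf names in the comments are illustrative.

```xml
<!-- Keep the original extension, as in the sample message (lorem.pdf becomes lorem.pdf.tif) -->
<add Name="Save;Remove filename extension" Value="0" />

<!-- Alternative: drop the original extension (lorem.pdf becomes lorem.tif) -->
<!-- <add Name="Save;Remove filename extension" Value="1" /> -->
```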

Several settings control email message attachment processing. Each of the included pre-configured WatchFolders already contains these settings, with attachment processing disabled. To enable attachment processing, uncomment the PreprocessArchiveFormatsFilter setting. To disable it again, comment it out or set its value to an empty string.
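As a rough illustration, the enabled and disabled states might look like this. It assumes the *.msg|*.eml pattern follows the same pipe-separated extension format used by the attachment filters described later; only one of these lines should be active.

```xml
<!-- Disabled: the filter is commented out, as in the pre-configured WatchFolders -->
<!-- <add Name="PreprocessArchiveFormatsFilter" Value="*.msg|*.eml" /> -->

<!-- Enabled: MSG and EML files are checked for attachments and extracted -->
<add Name="PreprocessArchiveFormatsFilter" Value="*.msg|*.eml" />

<!-- Also disabled: an empty value turns message processing off -->
<!-- <add Name="PreprocessArchiveFormatsFilter" Value="" /> -->
```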

MSG and EML Extraction Subfolder Options

PreprocessArchive.IncludeExtensionInFolderName

Controls whether the .msg or .eml file extension is included in the name of the subfolder created to hold the message and its attachments for processing. In the screenshot above, the .msg file extension was kept as part of the subfolder name. To minimize possible name collisions, we recommend leaving this option enabled.
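For example, with the sample file Test Email With Attachments.msg, the two values would be expected to produce the subfolder names shown in the comments. This is a sketch inferred from the description above, not captured output.

```xml
<!-- Recommended: the subfolder is named "Test Email With Attachments.msg" -->
<add Name="PreprocessArchive.IncludeExtensionInFolderName" Value="true" />

<!-- The subfolder would be named "Test Email With Attachments", so a file named -->
<!-- Test Email With Attachments.eml in the same watch folder could collide with it -->
<!-- <add Name="PreprocessArchive.IncludeExtensionInFolderName" Value="false" /> -->
```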

PreprocessArchive.CreateAllOutputInSubfolder

Added in version 3.0.025, this option controls whether the MSG or EML file and its extracted attachments are stored in a subfolder or at the root of the output folder. To minimize possible name collisions, we recommend leaving this option enabled unless you are certain your filenames are unique, or you are using Unique File Naming and Flat Folder Structures together with the MSG and EML Unique File Naming Options below.
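A sketch of a flat output structure, assuming unique file naming is used to keep names from colliding, as recommended above. The message and its attachments are written to the root of the output folder and a GUID is prepended to each output filename; the exact GUID format is not shown here.

```xml
<!-- Write the message and its attachments to the root of the output folder -->
<add Name="PreprocessArchive.CreateAllOutputInSubfolder" Value="false" />

<!-- Prepend a GUID to each output filename to avoid collisions in the flat structure -->
<add Name="OutputFolder.PrependUniqueGUIDToFilename" Value="true" />
```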

MSG and EML Extraction Filtering

The next three settings control which message attachments are converted. Despite the MSG in their names, they apply to both MSG and EML files.

The first setting determines whether inline attachments are converted, and the other two further filter which attachments are processed. The checks are applied in this order: inline attachments, then the include filter, then the exclude filter.

Usually only one of the include or exclude filters is used at a time, depending on how you need to filter. It is easier to exclude only "*.jpg" attachments, or include only "*.pdf" attachments, than to write long lists of every file type.
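The two approaches can be sketched as alternative configurations. The extension lists are only examples; in each case the unused filter is left as an empty string, which per the descriptions below means "match all" for the include filter and "exclude none" for the exclude filter.

```xml
<!-- Option A: convert only PDF attachments -->
<add Name="PreprocessArchive.MSG.AttachmentsIncludeFilter" Value="*.pdf" />
<add Name="PreprocessArchive.MSG.AttachmentsExcludeFilter" Value="" />

<!-- Option B: convert every attachment except JPG files -->
<!--
<add Name="PreprocessArchive.MSG.AttachmentsIncludeFilter" Value="" />
<add Name="PreprocessArchive.MSG.AttachmentsExcludeFilter" Value="*.jpg" />
-->
```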

PreprocessArchive.MSG.IncludeInlineAttachments

Message attachments can be inline (pasted into the email body) or attached as separate files. Images used in signatures are often inline attachments, while a PDF file attached to the email is not. You can disable the processing of all inline attachments by setting this value to false. Because some inline attachments can be documents, setting this to false is not recommended. This setting is always checked before the attachment filtering settings below.
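For completeness, a sketch of disabling inline attachments. As noted above, this is not recommended; with this value, signature images such as image001 through image005 in the sample message would presumably be skipped, along with any documents that happen to be inline.

```xml
<!-- Not recommended: skip all inline attachments, including inline documents. -->
<!-- Evaluated before the include and exclude filters. -->
<add Name="PreprocessArchive.MSG.IncludeInlineAttachments" Value="false" />
```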

PreprocessArchive.MSG.AttachmentsIncludeFilter

Filters which attachments are processed. When set to an empty string, all attachments are processed. To filter for specific file types, enter a pattern for each file type, separated by the pipe (|) character. For example, to convert only attached Word and PDF documents, you could set this to <add Name="PreprocessArchive.MSG.AttachmentsIncludeFilter" Value="*.doc|*.docx|*.pdf" />. This setting is always applied after the inline attachment check above and before the exclude filter below.

PreprocessArchive.MSG.AttachmentsExcludeFilter

The last filtering setting, which is also applied last, is the exclude filter. It determines which files (by extension) are not extracted from the MSG or EML file. As with the include filter above, enter a pattern for each file type you do not want converted, separated by the pipe (|) character. When left as an empty string, no files are excluded.

MSG and EML Image Attachment Options

By default, image attachments are converted using the source image's resolution rather than the requested output resolution. This applies when converting to image formats as well as to PDF. Up-scaling images to a higher resolution can produce files many times larger than the source image. Other factors, such as converting from JPG to a lossless format like TIFF, can also increase file size.

PreprocessArchive.MSG.ImageAttachmentsKeepSourceResolution

This setting defaults to true. Conversion options such as fax mode and other image option actions can override it. It overrides the ConverterPlugIn.PNImageConverter.KeepSourceImageResolution setting, but only for images extracted from an MSG or EML file. Setting it to false may produce very large images.
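The override can be sketched as follows, assuming both settings sit in the same WatchFolder. Standalone input images would presumably be rendered at the requested output resolution, while images extracted from an MSG or EML keep their source resolution. The 300 DPI value is taken from the full sample later in this article.

```xml
<!-- Requested output resolution -->
<add Name="Devmode settings;Resolution" Value="300"/>

<!-- Standalone images: do not keep the source resolution -->
<add Name="ConverterPlugIn.PNImageConverter.KeepSourceImageResolution" Value="false"/>

<!-- Images extracted from MSG/EML: keep the source resolution (overrides the setting above) -->
<add Name="PreprocessArchive.MSG.ImageAttachmentsKeepSourceResolution" Value="true" />
```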

Controlling Image Size With Compression

Another way to control the size of extracted and converted images is to set the compression option for the output format. For example, converting a JPG image such as a camera photograph to TIFF with the default LZW compression creates a very large file.

To create a TIFF image comparable in size to the source JPG, change the compression to one of the TIFF JPEG compression options.

Code Sample - Controlling Image Size with Compression

 

<WatchFolders>
  <!-- This watch folder creates 300 DPI Optimized TIFF Images -->
  <WatchFolder Name="ConvertToTIFF Watch Folder">
    <Settings>
      <!-- Folder options -->
      ...

      <!-- Use JPEG compression in TIFF images for smaller files from photographs -->
      <add Name="TIFF File Format;BW compression" Value="Group4"/>
      <add Name="TIFF File Format;Color compression" Value="High quality JPEG"/>
      <add Name="TIFF File Format;Indexed compression" Value="High quality JPEG"/>
      <add Name="TIFF File Format;Greyscale compression" Value="High quality JPEG"/>
    </Settings>
  </WatchFolder>
</WatchFolders>

 

MSG and EML Unique File Naming Options

Added in version 3.0.025, these options apply unique names to MSG and EML files and their attachments as they are processed, when the existing Unique File Naming and Flat Folder Structures settings are in use. If OutputFolder.PrependUniqueGUIDToFilename, OutputFolder.AppendUniqueGUIDToFilename, or both are true, a Globally Unique Identifier (GUID) is added to the output filename of the message and of any extracted attachments.

By default, the same GUID is used in the MSG subfolder name (if a subfolder is created) and in all extracted and converted attachments.

PreprocessArchive.MSG.UseUniqueGUIDInMSGFolderName

Controls whether a GUID is used in the name of the folder created to store the converted MSG or EML file and its attachments. Applies when PreprocessArchive.CreateAllOutputInSubfolder is true.

PreprocessArchive.MSG.UseSameGUIDForAllFiles

By default, the same GUID is used for the folder and for all files extracted and converted into it. To use a random, unique GUID for each file, set this to false.
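A sketch that puts a GUID in the subfolder name and gives every extracted file its own random GUID, assuming OutputFolder.AppendUniqueGUIDToFilename is the unique naming option in use. The resulting names are not shown because they depend on your Unique File Naming configuration.

```xml
<!-- Turn on unique file naming (append variant) -->
<add Name="OutputFolder.AppendUniqueGUIDToFilename" Value="true" />

<!-- Store the message and attachments in a subfolder with a GUID in its name -->
<add Name="PreprocessArchive.CreateAllOutputInSubfolder" Value="true" />
<add Name="PreprocessArchive.MSG.UseUniqueGUIDInMSGFolderName" Value="true" />

<!-- Use a separate random GUID for each extracted and converted file -->
<add Name="PreprocessArchive.MSG.UseSameGUIDForAllFiles" Value="false" />
```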

PreprocessArchive.MSG.UseMessageIDForPrependGUID

PreprocessArchive.MSG.UseMessageIDForAppendGUID

When enabled, the Message ID from the source email's header, if present, is used in place of a randomly generated GUID when building the output folder name and the email and attachment filenames. If the option is enabled but the email has no Message ID, a GUID is used instead.
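As an illustration, assuming prepended GUIDs are in use, the following replaces the prepended GUID with the email's Message ID where one exists. PreprocessArchive.MSG.UseMessageIDForAppendGUID would be the counterpart when OutputFolder.AppendUniqueGUIDToFilename is used instead.

```xml
<!-- Unique file naming with a prepended identifier -->
<add Name="OutputFolder.PrependUniqueGUIDToFilename" Value="true" />

<!-- Use the Message ID instead of a random GUID; falls back to a GUID if the email has none -->
<add Name="PreprocessArchive.MSG.UseMessageIDForPrependGUID" Value="true" />
```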

PreprocessArchive.MSG.UseMessageIDWithoutFQDN

An email Message ID consists of a string that uniquely identifies the message, followed by an at sign (@) and the Fully Qualified Domain Name (FQDN) of the mail server that sent the message. This option defaults to true, so only the part before the @ is used.
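For example, given a hypothetical Message ID of 1234.5678@mail.example.com, the comments below show which part would be used for each value. This sketch sets the option to false to keep the full ID.

```xml
<!-- Hypothetical Message ID: 1234.5678@mail.example.com -->
<!-- true (default): only "1234.5678" is used -->
<!-- false: the full "1234.5678@mail.example.com" is used -->
<add Name="PreprocessArchive.MSG.UseMessageIDWithoutFQDN" Value="false" />
```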

Sample Watch Folder Setting for MSG and EML Extraction

The last set of settings shown below is not included in the pre-configured WatchFolder settings. These are recommended settings to help control the size of the final output files when processing Outlook Messages with attached images and logos in the signatures.

Code Sample - Default Outlook Message Processing

 

<WatchFolders>
  <!-- This watch folder creates 300 DPI Optimized TIFF Images -->
  <WatchFolder Name="ConvertToTIFF Watch Folder">
    <Settings>
      <!-- Folder options -->
      ...

      <!-- Preprocess Archive Settings -->
      <!-- Comment out or set this as an empty string to disable MSG/EML archive processing. -->
      <add Name="PreprocessArchiveFormatsFilter" Value="*.msg|*.eml" />

      <!-- Setting this to false is not recommended as it increases -->
      <!-- the chance of archive and folder name collision. -->
      <add Name="PreprocessArchive.IncludeExtensionInFolderName" Value="true" />

      <!-- Setting this to false is not recommended as it increases the chance of archive -->
      <!-- and folder name collision. Can be used with OutputFolder.PrependUniqueGUIDToFilename or -->
      <!-- OutputFolder.AppendUniqueGUIDToFilename to flatten the structure and create unique names. -->
      <add Name="PreprocessArchive.CreateAllOutputInSubfolder" Value="true" />

      <!-- Preprocess MSG Archive Settings (apply to both MSG and EML files) -->
      <add Name="PreprocessArchive.MSG.IncludeInlineAttachments" Value="true" />

      <!-- Pipe (|) separated list of file extensions (e.g. *.doc|*.docx) to match on when -->
      <!-- processing message attachments. Pass an empty string to match all. Runs after the -->
      <!-- inline attachment check above and before the exclusion check below. -->
      <add Name="PreprocessArchive.MSG.AttachmentsIncludeFilter" Value="" />

      <!-- Pipe (|) separated list of file extensions (e.g. *.png|*.jpg) to exclude when -->
      <!-- processing message attachments. Pass an empty string to exclude none. -->
      <add Name="PreprocessArchive.MSG.AttachmentsExcludeFilter" Value="" />

      <!-- When converting image attachments to images, keep the new image's resolution the -->
      <!-- same as the source image. Fax mode and other image option actions can override this. -->
      <!-- This setting overrides ConverterPlugIn.PNImageConverter.KeepSourceImageResolution. -->
      <add Name="PreprocessArchive.MSG.ImageAttachmentsKeepSourceResolution" Value="True" />

      <!-- The following settings are only used if OutputFolder.PrependUniqueGUIDToFilename -->
      <!-- or OutputFolder.AppendUniqueGUIDToFilename is set to true. -->

      <!-- When set to true, all files extracted from an MSG will have the same GUID. -->
      <add Name="PreprocessArchive.MSG.UseSameGUIDForAllFiles" Value="true" />

      <!-- When set to true, the folder used to store the MSG and its attachments will -->
      <!-- be formatted with the pre-post GUID strings as set. -->
      <!-- If PreprocessArchive.MSG.UseSameGUIDForAllFiles is also true, the GUID -->
      <!-- in the folder name will match the files underneath. -->
      <add Name="PreprocessArchive.MSG.UseUniqueGUIDInMSGFolderName" Value="true" />

      <!-- When set to true, the email's Message ID is used in place of the GUID for the -->
      <!-- folder and all files extracted from the MSG. Whether the @FQDN part is kept -->
      <!-- is controlled by PreprocessArchive.MSG.UseMessageIDWithoutFQDN below. -->
      <add Name="PreprocessArchive.MSG.UseMessageIDForPrependGUID" Value="true" />
      <add Name="PreprocessArchive.MSG.UseMessageIDForAppendGUID" Value="true" />

      <!-- Set this to true to only use the first part of the Message ID, -->
      <!-- dropping the @FQDN part. -->
      <add Name="PreprocessArchive.MSG.UseMessageIDWithoutFQDN" Value="true" />

      <!-- Keep image resolution the same as the source. Applies to all images. -->
      <add Name="ConverterPlugIn.PNImageConverter.KeepSourceImageResolution" Value="True"/>

      <!-- Output file options -->
      <add Name="Devmode settings;Resolution" Value="300"/>

      <add Name="Save;Output File Format" Value="TIFF Multipaged"/>
      <!-- Replace the above with this to create serialized images. -->
      <!-- <add Name="Save;Output File Format" Value="TIFF Serialized"/> -->

      <add Name="Save;Append" Value="0"/>
      <add Name="Save;Color reduction" Value="Optimal"/>
      <add Name="Save;Dithering method" Value="Halftone"/>

      <!-- This creates file.ext.tif; change to 1 to create file.tif -->
      <add Name="Save;Remove filename extension" Value="0" />

      <add Name="TIFF File Format;BW compression" Value="Group4"/>
      <add Name="TIFF File Format;Color compression" Value="High quality JPEG"/>
      <add Name="TIFF File Format;Indexed compression" Value="High quality JPEG"/>
      <add Name="TIFF File Format;Greyscale compression" Value="High quality JPEG"/>
    </Settings>
  </WatchFolder>
</WatchFolders>