Large Volume Batch Conversion Using Clustering

Introduced in Document Conversion Service 3.0.010, clustering lets a single computer efficiently convert folders that contain a very large number of files.

The Watch Folder Service was originally designed for hot folders (drop folders), where small numbers of files are dropped into a folder periodically for conversion. When it detects files in the input folder, the Watch Folder Service moves the entire contents of the folder to its staging location for processing. For a folder containing a large volume of files, this causes long delays while the files are copied, and it can cause other problems such as running out of disk space for the copies.

When clustering mode is enabled, only a small number of files in the input folder are locked and copied to the staging folder at a time. Each original file stays in the input folder until it has been processed; it is then unlocked and copied to the Completed folder. This reduces disk space usage and avoids the delay of copying large numbers of files up front. It also shortens the turnaround before the next file is picked up.

Enabling Clustering

Clustering can be enabled on any Watch Folder definition using the ClusteredProcessing.Enabled setting. This setting is false by default.
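
As a minimal sketch, the setting sits alongside the other entries in a watch folder's Settings block. The watch folder name below is only a placeholder and the remaining settings are omitted; the element layout follows the code sample at the end of this section.

```xml
<WatchFolder Name="MyWatchFolder">
  <Settings>
    <!-- Placeholder watch folder; other required settings omitted for brevity. -->
    <!-- Turn on clustered processing for this watch folder (false by default). -->
    <add Name="ClusteredProcessing.Enabled" Value="true" />
  </Settings>
</WatchFolder>
```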

Keeping the Conversion Service Busy

The goal here is to take advantage of the conversion threads available in your licensed copy of Document Conversion Service and to keep those threads busy converting documents. The number of conversion threads you have available depends on your license model.

To maximize throughput, set the number of files that Watch Folder processes in parallel to the number of conversion threads available in Document Conversion Service plus a small margin. The conversion threads then stay continuously busy, because a document or two is always queued and waiting as long as unprocessed files remain in the folder.

The number of files picked up defaults to the NumberOfDocumentsInParallel value from the Watch Folder Service configuration file. You can override it with the ClusteredProcessing.MaxFilesToPickup setting.
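
For illustration, assume a license that provides four conversion threads (a hypothetical figure; check your own license). Adding a margin of two gives a pickup count of six, which could be set as follows:

```xml
<!-- Assumed: 4 conversion threads + 2 queued documents = 6 files picked up at a time. -->
<add Name="ClusteredProcessing.Enabled" Value="true" />
<add Name="ClusteredProcessing.MaxFilesToPickup" Value="6" />
```

Treat these numbers as a starting point only; a suitable margin will likely depend on your documents and hardware.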

Code Sample

<WatchFolders>
 

  <!-- This watch folder creates 300 DPI Optimized TIFF Images -->

  <WatchFolder Name="ConvertToTIFF" >
    <Settings>

      <!-- Folder options -->

      <add Name="Enabled" Value="True"/>
      <add Name="InputFolder" 

            Value="C:\PEERNET\WatchFolders\ConvertToTIFF\Input"/>
      <add Name="SearchFilter" Value="*.*"/>
      <add Name="IncludeSubFolders" Value="True"/>

      <add Name="DeleteInputSubFolders" Value="True"/>

      <add Name="StagingFolder" 

            Value="C:\PEERNET\WatchFolders\ConvertToTIFF\Staging"/>

       <add Name="WorkingFolder" 

            Value="C:\PEERNET\WatchFolders\ConvertToTIFF\Working"/>

      <add Name="FailedFolder" 

            Value="C:\PEERNET\WatchFolders\ConvertToTIFF\Failed"/>

      <add Name="CompletedFolder"

            Value="C:\PEERNET\WatchFolders\ConvertToTIFF\Completed"/>
      <add Name="OutputFolder" 

            Value="C:\PEERNET\WatchFolders\ConvertToTIFF\Output"/>
      <add Name="PollingInterval" Value="15000"/>

      <add Name="DCOMComputerName" Value=""/>
      <add Name="TestMode" Value="false" />

 

       <!-- 0 means no limit. -->

       <add Name="Polling.MaxFilesToProcessAtATime" Value="0" />

       <add Name="Polling.SynchronousFilePickup" Value="false" />

       ....

       

       <!-- Clustered Processing -->

       <!-- This forces batch mode pickup with optional synchronous wait and -->

       <!-- no date time stamp used in the Failed\Completed folders. Files are -->

       <!-- locked so other servers can process files from the same folder. -->

       <add Name="ClusteredProcessing.Enabled" Value="true" /> 

       <!-- Override this for clustering to customize the number of files picked up. -->

       <add Name="ClusteredProcessing.MaxFilesToPickup" Value="6" />

     

      <add Name="Devmode settings;Resolution" Value="300"/>

      <add Name="Save;Output File Format" Value="TIFF Multipaged" />

       <!-- <add Name="Save;Output File Format" Value="TIFF Serialized" /> -->

     

       <add Name="Save;Append" Value="0"/>

       <add Name="Save;Color reduction" Value="Optimal"/>

       <add Name="Save;Dithering method" Value="Halftone"/>

 

       <!-- A value of 1 creates file.tif; change to 0 to create file.ext.tif -->

       <add Name="Save;Remove filename extension" Value="1"/>

 

      <add Name="TIFF File Format;BW compression" Value="Group4"/>

      <add Name="TIFF File Format;Color compression" Value="LZW RGB0"/>

      <add Name="TIFF File Format;Indexed compression" Value="LZW"/>

      <add Name="TIFF File Format;Greyscale compression" Value="LZW"/>

      <add Name="JPEG File Format;Color compression" Value="Medium Quality"/>

      <add Name="JPEG File Format;Greyscale compression" Value="High Quality"/>

 

    </Settings>
  </WatchFolder>

 
</WatchFolders>