Batch Converting Thousands of Files to TIFF with Document Conversion Service
A customer came to us recently with 50,000 files of various formats that they needed to convert to TIFF images. The files were all contained in a single folder and the client wanted a “fire-and-forget” solution.
The answer was to use Document Conversion Service and the included Watch Folder service to convert the entire folder of files, saving the new TIFF images in another folder with the same directory structure.
The Challenge of Converting Thousands of Files at Once
The Watch Folder service is initially configured for use as a hot folder or drop folder, where small groups of files are dropped periodically into the input folder to be processed. When files are detected, whether 1, 100, or more, they are all moved to a staging folder to be processed. With an extremely large number of files, this can cause long delays while the files are moved, as well as other issues such as not having enough disk space to copy the files.
We can change the Watch Folder configuration to limit how many files are picked up at a time, and to wait until the first mini-batch of files is complete before picking up the next batch of files. This allows us to continuously work our way through the entire contents of the folder a bit at a time until it is complete.
We also need to turn off the use of date-time subfolders in both the Completed and the Failed folders. The default configuration saves each set of completed files in a subfolder named using the date and time the files were picked up. In this case we don’t want this behavior, as we would get a new folder for each mini-batch that gets picked up.
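As a rough sketch, the two changes described above come down to four entries in the Settings section of the Watch Folder configuration file. The setting names are the ones used in the configuration steps of this article, and the batch size of 10 is only an illustrative value; check the names against your installed configuration file before relying on them.

```xml
<!-- Pick up a limited number of files per polling interval (0 means no limit) -->
<add Name="Polling.MaxFilesToProcessAtATime" Value="10"/>
<!-- Wait for the current mini-batch to finish before picking up more files -->
<add Name="Polling.SynchronousFilePickup" Value="true"/>
<!-- Do not create a date-time subfolder for every mini-batch -->
<add Name="UseTimeDateSubFoldersInCompletedFolder" Value="false"/>
<add Name="UseTimeDateSubFoldersInFailedFolder" Value="false"/>
```

With 50,000 input files and a batch size of 10, the service would work through roughly 5,000 mini-batches, so a larger batch size may be worth testing if staging disk space allows.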
- Always have the Watch Folder service work from a copy of your input data, or keep a backup of the original data whenever possible. If this is not possible, leave Watch Folder configured to keep the input files in its Completed folder.
- When setting the location where you are saving the new files, make sure you will have enough hard disk space to store all of the output files.
Open the Watch Folder configuration file by going to Start – All Programs – PEERNET Document Conversion Service 2.0 – Watch Folder – Configure Watch Folder Settings.
In the Watch Folder section, make the following changes:
- Set the InputFolder to point to your copy of the folder of files to be converted, and the OutputFolder to where you want to save the converted files. Remember, the OutputFolder is where all of the converted files are going, so make sure this location has enough space.
- Set the FailedFolder to where you want to store any files that fail to convert.
- If you need the Watch Folder service to keep a copy of your input data, set the CompletedFolder location. All of the files from the InputFolder are eventually copied here as they are converted, so make sure you have enough disk space to hold a complete copy of your input folder. If you do not need to keep your input files, set the CompletedFolder to an empty string.
- Set the StagingFolder and WorkingFolder locations – these folders hold the input files and output files temporarily while they are being converted.
- The settings Polling.MaxFilesToProcessAtATime and Polling.SynchronousFilePickup control how many files are picked up at every polling interval, and whether the first batch of files needs to complete before the next group is picked up.
- Set Polling.MaxFilesToProcessAtATime to the number of files you want the service to pick up each time.
- Set Polling.SynchronousFilePickup to true to have the service wait for the first mini-batch to complete before picking up new files.
- To turn off the use of the date-time subfolder for each batch of files, set UseTimeDateSubFoldersInCompletedFolder and UseTimeDateSubFoldersInFailedFolder to false.
- The last setting to change, which is optional depending on your requirements, controls how the output files are named. You may want to change the setting <add Name="Save;Remove filename extension" Value="1"/> to make sure that the file extension from the original source file is not used to name the output file. This means that the output file from a file named Test.docx would become Test.tif. If this setting is not included, or is set to “0”, the output file name would be Test.docx.tif.
- Save the Watch Folder configuration file.
- Start the Document Conversion Service and then the Watch Folder Service.
- The Watch Folder service will now start processing files from your input folder. Only the number of files you set for Polling.MaxFilesToProcessAtATime will be picked up at a time and then processed, until all files in the folder have been converted.
<!-- Process Large Number of Files In Batches -->
<WatchFolder Name="Batch Process Large Folder of Files">
  <Settings>
    <!-- Folder options -->
    <add Name="InputFolder" Value="D:\ArchivedDocuments\Input"/>
    <add Name="SearchFilter" Value="*.*"/>
    <add Name="IncludeSubFolders" Value="True"/>
    <add Name="DeleteInputSubFolders" Value="True"/>
    <add Name="StagingFolder" Value="C:\PEERNET\WatchFolder\Staging"/>
    <add Name="WorkingFolder" Value="C:\PEERNET\WatchFolder\Working"/>
    <add Name="FailedFolder" Value="D:\ConvertedArchivedDocuments\Failed"/>
    <!-- Set Completed Folder if we want to keep the Input Files -->
    <add Name="CompletedFolder" Value="D:\ConvertedArchivedDocuments\Completed"/>
    <!-- Set Completed Folder to empty to not keep the Input Files -->
    <!-- <add Name="CompletedFolder" Value=""/> -->
    <add Name="OutputFolder" Value="C:\Company\Sales\Output"/>
    <add Name="PollingInterval" Value="15000"/>
    <add Name="DCOMComputerName" Value="localhost"/>
    <add Name="TestMode" Value="false"/>
    <add Name="NormalizeFilenames" Value="false"/>
    <!-- 0 means no limit -->
    <add Name="Polling.MaxFilesToProcessAtATime" Value="10"/>
    <add Name="Polling.SynchronousFilePickup" Value="true"/>
    <add Name="UseTimeDateSubFoldersInCompletedFolder" Value="false"/>
    <add Name="UseTimeDateSubFoldersInFailedFolder" Value="false"/>
    <!-- Output file options -->
    <add Name="Devmode settings;Resolution" Value="300"/>
    <add Name="Save;Output File Format" Value="TIFF Multipaged"/>
    <add Name="Save;Append" Value="0"/>
    <add Name="Save;Color reduction" Value="Optimal"/>
    <add Name="Save;Dithering method" Value="Halftone"/>
    <add Name="Save;Remove filename extension" Value="1"/>
    <add Name="TIFF File Format;BW compression" Value="Group4"/>
    <add Name="TIFF File Format;Color compression" Value="LZW RGB"/>
    <add Name="TIFF File Format;Indexed compression" Value="LZW"/>
    <add Name="TIFF File Format;Greyscale compression" Value="LZW"/>
    <add Name="JPEG File Format;Color compression" Value="Medium Quality"/>
    <add Name="JPEG File Format;Greyscale compression" Value="High Quality"/>
    <add Name="Image Options;Fill order" Value="MSB2LSB"/>
    <add Name="Image Options;Fax" Value="0"/>
    <add Name="Image Options;Fax Profile" Value="0"/>
    <add Name="Image Options;Fax Resolution" Value="4"/>
    <add Name="Processing;Rotate landscape" Value="0"/>
  </Settings>
</WatchFolder>
Learn more about our document conversion service software for batch converting various file types to popular image formats like TIFF, JPG, and PDF. Or if you’re looking for slightly smaller scale conversions specifically to TIFF, try our PDF to TIFF Converter – the TIFF Image Printer.