
High Volume Document Conversion with Clustering and Failover



If you are looking for a high volume document conversion solution with clustering support, we’ve got some great news: the latest version of our Document Conversion Service software now includes it. If you’re not sure whether you need this feature, read on to learn what clustering support is and why your organization might need it.

The amount of digital data that is being stored has grown exponentially over the last decade and continues to increase with every passing year. Many businesses are going paperless, storing their critical files in local archive systems or moving them to cloud storage solutions. These files can be Adobe PDF documents, email messages, images, Word documents, Excel spreadsheets or any of a myriad of other file types used to share information in our increasingly connected world.

Many companies archive all of this data to a common format so that both current and historical files are easy to access. This can mean thousands, or even tens of thousands, of email messages, documents and other files to convert each day. Converting this many files requires a document conversion solution that is highly available and allows for continuous uptime. This is where clustering support comes into play.

What is Clustering?

In the computing world, a cluster is a group of computers that work in tandem, doing the same task on a shared set of data. This spreads the work out across each computer, or node, in the cluster and allows for load balancing and failover protection.

Load balancing means configuring the higher-powered computers in the cluster to perform more of the conversions, so that each computer stays busy but not overloaded.

With multiple computers running, the need to take one computer offline for maintenance will have minimal impact, as the other nodes will continue to convert files from the central location.

We are happy to announce that the newest version of Document Conversion Service now includes clustering support with its included Watch Folder service.

[Figure: Overview of a Cluster]

Do I Need Clustering?

While advantageous, clustering is not the correct approach for everyone. It is best suited to situations where large numbers of files must be converted. If you have a small daily volume of files, it may not be necessary.

It is a good choice when you have any of the following scenarios:

  • You need to be able to convert a certain number of files every day.
  • You have to convert an existing, very large collection of data by a certain deadline.
  • You need 100% uptime on your conversion ability.
  • Your desired user experience requires that each converted file be created within a certain amount of time.

These are just a few examples of common conversion requirements that benefit from a clustered approach to document conversion.

Many Hands Make Light Work

When dealing with large volumes of files to be converted, clustering is an easy, cost-effective way to increase the number of documents you can convert in a given timeframe.

In a classic case of many hands making light work, two or more clustered computers processing the same document collection can actually finish faster than a single, more powerful computer running at full throttle.

While it seems like the performance might be a one-to-one match, running a powerful computer at its maximum actually slows down the processing. Not only is the computer busy converting documents, it is also running the operating system and any other background tasks. When the machine is pushed to its maximum processing capacity, the time spent switching between all of these tasks, and making sure each gets its share of processor time, slows its performance.

Running two less powerful computers below capacity gives each computer the breathing room to perform the necessary background tasks without affecting the speed of the conversions performed on each computer.

Always Have a Backup

As much as we wish otherwise, there will always be times when a computer needs to be taken offline for maintenance. This could be scheduled maintenance or system upgrades, or a restart needed due to unexpected behaviour from software on the server.

In a non-clustered environment, having to perform maintenance or reboot the computer causes backlogs and delays in the file conversion performed by that computer. In a clustered environment, performing maintenance on one computer in the cluster does not affect the others. They will continue to convert files while that computer is being serviced.

Failover Protection in a Clustered Environment

The nature of clustered conversion is such that it also provides an inherent level of high availability to your system. For a system that must have 100% uptime, this approach allows for the loss of one or more computers in the cluster while still providing an acceptable level of service, albeit possibly at a slower rate.
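
To see what a slower rate can mean in practice, here is a small Python sketch of the arithmetic. The node names and files-per-hour figures are made-up assumptions for illustration only, not measurements of Document Conversion Service.

```python
# Hypothetical conversion rates in files per hour (illustrative numbers only).
NODE_RATES = {"node-a": 600, "node-b": 400, "node-c": 200}


def remaining_throughput(node_rates, offline):
    """Return the combined files-per-hour rate of the nodes still online."""
    return sum(rate for name, rate in node_rates.items() if name not in offline)


def hours_to_clear(backlog, node_rates, offline=()):
    """Estimate how long the online nodes need to convert a backlog of files."""
    rate = remaining_throughput(node_rates, set(offline))
    if rate == 0:
        raise ValueError("no nodes online, nothing can be converted")
    return backlog / rate
```

With these assumed numbers, losing the fastest node halves the cluster's rate, so a backlog takes twice as long to clear, but conversion never stops as long as at least one node stays online.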

Balancing the Load

Clustering allows you to distribute document conversion across multiple computers to provide better overall performance. To make the most of each computer in your cluster, it is also important to be able to load-balance the number of documents converted by each computer.

Document conversion can be a CPU-intensive task. The ability to fine-tune each computer in the cluster to work at its peak performance, so that it is busy but not overloaded, is critical to a stable system.

Each computer’s optimum performance depends on the amount of RAM, the CPU and the number of cores it has. Other services and programs running on that computer are also part of the equation as they, too, take a slice of the performance pie, reducing what remains for conversion.

The true power of a clustered approach for document conversion becomes apparent when using a collection of computers of varying levels of processing power. Having the ability to tailor the work performed by each computer to match its individual specifications allows you to use what you have to get the work done and provides you with a flexible and scalable conversion system.
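
As a rough illustration of tailoring the work to each computer, the Python sketch below suggests how many conversions each machine might run in parallel and what share of the total work that implies. The rule of reserving one core for the operating system, the function names and the machine list are all assumptions made for this example. Real tuning also depends on RAM and on the other services running on each computer, and this is not how Document Conversion Service itself is configured.

```python
def conversion_slots(cores, reserved_cores=1):
    """Suggest how many conversions to run in parallel on one computer.

    Assumption: leave `reserved_cores` free for the operating system and
    background tasks, but always allow at least one conversion.
    """
    return max(1, cores - reserved_cores)


def expected_share(nodes):
    """Map each computer to the fraction of the total work it would take on."""
    slots = {name: conversion_slots(cores) for name, cores in nodes.items()}
    total = sum(slots.values())
    return {name: count / total for name, count in slots.items()}


# Hypothetical cluster: computer name -> number of CPU cores.
CLUSTER = {"old-desktop": 2, "workstation": 4, "server": 8}
```

Calling expected_share(CLUSTER) under these assumptions gives the server the largest portion of the work while the older desktop still contributes, which is the "use what you have" idea described above.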

[Figure: Load Balancing a Clustered Environment]

Clustering and Failover with PEERNET Document Conversion Service

PEERNET’s popular Document Conversion Service now includes clustering and failover support built into the included Watch Folder tool. Designed from the start with large volume batch conversion in mind, DCS supports converting files from PDF, Office documents and other common formats to PDF, TIFF, JPEG and other image storage types.

Commonly used in the medical, insurance and legal industries, Document Conversion Service and the Watch Folder tool run in the background 24/7 and require minimal active monitoring once setup is complete.

The new clustering and failover ability allows you to point multiple computers running DCS at a single shared location containing the files to be converted. This can be an existing mass collection of files, or a drop location that periodically receives collections of files to be converted. Each computer will independently convert files from the shared location, increasing conversion speed and allowing for high availability and failover support.
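
The idea of several computers independently pulling files from one shared location can be sketched in Python as follows. This is only an illustration of one common technique, claiming a file by renaming it into a per-node folder, and not a description of how the Watch Folder tool works internally. The function name, folder layout and node names are hypothetical, and the sketch assumes the shared file system performs renames atomically.

```python
import os
from pathlib import Path


def claim_next_file(shared_dir, node_name):
    """Try to claim one unclaimed file for this node.

    Returns the path of the claimed file, or None if nothing is left.
    A file is claimed by moving it into a folder owned by the node; if
    two nodes race for the same file, only one rename succeeds.
    """
    shared = Path(shared_dir)
    claimed_dir = shared / f"claimed-{node_name}"  # hypothetical layout
    claimed_dir.mkdir(exist_ok=True)

    for candidate in sorted(shared.iterdir()):
        if not candidate.is_file():
            continue  # skip the per-node claim folders
        target = claimed_dir / candidate.name
        try:
            os.rename(candidate, target)
        except (FileNotFoundError, PermissionError):
            continue  # another node claimed this file first; try the next
        return target
    return None
```

Each node would call claim_next_file in a loop, convert whatever it receives, and pause briefly when it gets None. Because every node claims work only when it is ready for more, faster nodes naturally take on more files, and a node that goes offline simply stops claiming.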

If you would like to learn more about Document Conversion Service, clustering or any of its other features, and how we can help with your conversion needs, we’re happy to help out. Simply contact us with details about your organization and your requirements and we’ll get back to you shortly.

If you prefer a more hands-on approach, you can request a trial of Document Conversion Service to install and investigate at your leisure. The articles below will help you get started, and our technical team is always available to answer any questions you may have.