Real electronic discovery using Document Conversion Service

eDiscovery with Excel and Document Conversion Service

NOTE: Also see our how-to guide on Converting Excel files for eDiscovery.

eDiscovery, or electronic data discovery, is the process of collecting documents and converting them from their native formats to PDF or TIFF so that the data can be extracted and analyzed in digital forensic procedures.

Excel spreadsheets can be challenging to convert to a different format while keeping all of the required data both visible and readable in the final output.

Before converting your Excel documents, think about what data you will need to include and how you need to present that data. Some key things to consider are:

  • Showing all hidden data.
  • Making sure all columns and rows are wide or tall enough to display all cell contents.
  • Hiding auto date, time and file name fields in headers and footers.
  • Removing color formatting to show all text.
  • Modifying the page layout to best fit the page size used in the new format.

When Document Conversion Service converts an Excel document, the document is actually printed. Excel files can be quite large and were often never intended to be printed, or were designed to be printed without showing any hidden data. Once hidden rows and columns are displayed, how the spreadsheet is printed may also need to be adjusted to encompass them.

Hidden Data in Excel

As a part of data discovery, you may need to ensure that any hidden data in the Excel document is visible in the final TIFF or PDF.

Excel can hide data in many different ways. Most often, rows and columns can be hidden, or even entire spreadsheets within a workbook. Filtered data is another way that Excel can hide information, by hiding any rows that do not meet the filter criteria.
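To get a feel for where this hidden data lives, here is a minimal sketch that audits a single worksheet for hidden rows, hidden columns and an active filter. It is not how Document Conversion Service works internally; it only relies on the fact that an .xlsx file is a ZIP archive of XML parts in which rows and columns carry a hidden attribute. The function name find_hidden and the SAMPLE fragment are made up for illustration, and rows hidden by a filter are assumed to be flagged the same way as manually hidden rows.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts inside an .xlsx file.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def find_hidden(sheet_xml):
    """Report hidden rows, hidden column ranges and any filter range in one worksheet part."""
    root = ET.fromstring(sheet_xml)
    truthy = {"1", "true"}
    # Assumes each row carries its (technically optional) r attribute, as Excel writes it.
    hidden_rows = [
        int(row.get("r"))
        for row in root.findall("m:sheetData/m:row", NS)
        if row.get("hidden") in truthy
    ]
    # Columns are stored as inclusive min..max ranges of 1-based column indexes.
    hidden_cols = [
        (int(col.get("min")), int(col.get("max")))
        for col in root.findall("m:cols/m:col", NS)
        if col.get("hidden") in truthy
    ]
    auto_filter = root.find("m:autoFilter", NS)
    return {
        "hidden_rows": hidden_rows,
        "hidden_cols": hidden_cols,
        "filter_range": auto_filter.get("ref") if auto_filter is not None else None,
    }


# Hypothetical worksheet fragment, made up for illustration.
SAMPLE = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <cols><col min="2" max="3" width="9" hidden="1"/></cols>
  <sheetData>
    <row r="1"/>
    <row r="2" hidden="1"/>
    <row r="3"/>
  </sheetData>
  <autoFilter ref="A1:D3"/>
</worksheet>"""

report = find_hidden(SAMPLE)
print(report)
```

A report like this can help you decide, before converting, which of the options for showing hidden data you will actually need.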

Each cell can have a comment, or note, attached to it, designated by a small triangle in the upper right corner. While not actually hidden in the user interface, cell comments are not always printed by default. Cell comments can be included as popup notes on the page, but these can cover other information on the spreadsheet. A better choice is to print all cell notes together at the end of the worksheet.

If Tracked Changes is enabled on the Excel document, you may also need to show what changes were made to the spreadsheet. Excel provides the ability to display all changes made to any spreadsheet in the document as a new, separate spreadsheet.

Document Conversion Service includes individual options to control showing all or only some of the above data as needed. Hidden rows and columns can be made visible by using the autofit rows and columns settings. Filtering can be disabled so that all rows are displayed. Printing options can be set to include comments at the end of the worksheet, and track changes can be configured to include the changes as a new spreadsheet at the end of the workbook.

Hiding Auto Fields in Headers, Footers and Cell Formulas

Headers and footers are often set up to automatically include the current date, time or path to the spreadsheet. During data discovery the document is often converted on a different system, where this information is no longer relevant. Through the Excel options available in Document Conversion Service, these fields can be replaced with a custom string.
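To make the idea concrete, here is a small sketch of what such a replacement amounts to. Excel stores header and footer text with field codes such as &D (date), &T (time), &F (file name) and &Z (file path). The function below, whose name replace_auto_fields is hypothetical, swaps those four codes for a fixed string and leaves everything else alone. It assumes uppercase codes and is only an illustration, not the Document Conversion Service implementation.

```python
import re

# Excel header/footer codes: &D date, &T time, &F file name, &Z file path.
# "&&" is matched too so that an escaped ampersand is never misread as a code.
AUTO_FIELD = re.compile(r"&(&|[DTFZ])")


def replace_auto_fields(text, replacement=""):
    """Swap automatic date/time/file codes for a fixed string, keeping other codes."""
    def swap(match):
        # "&&" is an escaped literal ampersand, so it is kept as-is.
        return "&&" if match.group(1) == "&" else replacement
    return AUTO_FIELD.sub(swap, text)


# Hypothetical footer: left, center and right sections (&L, &C, &R) with page codes.
footer = "&LPrinted &D &T&C&Z&F&RPage &P of &N"
print(replace_auto_fields(footer, "[removed]"))
# prints: &LPrinted [removed] [removed]&C[removed][removed]&RPage &P of &N
```

Page number codes such as &P and &N are left in place, since they still make sense on the converted output.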

Cell formulas can reference automatic dates and times using the functions NOW(), TODAY() and DATE(), which, like auto dates in headers and footers, may also not be applicable. Document Conversion Service can substitute a custom string for any cell that contains a formula using those functions.
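As an illustration of the kind of check involved, the sketch below decides what text to print for a cell: the custom string if the cell's formula calls NOW(), TODAY() or DATE(), and the stored value otherwise. The names printed_text and VOLATILE_DATE and the default replacement text are assumptions made for this example. The pattern is deliberately simple and would also match a function name that appears inside a quoted string in a formula.

```python
import re

# Matches NOW(, TODAY( or DATE( as a whole function name, case-insensitively.
# The lookbehind keeps longer names such as EDATE( from matching; DATEVALUE(
# does not match either, because DATE must be followed directly by "(".
VOLATILE_DATE = re.compile(r"(?<![A-Za-z0-9_.])(NOW|TODAY|DATE)\s*\(", re.IGNORECASE)


def printed_text(cell_formula, cached_value, replacement="[auto date removed]"):
    """Return the replacement if the formula uses an auto date function, else the value."""
    if cell_formula and cell_formula.startswith("=") and VOLATILE_DATE.search(cell_formula):
        return replacement
    return cached_value


print(printed_text("=TODAY()+30", "2024-05-31"))      # replaced
print(printed_text("=EDATE(A1,1)", "2024-06-01"))     # kept: EDATE is a different function
print(printed_text(None, "Quarterly total"))          # kept: plain value, no formula
```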

Changing Cell Color and Formatting

Cells, rows and columns can be hidden in Excel, but another way of hiding data is by making the background and foreground colors the same. For example, a cell with white text on a white background still contains data but when printed will not be visible.

Another consideration is cells where the text and background colors are not the same but are both either very light or very dark. These cells may be hard to read in the final PDF or TIFF.
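One way to put a number on "hard to read" is a contrast ratio between the text color and the fill color. The sketch below uses the WCAG 2.x contrast formula, which is a general accessibility measure and not something Document Conversion Service is known to use. The 4.5 threshold and the function names are assumptions for this example, and colors are assumed to be six-digit RRGGBB hex strings.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color):
    """Relative luminance of an RRGGBB hex color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(text_hex, fill_hex):
    """WCAG contrast ratio: 1.0 for identical colors, about 21 for black on white."""
    lighter, darker = sorted((luminance(text_hex), luminance(fill_hex)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def hard_to_read(text_hex, fill_hex, minimum=4.5):
    """Flag a cell whose text/fill contrast falls below the chosen threshold."""
    return contrast_ratio(text_hex, fill_hex) < minimum


print(hard_to_read("FFFFFF", "FFFFFF"))  # white on white: True
print(hard_to_read("FFFF00", "FFFFFF"))  # yellow on white: True
print(hard_to_read("000000", "FFFFFF"))  # black on white: False
```

White text on a white background gives a ratio of exactly 1.0, which is the fully hidden case described above.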

The simplest way to ensure that all colored cell contents are visible and readable is by setting Excel to print in black and white. This setting causes all cells to be black text on white or white text on black so that all cells are readable on the printed version.

If you need to keep some of the color formatting in the spreadsheet (such as borders and lines) but still need to see all of the text, there are individual settings for setting all text to black, removing all background colors and clearing any conditional formatting in the cell.

Adjusting the Page Layout

Often, the page layout of an Excel document will need to be modified to display all of the data needed for electronic discovery. Options such as disabling filtered data and showing all hidden rows and columns will change the width and the number of rows and columns being displayed. If the spreadsheet was originally set to print on portrait 8.5×11” paper, it may look better printed on Legal in Landscape mode after you’ve enabled those changes.

What is the layout of the data? If there are more columns than rows, consider printing landscape instead of portrait. Do you want the data to fit on one page or to span multiple pages? If you are spanning multiple pages, do you want the data to go over and then down or down and then over when breaking across pages?
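These questions are easier to answer with a quick estimate. The sketch below is a rough planning aid, not part of Document Conversion Service: it counts how many pages a block of data needs on a given page size, ignoring margins and scaling, and lists the order in which those pages come out for each of the two page orders. The function names and the 20 by 15 inch data size are made up for illustration.

```python
from math import ceil


def pages_needed(data_width_in, data_height_in, page_width_in, page_height_in):
    """Pages across and pages down for data of the given size (margins ignored)."""
    return ceil(data_width_in / page_width_in), ceil(data_height_in / page_height_in)


def page_order(across, down, over_then_down=True):
    """List (row, column) page positions in the order they would be printed."""
    if over_then_down:
        return [(r, c) for r in range(down) for c in range(across)]
    return [(r, c) for c in range(across) for r in range(down)]


# Hypothetical data area of 20 x 15 inches once hidden columns are shown.
letter_portrait = pages_needed(20, 15, 8.5, 11)   # 3 across, 2 down: 6 pages
legal_landscape = pages_needed(20, 15, 14, 8.5)   # 2 across, 2 down: 4 pages
print(letter_portrait, legal_landscape)

print(page_order(2, 2, over_then_down=True))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(page_order(2, 2, over_then_down=False))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

In this example, switching from portrait Letter to landscape Legal reduces the output from six pages to four, which is the kind of difference worth checking before you convert a large batch.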

Any existing print areas and page breaks in the spreadsheet should be ignored. Print areas can be set so that only a selected area of the spreadsheet is printed, which can keep potentially required data from showing in the final TIFF or PDF. If hidden rows or columns are now being displayed, any existing print area may also display different data than what was intended.

Other common printing settings include adding gridlines to make it easier to follow a line of data in the output. Repeating column and row headings can also be useful for documents that span multiple pages.

Try It Out Yourself!

We’ve touched on a lot of the options available in Document Conversion Service for formatting Excel documents for electronic data discovery. For a more in-depth look, and to test these options out for yourself, head over to our how-to guide, Converting Excel Files for eDiscovery, where you will find sample profiles and explanations to get you started right away. And of course, let us know if you have any further questions about this feature.