Converting Excel Files for eDiscovery

Document Conversion Service provides many settings for formatting Excel documents for electronic data discovery.

When converting Excel documents, there is often a need to make formatting changes to ensure that all of the needed data is visible and readable in the converted file. The following sections outline what formatting changes you may need to consider, and the profile settings to use in Document Conversion Service to apply those changes.

Hidden Data

Excel can hide data in many ways and often this data needs to be visible in the converted file. Hidden rows, columns, and spreadsheets are encountered most often. Filtered data, comments (or notes) on individual cells and tracking changes are other ways there can be data embedded in the spreadsheet. Cell comments can be printed in-place on the spreadsheet as pop-up notes or all together at the end of the spreadsheet. As in-place pop-ups can obscure data on the spreadsheet, it is best to print the comments at the end for data discovery. When enabled, any tracked changes on the spreadsheet are always listed on a separate worksheet. Cells that have been modified can optionally be highlighted on the spreadsheet as well.

<!—Show hidden columns and rows -->
<add Name="Microsoft.Excel.AutoFitRows" Value="True" />
<add Name="Microsoft.Excel.AutoFitColumns" Value="True" />

<!—Show filtered data -->
<add Name="Microsoft.Excel.Worksheet.ShowAllData" Value="True" />

<!-- Print hidden worksheets -->
<add Name="Microsoft.Excel.PrintHiddenWorksheets" Value="True" />

<!-- List any tracked changes on a separate sheet -->
<add Name="Microsoft.Excel.TrackChanges.HighlightChangesOnScreen" Value="True" />
<add Name="Microsoft.Excel.TrackChanges.ListChangesOnNewSheet" Value="True" />
<add Name="Microsoft.Excel.TrackChanges.ExcelTrackChangesWhoParamter" Value="Everyone" />

<!-- Print comments at the end -->
<add Name="Microsoft.Excel.PageSetup.PrintComments" Value="PrintSheetEnd"/>

Auto Date, Time and Filename Fields

Spreadsheet headers and footers often configured to automatically fill in the current date, time or path to the current spreadsheet. For data discovery, where conversion can take place on another machine, this information often needs to be replaced with a custom string to indicate that this data was an auto date, time or filename field.

<!-- Replace date fields with <AUTODATE> string -->
<add Name="Microsoft.Excel.ReplaceFieldDateWith" Value="<AUTODATE>"/>

<!-- Replace time fields with <AUTOTIME> string -->
<add Name="Microsoft.Excel.ReplaceFieldTimeWith" Value="<AUTOTIME>"/>

<!-- Replace filename fields with <AUTOFILENAME> string -->
<add Name="Microsoft.Excel.ReplaceFieldFileNameWith" Value="<AUTOFILENAME>"/>

Cell formulas can also contain functions like NOW(), TODAY() and DATE() where the current date does not apply for data discovery. Document Conversion Service can replace any formula that uses these functions with a custom string.

<!-- Replace cell formulas containing TODAY(), NOW() with <AUTOFIELD> string -->
<add Name="Microsoft.Excel.ReplaceFormulasWithAutoDateAndTimeAsString" Value="<AUTOFIELD>" />

Cell Color and Formatting

Any use of color for cell text and background also has an effect on the converted document. Text and formulas are often hidden by setting both the text and cell background to the same color. Any cells where the text and background colors are both very light or both very dark may be hard to read in the final converted document. The easiest way to manage color in an Excel spreadsheet and ensure that all colored cell content are readable is to set Excel to print in black and white.

<add Name="Microsoft.Excel.PageSetup.BlackAndWhite" Value="true"/>

To keep some of the color information, such as borders, images and logos, individual settings for text color, background and colors set by conditional formatting can be configured.

<!-- Change text and clear background -->
<add Name="Microsoft.Excel.PageSetup.BlackAndWhite" Value="false"/>
<add Name="Microsoft.Excel.SetAllTextAsBlack" Value="True" />

<add Name="Microsoft.Excel.RemoveBackgroundColors" Value="True" />
<add Name="Microsoft.Excel.ClearAllConditionalFormatting" Value="True" />

Changing the Page Layout

Many Excel documents are very large and were not designed to be printed. If they were, setting any options such as showing hidden rows and columns will have changed the layout of the spreadsheet so that the existing print area and page setup no longer apply. Keep in mind that any existing print area may have also been set so that not all of the data on the spreadsheet is printed; for data discovery the entire spreadsheet contents are often needed.

Print Areas and Page Breaks

Existing print areas and page breaks can be ignored but when doing this Excel will print any cell that has been formatted. If formatting has been applied to a column or row, this can lead to many pages of empty cells being printed. Document Conversion Service includes options to clear the formatting of these cells so that only cells that contain data get printed.

<!-- Ignore print areas -->
<add Name="Microsoft.Excel.Worksheet.PrintOut.IgnorePrintAreas" Value="True" />
<add Name="Microsoft.Excel.Worksheet.ResetAllPageBreaks" Value="True" />

<!-- Clear formatting on empty cells -->
<add Name="Microsoft.Excel.ClearFormatsOnEmptyRowsOnTop" Value="True" />
<add Name="Microsoft.Excel.ClearFormatsOnEmptyRowsOnBottom" Value="True" />
<add Name="Microsoft.Excel.ClearFormatsOnEmptyColumnsOnRight" Value="True" />
<add Name="Microsoft.Excel.ClearFormatsOnEmptyColumnsOnLeft" Value="True" />

Page Margins

Page margins can be adjusted to give wider or narrower margins, and page content can be centered horizontally, vertically, or both. Margins are set in points, with 1 inch being equal to 72 points.

<!—Margins set as points 1 inch = 72 points -->
<!-- Print spreadsheet with top/bottom margins 1 inch -->
<add Name="Microsoft.Excel.PageSetup.TopMargin" Value="72" />
<add Name="Microsoft.Excel.PageSetup.BottomMargin" Value="72"/>

<!-- Print spreadsheet with left/right margins 1/2 inch -->
<add Name="Microsoft.Excel.PageSetup.LeftMargin" Value="36" />
<add Name="Microsoft.Excel.PageSetup.TopMargin" Value="36" />

<!-- Print spreadsheet centered -->
<add Name="Microsoft.Excel.PageSetup.CenterVertically" Value="True" />
<add Name="Microsoft.Excel.PageSetup.CenterHorizontally" Value="True" />

Headings, Gridlines and Headers and Footers

Other printing options such as row and column headings and gridlines can be added. In addition to replacing the auto data, time and filename fields in the headers and footers, you can also clear or replace any headers or footers as needed as well as adjust the height of the headers. Like margins, the height of the headers and footers is specified in points measured from the top or the bottom of the page.

<!-- Print row, column headings and gridlines -->
<add Name="Microsoft.Excel.PageSetup.PrintHeadings" Value="True" />
<add Name="Microsoft.Excel.PageSetup.PrintGridlines" Value="True"/>

<!-- Put the sheet name left aligned in the header, &A is syntax for sheet name --> 
<add Name="Microsoft.Excel.PageSetup.LeftHeader" Value="Sheet Name: &A"/>

<!-- Clear any other existing header and footer settings --> 
<add Name="Microsoft.Excel.PageSetup.CenterHeader" Value=""/>
<add Name="Microsoft.Excel.PageSetup.RightHeader" Value=""/>

<add Name="Microsoft.Excel.PageSetup.LeftFooter" Value=""/>
<add Name="Microsoft.Excel.PageSetup.CenterFooter" Value=""/>
<add Name="Microsoft.Excel.PageSetup.RightFooter" Value=""/>

<!-- Spreadsheet header/footer margins 1/2 inch -->
<add Name="Microsoft.Excel.PageSetup.HeaderMargin" Value="36"/>
<add Name="Microsoft.Excel.PageSetup.FooterMargin" Value="36"/>

Page Layout

The last step is to determine how to display the data on the page. Do you want to fit all of the data on a single page or span multiple pages? If you are spanning multiple pages, do you want the data to go over then down or down and then over? These settings would print the Excel spreadsheet in landscape orientation zoomed into 70% with page order of over then down.

<!-- Print landscape and zoomed in 70% -->
<add Name="Microsoft.Excel.PageSetup.Orientaton" Value="Landscape” />
<add Name="Microsoft.Excel.PageSetup.Order" Value="OverThenDown"/>

<add Name="Microsoft.Excel.PageSetup.Zoom" Value="70"/>

These settings would print the Excel spreadsheet in landscape orientation with all content fit to a single page.

<!-- Print landscape and fit columns to a single page -->
<add Name="Microsoft.Excel.PageSetup.Orientaton" Value="Landscape” />
<add Name ="Microsoft.Excel.PageSetup.Order" Value="DownThenOver"/>

<add Name="Microsoft.Excel.PageSetup.Zoom" Value="False"/>
<add Name="Microsoft.Excel.PageSetup.FitToPagesWide" Value="1"/>
<add Name="Microsoft.Excel.PageSetup.FitToPagesTall" Value="1"/>

Printing Sheets and Charts

If there are charts in the spreadsheet they can be ignored, printed before or after the spreadsheet, or be the only thing printed.

<!-- Print any charts after the spreadsheets -->
<add Name="Microsoft.Excel.PrintOut" Value="PrintOutWorkbookThenCharts” />

Sample Profiles

Some sample profiles for converting Excel, PowerPoint and Word documents for eDiscovery can be downloaded below. We have only covered the settings for Excel above, but similar settings, where applicable, exist for Word documents and PowerPoint presentations as well, and are included in the sample profiles. The profiles are zipped and will need to be extracted and placed in the Document Conversion Service Profiles folder (C:\ProgramData\PEERNET\Document Conversion Service\Profiles) before using.

Don’t see what you need? Have a question? Contact us and we’ll be glad to help.

Watch Document Conversion Service Tutorials