Redacting a PDF document is not rocket science

It is true that the PDF format has quirks that make redaction somewhat tricky, but understanding the basics and using the right tools makes it simple and efficient.

Why do redaction “accidents” happen?

Placing black bars over lines of text will certainly not do the job. Simply deleting the bars, searching the document, or copying and pasting the passage will reveal the text underneath.

Keep in mind that PDF files often consist of multiple layers, for example a page image (i.e. the scan of a document) and a text layer (placed under the page image by applying OCR). Both layers contain the text of the document.

Proper redaction solutions remove the text from all layers of the document, including the page image layer, and REPLACE it with a black (or any other color) bar instead of just covering it.
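The difference between covering and replacing can be illustrated with a small toy model. This is only a sketch: the `Page` class and the `cover`, `redact` and `extract_text` functions are hypothetical names invented for this illustration, they do not parse real PDF files, and they are not how FineReader or any other product is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy stand-in for a scanned PDF page: an image layer plus an OCR text layer."""
    image_words: list[str]   # words visible in the page image
    text_layer: list[str]    # the same words, stored as searchable text
    overlays: list[int] = field(default_factory=list)  # word indexes hidden by drawn bars


def cover(page: Page, index: int) -> None:
    """Draw a bar over one word. Nothing underneath is changed."""
    page.overlays.append(index)


def redact(page: Page, index: int) -> None:
    """Remove the word from every layer and leave a bar marker in the image."""
    page.image_words[index] = "█████"
    page.text_layer[index] = ""


def extract_text(page: Page) -> str:
    """What search, copy and paste, or export would see: the text layer only."""
    return " ".join(word for word in page.text_layer if word)


covered = Page(["Pay", "Alice", "today"], ["Pay", "Alice", "today"])
cover(covered, 1)
print(extract_text(covered))   # Pay Alice today

redacted = Page(["Pay", "Alice", "today"], ["Pay", "Alice", "today"])
redact(redacted, 1)
print(extract_text(redacted))  # Pay today
```

In the covered page the name is hidden from the eye but still present in the text layer, which is exactly what a search or a copy and paste picks up. In the redacted page it is gone from both layers.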

How to redact your documents reliably

In ABBYY FineReader you can find a “Redact” tool which removes selected text from all layers of the document.

This tool can be used in two ways. You can mark text passages for redaction manually while working through the document, or you can search for a keyword, name, number, etc., select some or all of the places where it appears, and apply redaction to all of them at once throughout the document.

Before redaction: the software highlights where the searched keyword is found

After redaction: there are no search results for the named keyword

Important: some caution is still needed. Technology cannot solve every problem, and human oversight is recommended. Here are some examples:

-A PDF document includes a company logo, like the ABBYY logo, for example. In a “digital-born” PDF the company logo is an image, so it will not be found by the search function and will not be redacted. The same applies to scanned PDFs: the software may treat the company logo as an image even if it contains some text. In both cases, you can redact it manually by drawing an area around it using the “Redaction” or “Eraser” tools.

-You are reviewing a so-called “image-only” PDF document (i.e. a document scan), which means there is no digital text available for keyword search in the redaction process. For ABBYY FineReader this is not an issue: the software automatically detects that the document does not have a text layer and makes it searchable while it is open. Make sure that the “Enable background recognition” option (which is turned on in the software by default) stays on when you redact documents.

-Your document includes a photo in which a name is visible. For many reasons this name may not be found using search. You can use the “Redaction” or the “Eraser” tools to remove the complete photo or only the part containing the text.

-You can use the “Redaction” or the “Eraser” tools to conceal faces in pictures too.

Redaction should not stop with the obvious

Besides the text layer that OCR (Optical Character Recognition) adds to scanned documents to make them searchable, PDFs may include other information that is not immediately visible to the reader. Such information may be hidden in the document properties (metadata), in comments, in attached files, in bookmarks, etc. Examples include the author of the document, a discussion between a client and their attorney, or names mentioned in the document.
ABBYY FineReader can find keywords in the document properties (metadata) and comments, and lists them in the search results separately from the keywords found in the document text itself.
When you apply redaction to these areas, FineReader replaces the redacted keyword with ***.
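The idea of masking a keyword in metadata and comments can be sketched in a few lines. This is an illustration only, not FineReader's implementation: the `mask_keyword` function, the field names, and the choice of case-insensitive matching are all assumptions made for the example.

```python
from __future__ import annotations

import re


def mask_keyword(fields: dict[str, str], keyword: str, mask: str = "***") -> dict[str, str]:
    """Return a copy of the fields with every occurrence of keyword replaced by mask.

    Matching is case-insensitive (an assumption for this sketch), and the keyword
    is escaped so that it is treated as literal text, not as a regular expression.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return {name: pattern.sub(lambda _match: mask, value) for name, value in fields.items()}


# Hypothetical document properties and comments.
hidden_fields = {
    "Author": "Jane Doe",
    "Title": "Contract draft",
    "Comment 1": "Ask Jane Doe about clause 4",
}

print(mask_keyword(hidden_fields, "jane doe"))
# {'Author': '***', 'Title': 'Contract draft', 'Comment 1': 'Ask *** about clause 4'}
```

Fields that do not contain the keyword are left unchanged, so the rest of the metadata stays usable.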

Comments and metadata before redaction

Comments and metadata redacted

If you want to make sure that “hidden objects” such as comments, metadata, attachments, bookmarks, etc. are removed from the document, you can use the “Delete Objects and Data” tool in FineReader.

There you can select the objects that you would like to permanently remove from your document and then apply the change.

Sanitizing PDF documents

Try ABBYY FineReader yourself: redact your document and then try to copy and paste the redacted text, search for it, or even convert the whole document to Microsoft Word. You won’t be able to find a way to reveal the redacted information.
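If you have many terms to check, the same test can be automated on whatever text you manage to get out of the redacted file, for example by copy and paste or by exporting it to another format. The sketch below assumes you already have that text as a string. The `find_leaks` function is a hypothetical helper written for this illustration and is not part of any product.

```python
from __future__ import annotations


def find_leaks(extracted_text: str, redacted_terms: list[str]) -> list[str]:
    """Return the redacted terms that still appear in the extracted text.

    The comparison ignores case and collapses runs of whitespace, so a term
    split across a line break in the export is still detected.
    """
    haystack = " ".join(extracted_text.casefold().split())
    leaks = []
    for term in redacted_terms:
        needle = " ".join(term.casefold().split())
        if needle and needle in haystack:
            leaks.append(term)
    return leaks


# Hypothetical text copied out of a document after redaction.
exported = "Payment is due to the contractor.\nContact: JANE\nDOE, account 12345"

print(find_leaks(exported, ["Jane Doe", "98765"]))  # ['Jane Doe']
```

An empty result means none of the listed terms were found in that particular extraction. It does not replace a manual review of images, logos and photos, which contain no searchable text.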
