Sanitization (classified information)

Sanitization is the process of removing sensitive information from a document or other medium, so that it may be distributed to a broader audience. When dealing with classified information, sanitization attempts to reduce the document's classification level, possibly yielding an unclassified document. Originally, the term sanitization was applied to printed documents; it has since been extended to apply to computer media and the problem of data remanence as well.

Redaction generally refers to the editing or blacking out of text in a document, or to the result of such an effort. It is intended to allow the selective disclosure of information in a document while keeping other parts of the document secret. Typically the result is a document that is suitable for publication, or for dissemination to people other than the intended audience of the original document. For example, when a document is subpoenaed in a court case, information not specifically relevant to the case at hand is often redacted.


Government secrecy

In the context of government documents, redaction (also called sanitization) generally refers more specifically to the process of removing sensitive or classified information from a document prior to its publication, during declassification.

Secure document redaction techniques

A US government document that has been redacted prior to release.
A heavily redacted page from a lawsuit filed by the ACLU, American Civil Liberties Union v. Ashcroft.

The traditional technique for redacting confidential material from a paper document before its public release involves crossing out portions of text with a wide black pen and then photocopying the result. This process is relatively easy to understand and carries only minor risks. For example, if the black pen is not wide enough, careful examination of the resulting photocopy may still reveal partial information about the text, such as the difference between short and tall letters. The exact length of the removed text also remains recognizable, which may help a reader guess plausible wordings for shorter redacted sections. Where computer-generated proportional fonts were used, even more information can leak out of the redacted section through the exact position of nearby visible characters.
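The length leak described above can be illustrated with a small sketch. It assumes a hypothetical table of per-character widths for an imagined proportional font (the numbers are invented for illustration, and `ASSUMED_WIDTHS`, `text_width`, and `plausible_candidates` are names introduced here). Given the measured width of a blacked-out bar, it keeps only the candidate words whose rendered width would fit.

```python
# Hypothetical advance widths (arbitrary units) for an imagined proportional
# font. Real values would have to be read from the actual font's metrics.
ASSUMED_WIDTHS = {"i": 3, "j": 3, "l": 3, "f": 4, "r": 4, "t": 4,
                  "s": 5, "m": 9, "w": 9}
DEFAULT_WIDTH = 6  # assumed width for every character not listed above


def text_width(text):
    """Return the total width of `text` under the assumed width table."""
    return sum(ASSUMED_WIDTHS.get(ch.lower(), DEFAULT_WIDTH) for ch in text)


def plausible_candidates(candidates, bar_width, tolerance=0):
    """Keep the candidates whose width is within `tolerance` of the bar width."""
    return [word for word in candidates
            if abs(text_width(word) - bar_width) <= tolerance]


names = ["Smith", "Jones", "Williams", "Lee"]
# A bar 27 units wide rules out most of the candidate names.
print(plausible_candidates(names, bar_width=27))
```

The narrower the tolerance and the shorter the list of plausible words, the more a visible bar length gives away. This is one reason a fixed-size replacement marker can be preferable to a bar that matches the original text.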

The National Archives (UK) published a document, Redaction Toolkit, Guidelines for the Editing of Exempt Information from Documents Prior to Release (2004), "to provide guidance on the editing of exempt material from information held by public bodies."

Secure redaction is a far more complicated problem with word processing file formats. Such files may retain a revision history of the edited text that still contains the redacted text. In some file formats, unused portions of memory are saved that may still contain fragments of previous versions of the text. Where text is redacted by overlaying graphical elements (usually black rectangles) on top of it, the original text remains in the file and can be uncovered simply by deleting the overlaying graphics. Effective redaction of electronic documents requires the actual removal of the text or image data from the document file. This requires either a very detailed understanding of the internal operation of the document processing software and file formats used, which most computer users lack, or software tools designed for sanitizing electronic documents.
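As one concrete illustration of retained revision history, the sketch below assumes the Office Open XML (.docx) format, in which a document is a ZIP archive and tracked deletions are kept as `w:delText` elements in `word/document.xml`. The helper names and the demo file are hypothetical. The demo archive is a stripped-down stand-in containing only the one XML part, not a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_tracked_deletions(docx_bytes):
    """Return text that was 'deleted' with change tracking but is still stored."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [node.text or "" for node in root.iter(f"{{{W_NS}}}delText")]


def make_demo_docx():
    """Build a minimal in-memory archive with one tracked deletion (demo only)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>The source was </w:t></w:r>"
        '<w:del w:id="1" w:author="editor">'
        "<w:r><w:delText>Agent Example</w:delText></w:r></w:del>"
        "<w:r><w:t>[redacted]</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


# The visible text says "[redacted]", yet the deleted words are still in the file.
print(find_tracked_deletions(make_demo_docx()))
```

A check like this only detects one kind of leftover. It is not a substitute for a purpose-built sanitizing tool, since other parts of a file can hold earlier text as well.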

Redaction usually requires a marking of the redacted area with the reason that the content is being restricted. Government documents being released under the Freedom of Information Act are marked with exemption codes that denote the reason why the content has been sanitized.
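The marking step can be sketched as follows, assuming the document is plain text and each redaction is given as a hypothetical `(start, end, code)` triple. The function name and marker format are illustrative, and the codes in the example are used only as sample labels. The function removes the characters themselves and leaves a marker naming the reason.

```python
def redact_with_codes(text, spans):
    """Replace each (start, end, code) span with a marker carrying its code.

    The original characters are dropped from the output, not merely hidden,
    and the marker has the same form whatever the length of the removed text.
    """
    pieces = []
    cursor = 0
    for start, end, code in sorted(spans):
        if start < cursor:
            raise ValueError("redaction spans must not overlap")
        pieces.append(text[cursor:start])
        pieces.append(f"[REDACTED {code}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


report = "Report by John Doe on site Alpha."
name_start = report.index("John Doe")
site_start = report.index("Alpha")
marked = redact_with_codes(report, [
    (name_start, name_start + len("John Doe"), "(b)(6)"),
    (site_start, site_start + len("Alpha"), "(b)(1)"),
])
print(marked)
```

Because every marker is the same shape, the output does not reveal how long the removed passage was.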

The National Security Agency published a document, Redacting with Confidence: How to Safely Publish Sanitized Reports Converted from Word to PDF, which provides instructions for redacting PDF files generated from Microsoft Word.[4]

Printed matter

A page of a classified document that has been sanitized for public release. This is page 13 of a U.S. National Security Agency report on the USS Liberty incident, which was declassified and released to the public in July 2003. Classified information has been blocked out so that only the unclassified information is visible. Notations with leader lines at top and bottom cite statutory authority for not declassifying certain sections.

A printed document which contains classified or sensitive information will frequently contain a great deal of information which is less sensitive. There may be a need to release the less sensitive portions to uncleared personnel. The printed document will thus be sanitized to obscure or remove the sensitive information. The term redaction is also used to describe this process, though that term is more often used in literary contexts.

In some cases, sanitizing a classified document removes enough information to reduce the classification from a higher level to a lower one. For example, raw intelligence reports may contain highly classified information, like the identities of spies, that is removed before the reports are distributed outside the intelligence agency: the initial report may be classified as Top Secret while the sanitized report may be classified as Secret.

In other cases, like the U.S. National Security Agency's report on the USS Liberty incident, the report may be sanitized to remove all sensitive data, so that the report may be released to the general public.

As is seen in the USS Liberty report, paper documents are generally sanitized by covering the classified and sensitive portions and then photocopying the document, resulting in a sanitized document suitable for distribution.

Computer media and files

Computer (electronic or digital) documents are more difficult to sanitize. In many cases, when information in an information system is modified or erased, some or all of the data remains in storage. This may be an accident of design, where the underlying storage mechanism (disk, RAM, etc.) still allows information to be read despite its nominal erasure. The general term for this problem is data remanence. In some contexts (notably the US NSA, DoD, and related organizations), sanitization typically refers to countering the data remanence problem, and the process described in this article is called redaction instead.
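Data remanence can be pictured with a deliberately simplified toy model. It is not a real filesystem, and the class and method names are invented here. In it, "deleting" a file only drops the directory entry, so a raw scan of the blocks still finds the contents unless they are overwritten first.

```python
class ToyDisk:
    """A toy block store: a directory maps file names to block indices."""

    def __init__(self, block_count):
        self.blocks = [b""] * block_count
        self.directory = {}
        self.free = list(range(block_count))

    def write_file(self, name, chunks):
        indices = []
        for chunk in chunks:
            index = self.free.pop(0)
            self.blocks[index] = chunk
            indices.append(index)
        self.directory[name] = indices

    def delete(self, name):
        # Nominal erasure: the name disappears, the block contents stay.
        self.free.extend(self.directory.pop(name))

    def sanitize_delete(self, name):
        # Overwrite every block of the file before releasing it.
        for index in self.directory[name]:
            self.blocks[index] = b"\x00" * len(self.blocks[index])
        self.delete(name)

    def raw_scan(self):
        """Read all blocks directly, ignoring the directory."""
        return b"".join(self.blocks)


disk = ToyDisk(block_count=4)
disk.write_file("memo.txt", [b"SECRET ", b"PLAN"])
disk.delete("memo.txt")
print(b"SECRET" in disk.raw_scan())  # the data outlives the deletion
```

Real storage is less tidy than this model. Journaling, caching, and flash wear leveling can leave additional copies that a simple overwrite does not reach, which is why media sanitization is treated as its own problem.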

However, the retention may be a deliberate feature, in the form of an undo buffer, revision history, "trash can", backups, or the like. For example, word processing programs like Microsoft Word will sometimes be used to edit out the sensitive information. Unfortunately, these products do not always show the user all of the information stored in a file, so it is possible that a file may still contain sensitive information. In other cases, inexperienced users will use ineffective methods which fail to sanitize the document. Metadata removal tools are designed to effectively sanitize documents by removing potentially sensitive hidden information.
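Hidden metadata can be inspected before release. The sketch below assumes an Office Open XML (.docx) file, which stores properties such as the author in a `docProps/core.xml` part inside the ZIP archive. The function names and demo values are hypothetical, and the demo archive contains only that one part.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"


def list_core_properties(docx_bytes):
    """Return the non-empty core properties stored in the file, keyed by name."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        if CORE_PART not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(CORE_PART))
    properties = {}
    for node in root:
        local_name = node.tag.rsplit("}", 1)[-1]  # drop the XML namespace
        if node.text and node.text.strip():
            properties[local_name] = node.text.strip()
    return properties


def make_demo_archive():
    """Build a minimal in-memory archive holding only core.xml (demo only)."""
    core = (
        '<cp:coreProperties '
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:creator>A. Analyst</dc:creator>"
        "<cp:lastModifiedBy>B. Reviewer</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(CORE_PART, core)
    return buffer.getvalue()


# Names that never appear on the printed page may still travel with the file.
print(list_core_properties(make_demo_archive()))
```

Listing the properties is only an audit step. Removing them, along with comments and other hidden parts, is the job of the metadata removal tools mentioned above.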

In May 2005, the US military published a report on the death of Nicola Calipari, an Italian secret agent, at a US military checkpoint in Iraq. The report was published in PDF format and had been incorrectly redacted using commercial word processing tools. Shortly thereafter, readers discovered that the blocked-out portions could be retrieved using simple cut and paste operations on the posted document.[1]

Similarly, on May 24, 2006, lawyers for the communications service provider AT&T filed a legal brief[2] regarding their cooperation with domestic wiretapping by the NSA. Text on pages 12 through 14 of the PDF document was incorrectly redacted, and the covered text could be retrieved using cut and paste.[3]

At the end of 2005, the NSA released a report giving recommendations on how to safely sanitize a Word document.[4]

Issues such as these make it difficult to reliably implement multilevel security systems, in which computer users of differing security clearances may share documents. The Challenge of Multilevel Security gives an example of a sanitization failure caused by unexpected behavior in Microsoft Word's change tracking feature.[5]

The two most common redaction mistakes are adding an image layer over the sensitive text without removing the underlying text, and setting the background color to match the text color. In both cases, the redacted material still exists in the document beneath its visible appearance and can be found by searching or extracted with simple copy and paste. Proper redaction tools and procedures must be used to permanently remove the sensitive information. This is often accomplished in a multi-user workflow in which one group of people marks sections of the document as proposed redactions, another group verifies that the proposals are correct, and a final group operates the redaction tool to permanently remove the approved items.
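Both mistakes, and the propose, verify, apply workflow, can be sketched with a toy document model. Everything here (`Span`, `ToyDocument`, and the function names) is invented for illustration and does not correspond to any real file format or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    text: str
    color: str = "black"
    background: str = "white"


@dataclass
class ToyDocument:
    spans: list
    overlays: list = field(default_factory=list)  # indices hidden by black boxes


def extract_text(doc):
    """What search or copy and paste sees: appearance is ignored entirely."""
    return "".join(span.text for span in doc.spans)


def cover_with_box(doc, index):
    """Mistake 1: draw a box over the span; the text underneath is untouched."""
    doc.overlays.append(index)


def match_background(doc, index):
    """Mistake 2: make the background match the text color; the text stays."""
    doc.spans[index].background = doc.spans[index].color


def redact(doc, index, marker="[REDACTED]"):
    """Proper redaction: the original span is replaced, so its text is gone."""
    doc.spans[index] = Span(marker)


def run_workflow(doc, proposals, approved):
    """Apply only those proposed redactions that the reviewers approved."""
    for index in proposals:
        if index in approved:
            redact(doc, index)
    return doc


doc = ToyDocument([Span("Agent "), Span("Jane Roe"), Span(" met "),
                   Span("the source"), Span(".")])
cover_with_box(doc, 1)
match_background(doc, 3)
print(extract_text(doc))   # both "hidden" passages are still extractable

run_workflow(doc, proposals=[1, 3], approved={1, 3})
print(extract_text(doc))   # now the sensitive text is no longer in the document
```

The point of the model is that only `redact` changes what `extract_text` returns. The two cosmetic approaches change how the page looks and nothing else.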


  1. ^ BBC Report (May 2, 2005). "Readers 'declassify' US document". BBC. 
  2. ^
  3. ^ Declan McCullagh (May 26, 2006). "AT&T leaks sensitive info in NSA suit". CNet News. 
  4. ^ NSA SNAC (December 13, 2005) (PDF). Redacting with Confidence: How to Safely Publish Sanitized Reports Converted From Word to PDF. Report# I333-015R-2005. Information Assurance Directorate, National Security Agency, via Federation of American Scientists. Retrieved 2006-05-29. 
  5. ^ Rick Smith (2003). "The Challenge of Multilevel Security" (PDF). Black Hat Federal Conference. 

Wikimedia Foundation. 2010.
