Friday, October 3, 2008

Verifying Encryption in Office documents

I recently had a question come to me from someone who has to send sensitive data over email. She keeps the data in an Office 2007 Excel file, though in some cases the data are kept in an Office 2003 file. She asked me about encrypting the documents.

I would really like her to have an easy-to-use option that doesn't require learning new software, so I immediately thought of the password protection options that come with Office. I know that in Office 2007 you can select some strong encryption algorithms, and when you encrypt a document it really is encrypted. I was under the impression that password protection of files in Office 2003 or better was just that: a password, with no encryption of the contents. But when I started poking around in the interface, I did see options to encrypt the document. I decided that before I told this woman she could use this technique, I needed to verify it for myself.

When you're performing a scientific experiment, you need a hypothesis to test. In this case, I have two:
H1: Password protecting a document in Office 2003 and selecting an algorithm from the advanced options will obfuscate the data in the plaintext file.
H2: The resulting file will be well encrypted.

So how can you verify that the document has been encrypted? Here is what I did. I created a simple document in Word 2003 and saved it. Then I opened the document, applied a password and one of the encryption options, and saved it as another document. Now I have two documents, test1.doc and test2.doc. The first thing I did was run the strings command against them.
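
For readers without the strings command, here is a rough Python sketch of the same check. It is an approximation and not the real tool: the function name printable_strings and the default minimum length of 4 are my own choices for illustration, and it only looks for runs of 8-bit printable ASCII, so text stored in a 16-bit encoding would be missed.

```python
import re
import sys


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long.

    This only approximates the strings command: it ignores tabs and
    any text stored in a 16-bit encoding.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    # Usage (file names taken from the experiment above):
    #   python strings_sketch.py test1.doc test2.doc
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            found = printable_strings(handle.read())
        print(f"{path}: {len(found)} printable runs")
        for text in found:
            print("   ", text)
```

If the sentence you typed shows up in the output for test1.doc but not for test2.doc, that is the same observation the strings comparison gives.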

As you can see from the two photos, the plaintext data are visible in test1, but not in test2. So now we know that the document has, at a minimum, been obfuscated. This supports H1. There is also an easy way to test the strength of the encryption. Well-encrypted data does not compress well, so we can compress the two documents and compare the reduction in size. Unfortunately, as you can see from the output here, the results of this test do not support H2. Now I need a way to verify my assumption that well-encrypted data does not compress well. I need to add a control to this experiment.
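
The compression comparison can be sketched in Python as well. This is a minimal illustration under my own assumptions: it uses zlib rather than whatever compression tool you prefer, the name compression_ratio is made up for this example, and random bytes from os.urandom stand in for the output of a well-regarded cipher.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size.

    Values well below 1.0 mean the data had redundancy to remove;
    values at or above 1.0 mean compression saved nothing.
    """
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(zlib.compress(data, 9)) / len(data)


# Stand-ins for the real files: repetitive text behaves like a
# plaintext document, random bytes behave like strong ciphertext.
plain_like = b"The quick brown fox jumps over the lazy dog. " * 200
random_like = os.urandom(len(plain_like))

if __name__ == "__main__":
    print(f"plaintext-like data: {compression_ratio(plain_like):.3f}")
    print(f"random-like data:    {compression_ratio(random_like):.3f}")
```

To try it on the real files, read each one in binary mode and pass the bytes to compression_ratio. A ratio near 1.0 is what you would hope to see from an encrypted file.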


For my control, I am going to use GPG to encrypt a file, since GPG makes use of a well-respected encryption algorithm. I ran GPG against test1.doc to get test1.doc.gpg. Just as with the Microsoft encryption, the file size grew. Then I attempted to compress the file and got zero compression. In fact, the compressed file was slightly larger than the original.

So my final conclusion is that password protecting a document in Office 2003 and selecting an encryption algorithm (other than XOR) from the advanced options will obfuscate the data in the file. The quality of the encryption cannot be verified by these tests, but the compression results suggest it is weaker than that of GPG.

Now here is my disclaimer. I am not an expert on cryptography. These are some simple tests that I've put together based on my limited knowledge of the subject. If anyone can provide more information that I can use to validate my claims, I would love to hear it.

Edit: After talking to someone who knows more about the subject than I do (though still not an expert), I have a better understanding of the problem. Many encryption algorithms strive to produce output that appears to be random data. This reduces the ability of an attacker to perform a statistical analysis of the ciphertext. It is also why well-encrypted data does not compress well: there is very little redundancy left to compress. So the fact that my ciphertext does compress well does not necessarily mean that the encryption algorithm is poor. It may just mean that it is more vulnerable to statistical analysis than another algorithm.
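
One simple way to put a number on "appears to be random" is the Shannon entropy of the byte values in a file. The sketch below is my own illustration, not a test I ran for this post, and byte_entropy is a made-up name. It returns a value between 0 and 8 bits per byte, and data that looks random scores close to 8.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    text_like = b"The quick brown fox jumps over the lazy dog. " * 200
    random_like = os.urandom(65536)  # stand-in for strong ciphertext
    print(f"text-like data:   {byte_entropy(text_like):.2f} bits per byte")
    print(f"random-like data: {byte_entropy(random_like):.2f} bits per byte")
```

A high score does not prove that the encryption is strong, since compressed files also score high. A low score on a supposedly encrypted file is a hint that some structure is still visible.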
