Why is my docx, xlsx, pptx file corrupted?

PROBLEM :

I need files on my server to be encrypted and it works perfectly fine for .txt, .doc, .xls, .ppt but not with .docx, .xlsx and .pptx.

When I edit a docx (or xlsx, pptx), the file gets corrupted by the way I encrypt/decrypt it, since that's not a proper way to edit a docx. Microsoft Word says the file is corrupted and opens it as 'Document1.docx' instead of 'MyFileName.docx'. When saving, I have to give the name again, and with pptx I even have to give the path to the WebDAV folder the document is in.

QUESTION :

Is there any way to get it to save in the right place without having to type the path ?

CODE :

Here is the code I use to encrypt the files :

// Read the raw bytes of the file. Both branches are binary-safe and return
// the same data, so file_get_contents() alone would be enough.
$ext = explode( '.', basename($path));
if (in_array("doc", $ext) || in_array("docx", $ext)) {
    $handle = fopen("$davPath/$path", "rb");
    $data_file = fread($handle, filesize("$davPath/$path"));
    fclose($handle);
} else {
    $data_file = file_get_contents("$davPath/$path");
}

$encrypt_data_file = $encryption->encrypt($data_file);

// Write the encrypted data to a temporary file first, then replace the
// original with it only if the write succeeded.
if (file_put_contents("$davPath/encrypt_" . basename($path),$encrypt_data_file)) {
    unlink("$davPath/" . basename($path));
    rename("$davPath/encrypt_" . basename($path),"$davPath/" . basename($path));
    return true;
} else {
    return false;
}

And here is the code I use to decrypt them :

$ext = explode( '.', basename($uri));
// Default to false so the check below is safe when the file does not exist.
$data_file = false;
if(is_file($davPath."/".$uri)) {
    // Both branches read the raw bytes; file_get_contents() is binary-safe.
    if (in_array("doc", $ext) || in_array("docx", $ext)) {
        $handle = fopen("$davPath/$uri", "rb");
        $data_file = fread($handle, filesize("$davPath/$uri"));
        fclose($handle);
    } else {
        $data_file = file_get_contents("$davPath/$uri");
    }
}
// Strict comparison: an empty string is still valid file content.
if ($data_file !== false) {
    $decrypt_data_file = $encryption->decrypt($data_file);

    // Send the decrypted bytes to the client as a download.
    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename='.basename($uri));
    header('Content-Location: '.$_SERVER['SCRIPT_URI']);
    header('Expires: 0');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    // Discard anything already buffered so no stray bytes precede the file.
    ob_clean();
    flush();
    echo $decrypt_data_file;
    return false;
}

PS : I did find a workaround which consists in having the file decrypted on the server during the modification but I would really like not to have to do that.


Your issue has been solved, but I'd like to add an answer to it.

When you have a corrupted docx, here are some steps to find out what's wrong :

First, try to unzip the file (a docx is a zip archive). If that works, the problem is in the content of the docx. If the unzip fails, the zip container itself is corrupted.
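
The unzip check can also be done from PHP. This is a minimal sketch, assuming the zip extension (ZipArchive) is installed; checkDocxContainer is a hypothetical helper name, not part of any library. It only tells you which of the two cases you are in. A file that opens here may still be rejected by Word, which can be stricter.

```php
<?php
// Hypothetical helper: reports whether the file opens as a zip archive
// and whether it contains the main document part of a docx.
function checkDocxContainer(string $file): string
{
    $zip = new ZipArchive();
    $result = $zip->open($file);
    if ($result !== true) {
        // open() returns an integer error code on failure.
        return "zip corrupted (ZipArchive error code $result)";
    }
    $hasDocument = $zip->locateName('word/document.xml') !== false;
    $zip->close();

    return $hasDocument
        ? 'zip ok'
        : 'zip ok, but word/document.xml is missing';
}
```

For example, `echo checkDocxContainer('/path/to/MyFileName.docx');` prints one of the three messages above.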

Problems with the content of the docx

If the zip is not corrupted, Word will probably tell you where the problem lies when you open the docx.

It will tell you for example: Parse error on line 213 of document.xml
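
If Word's message is not at hand, a rough equivalent is to parse the XML part yourself. The sketch below assumes the zip, dom and libxml extensions are available; xmlErrorsInPart is a hypothetical helper name. It only checks that the XML is well-formed, so it will not catch every problem Word can report.

```php
<?php
// Hypothetical helper: returns libxml's error messages for one XML part
// of a docx, or an empty array when that part is well-formed.
function xmlErrorsInPart(string $docxFile, string $part = 'word/document.xml'): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxFile) !== true) {
        return ['cannot open the zip container'];
    }
    $xml = $zip->getFromName($part);
    $zip->close();
    if ($xml === false || $xml === '') {
        return ["part $part is missing or empty"];
    }

    // Collect parse errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

An empty result means the part parsed cleanly; otherwise each entry names the line libxml stopped on.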

Here's the "normal" structure of a docx after unzipping.

+--docProps
|  +  app.xml
|    core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |    image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |    theme1.xml
|  +  webSettings.xml
|  --_rels
|       document.xml.rels //this document tells word where the images are situated
+  [Content_Types].xml
--_rels
     .rels

As shown in the docx tag wiki.

Corrupted zip

If the zip is corrupted, in most cases there are some characters at the beginning or at the end of the file that shouldn't be there (or that should be there and are not).

The best approach is to take a valid docx of the same document and compare the hexadecimal representation of both files to see what differs.

I usually use the hexdiff tool for this (apt-get install hexdiff).

This will usually show you where the extra characters are situated.
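
If hexdiff is not available, the same comparison can be sketched in a few lines of PHP. Both function names below are hypothetical helpers for illustration. The first returns the offset of the first differing byte; the second dumps the last bytes of a file in hex, since that is where missing or extra characters tend to show up.

```php
<?php
// Hypothetical helper: offset of the first byte that differs between two
// strings, or null when they are identical. If one string is a prefix of
// the other, the offset is the length of the shorter one.
function firstDifference(string $a, string $b): ?int
{
    $common = min(strlen($a), strlen($b));
    for ($i = 0; $i < $common; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    return strlen($a) === strlen($b) ? null : $common;
}

// Hypothetical helper: hex representation of the last $bytes bytes.
function hexTail(string $data, int $bytes = 16): string
{
    return bin2hex(substr($data, -$bytes));
}
```

Typical use would be to read both files with file_get_contents(), call firstDifference() on the two strings, and then print hexTail() of each to compare the endings.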

Quite often, the problem is that you have the wrong headers.


Thanks to edi9999's suggestion, I used a hex editor to look for differences between a docx that had not been through encryption/decryption and one that had.

The only difference is that the first one (not corrupted) ends with three '00' bytes that are missing from the corrupted one.

The solution was to append three "\0" (null) bytes to the end of my decrypted data. And now it works perfectly fine!

For docx and pptx it's three "\0" bytes and for xlsx it's four.
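
The fixed counts above depend on how many trailing 00 bytes each file happens to end with. A more general alternative, offered here only as a sketch and not as what the code above does, is to store the original length in front of the data before encrypting and restore exactly that many bytes after decrypting. It assumes the encryption step zero-pads the data and the decryption step strips trailing null bytes, which the thread does not confirm. lossyEncrypt and lossyDecrypt are hypothetical stand-ins that only imitate that behaviour (they do not encrypt anything), and withLength and restoreLength are hypothetical helper names.

```php
<?php
// Hypothetical stand-in: pads with NUL bytes up to a multiple of the
// block size, as some block-cipher wrappers do. No real encryption.
function lossyEncrypt(string $data, int $block = 16): string
{
    $padded = (int) (ceil(strlen($data) / $block) * $block);
    return str_pad($data, $padded, "\0");
}

// Hypothetical stand-in: strips ALL trailing NUL bytes, including any
// that belonged to the original file.
function lossyDecrypt(string $data): string
{
    return rtrim($data, "\0");
}

// Prefix the payload with its length as a 4-byte big-endian integer.
function withLength(string $data): string
{
    return pack('N', strlen($data)) . $data;
}

// Read the length prefix and restore exactly that many bytes, putting
// back any trailing NUL bytes that were stripped along the way.
function restoreLength(string $data): string
{
    $data = str_pad($data, 4, "\0"); // the prefix itself may have lost NULs
    $length = unpack('N', substr($data, 0, 4))[1];
    return str_pad(substr($data, 4), $length, "\0");
}
```

With these helpers, the data would go through withLength() before encrypt() and through restoreLength() after decrypt(), so the number of trailing null bytes no longer has to be hard-coded per file type. The 4-byte prefix limits this sketch to files smaller than 4 GiB.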
