How to read, manipulate, and write a .docx file in C

I am reading a .docx file into a buffer and successfully writing it to a new file (using fread and fwrite in C). Now I want to extend this project to encryption, so I need to be able to manipulate the buffer and then write it to a new file.
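For reference, here is a minimal sketch of the read/write round trip described above. It assumes the whole file fits in memory. The helper names read_file and write_file are hypothetical and do not come from the original post. Two details matter for a binary format like .docx. First, open the file in binary mode ("rb"/"wb"), because text mode can translate line endings on Windows. Second, track the byte count in a size_t rather than relying on string functions, because the data contains 0x00 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a heap buffer in binary mode.
   On success returns the buffer and stores its length in *out_len;
   the caller must free() the buffer. Returns NULL on any failure. */
unsigned char *read_file(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);
    FILE *f = fopen(path, "rb");              /* binary mode: no newline translation */
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* malloc(0) may return NULL, so always allocate at least one byte */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }

    *out_len = got;   /* keep the length explicitly; never use strlen() on this data */
    return buf;
}

/* Hypothetical helper: write len bytes from buf to path in binary mode.
   Returns 0 on success, -1 on failure. */
int write_file(const char *path, const unsigned char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t put = fwrite(buf, 1, len, f);
    int closed = fclose(f);
    return (put == len && closed == 0) ? 0 : -1;
}
```

Modifying the data then amounts to changing bytes in the buffer between the two calls and passing the same length to write_file.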

Now, one might ask what manipulation I need. It could be anything, for example writing the character 's' at buffer location 15, as below, and then writing this new buffer (with 's' at location 15 and the rest unchanged) to a new .docx file.

buffer[15] = 's';

When I did this, the resulting file was corrupt. Since I am not fully familiar with the structure of a .docx file, byte 15 could be an identifier, part of a header, or some other information that the file needs in order not to be corrupt.
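A .docx file is a ZIP archive, and a ZIP archive normally begins with a 30-byte "local file header" for its first entry (field layout as given in PKWARE's APPNOTE.TXT). The sketch below decodes those fields. The struct and function names are hypothetical. The sketch shows that offsets 14 to 17 hold the CRC-32 of the first entry's data, so buffer[15] is the second byte of that checksum. Overwriting it makes the stored checksum disagree with the entry's data, at least in the common case where bit 3 of the flags field is clear. That is a plausible reason Word reports the file as corrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size part of a ZIP local file header. All fields are little-endian. */
struct zip_local_header {
    uint32_t signature;          /* offset  0: 0x04034b50, the bytes "PK\3\4" */
    uint16_t version_needed;     /* offset  4 */
    uint16_t flags;              /* offset  6 */
    uint16_t compression;        /* offset  8: 0 = stored, 8 = deflate */
    uint16_t mod_time;           /* offset 10 */
    uint16_t mod_date;           /* offset 12 */
    uint32_t crc32;              /* offset 14: buffer[15] falls inside this field */
    uint32_t compressed_size;    /* offset 18 */
    uint32_t uncompressed_size;  /* offset 22 */
    uint16_t name_len;           /* offset 26 */
    uint16_t extra_len;          /* offset 28; the file name starts at offset 30 */
};

/* Read little-endian integers byte by byte (portable, no alignment issues). */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: decode the first local file header of a .docx/.zip buffer.
   Returns 0 on success, -1 if the buffer is too short or lacks the signature. */
int parse_local_header(const unsigned char *buf, size_t len, struct zip_local_header *h)
{
    assert(buf != NULL && h != NULL);
    if (len < 30 || rd32(buf) != 0x04034b50u)
        return -1;
    h->signature         = rd32(buf + 0);
    h->version_needed    = rd16(buf + 4);
    h->flags             = rd16(buf + 6);
    h->compression       = rd16(buf + 8);
    h->mod_time          = rd16(buf + 10);
    h->mod_date          = rd16(buf + 12);
    h->crc32             = rd32(buf + 14);
    h->compressed_size   = rd32(buf + 18);
    h->uncompressed_size = rd32(buf + 22);
    h->name_len          = rd16(buf + 26);
    h->extra_len         = rd16(buf + 28);
    return 0;
}
```

Running this on the original buffer and again after buffer[15] = 's' shows the crc32 field changing while the data it describes stays the same.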

What I do know about the internal structure of a .docx file is:

  • It consists of XML files, zipped together.

  • The text written in a .docx file is stored in those XML files. For example, if test.docx contains "Hello, how are you?", that text is stored in one of the XML files.

  • Among the zipped files there is one with a .rels extension (not confirmed) that tells MS Word where the content is stored, i.e., where to look for it.

Apart from these three points, I don't know much about the structure of a .docx file. Considering all this, I want to extract the contents of the .docx file from the zipped XML files, read them (in C) into a buffer, change the buffer as needed, and create a new file with the new content from the buffer.
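To work with the XML parts rather than the raw archive bytes, you first have to locate them through the ZIP central directory. In a typical .docx the parts include [Content_Types].xml, _rels/.rels, word/document.xml (the main body text), and word/_rels/document.xml.rels. The sketch below uses two hypothetical helpers. zip_find_entry finds an entry by name, and zip_entry_data returns a pointer to that entry's raw bytes. The sketch ignores ZIP64 and multi-disk archives. Entries are usually stored with compression method 8 (deflate), so the bytes still have to be inflated. You can do that with zlib's inflateInit2 and a negative windowBits value, which selects raw deflate. Alternatively, a library such as libzip or minizip can handle the whole job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct zip_entry {
    uint16_t compression;          /* 0 = stored, 8 = deflate */
    uint32_t compressed_size;
    uint32_t uncompressed_size;
    uint32_t local_header_offset;  /* where this entry's local header starts */
};

static uint16_t le16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scan backwards for the End Of Central Directory record ("PK\5\6").
   It is 22 bytes plus an optional comment of up to 65535 bytes. */
static size_t find_eocd(const unsigned char *buf, size_t len)
{
    if (len < 22) return (size_t)-1;
    size_t min = len > 22 + 65535 ? len - 22 - 65535 : 0;
    for (size_t i = len - 22; ; i--) {
        if (le32(buf + i) == 0x06054b50u) return i;
        if (i == min) break;
    }
    return (size_t)-1;
}

/* Hypothetical helper: look up `name` (e.g. "word/document.xml").
   Returns 0 and fills *out if found, -1 otherwise. */
int zip_find_entry(const unsigned char *buf, size_t len, const char *name, struct zip_entry *out)
{
    assert(buf != NULL && name != NULL && out != NULL);
    size_t eocd = find_eocd(buf, len);
    if (eocd == (size_t)-1) return -1;
    uint16_t count = le16(buf + eocd + 10);   /* total number of entries */
    size_t pos = le32(buf + eocd + 16);       /* offset of the central directory */
    size_t want = strlen(name);
    for (uint16_t k = 0; k < count; k++) {
        if (pos + 46 > len || le32(buf + pos) != 0x02014b50u) return -1;  /* "PK\1\2" */
        size_t nlen = le16(buf + pos + 28);
        size_t xlen = le16(buf + pos + 30);
        size_t clen = le16(buf + pos + 32);
        if (pos + 46 + nlen > len) return -1;
        if (nlen == want && memcmp(buf + pos + 46, name, want) == 0) {
            out->compression         = le16(buf + pos + 10);
            out->compressed_size     = le32(buf + pos + 20);
            out->uncompressed_size   = le32(buf + pos + 24);
            out->local_header_offset = le32(buf + pos + 42);
            return 0;
        }
        pos += 46 + nlen + xlen + clen;       /* advance to the next entry */
    }
    return -1;
}

/* Hypothetical helper: pointer to the entry's raw (possibly compressed) bytes, or NULL.
   Uses the local header's own name/extra lengths, which may differ from the central copy. */
const unsigned char *zip_entry_data(const unsigned char *buf, size_t len, const struct zip_entry *e)
{
    assert(buf != NULL && e != NULL);
    size_t lh = e->local_header_offset;
    if (lh + 30 > len || le32(buf + lh) != 0x04034b50u) return NULL;
    size_t start = lh + 30 + le16(buf + lh + 26) + le16(buf + lh + 28);
    if (start + e->compressed_size > len) return NULL;
    return buf + start;
}
```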

Can someone guide me through this? Please also tell me if I need to provide code or any other essential details. Thanks in advance.

EDIT

PURPOSE OF ALL THIS:

I want to do all this for encryption. When I encrypt a file (using AES), the whole file becomes unreadable: everything inside is changed and moved from its place. When I decrypt that file, it cannot be opened. My guess is that the AES decryption algorithm does not know how to parse the contents recovered from the encrypted file back into a .docx file, so it cannot put the contents and structure back in their proper places.

I have tried it. The original .docx file was 14 KB, and the encrypted and decrypted files were also 14 KB. But when I try to open the decrypted file, Word says it is corrupt. I also checked it in a hex editor: after exactly the first 30 bytes, the decrypted file contains only 00 bytes.
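A note on the round trip itself: correct decryption reproduces the original bytes exactly, so the decrypt step never needs to know anything about .docx structure. A quick way to find where a pipeline goes wrong is to compare the original and decrypted buffers byte by byte. The sketch below does that with a hypothetical first_mismatch helper. A real AES implementation would not fit in a short example, so toy_xor_cipher is a clearly labeled stand-in. It is NOT AES and NOT secure. It only has the property that applying it twice with the same key restores the input. In a real program it would be replaced by an AES mode from a crypto library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PLACEHOLDER, NOT AES, NOT SECURE: XOR each byte with a keystream taken from
   a tiny linear congruential generator. Applying it twice with the same key
   restores the input, which is the only property this example relies on. */
void toy_xor_cipher(unsigned char *buf, size_t len, uint32_t key)
{
    assert(buf != NULL || len == 0);
    uint32_t state = key;
    for (size_t i = 0; i < len; i++) {
        state = state * 1664525u + 1013904223u;   /* common LCG constants */
        buf[i] ^= (unsigned char)(state >> 24);
    }
}

/* Hypothetical helper: offset of the first byte where a and b differ,
   or (size_t)-1 if they have the same length and identical bytes.
   If one buffer is a prefix of the other, returns the shorter length. */
size_t first_mismatch(const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i]) return i;
    return alen == blen ? (size_t)-1 : n;
}
```

With a real cipher, first_mismatch(original, original_len, decrypted, decrypted_len) should return (size_t)-1. If it instead returns a small offset such as 30, the problem lies in how the buffer or its length is handled, not in the .docx format. Lengths are worth comparing in exact bytes. AES in CBC mode with PKCS#7 padding produces ciphertext 1 to 16 bytes longer than the plaintext, and that padding must be stripped after decryption. A size shown in whole kilobytes (14 KB for all three files) would hide that difference.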


DOCX files are based on OPC (Open Packaging Conventions) and OOXML (Office Open XML). OPC is based on Zip, and OOXML is based on XML. Therefore, you can use Zip and XML tools to operate on DOCX files. Beyond this, you'll have to be more specific about what you wish to do in order to receive better guidance.

Poking characters into random index locations in an XML file is operating at the wrong level of abstraction.
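Working at the XML level means operating on elements rather than byte offsets. In WordprocessingML, the visible text of a paragraph sits inside w:t elements in word/document.xml. The sketch below, built around a hypothetical extract_w_t_text function, naively collects that text from an already decompressed document.xml buffer. It assumes the usual "w:" prefix and does not decode entities such as &amp;. It is not a real XML parser, so a library such as libxml2 or Expat is the safer choice for actual edits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, naive helper: concatenate the text of every <w:t> element
   in an uncompressed word/document.xml buffer into out (NUL-terminated,
   truncated to fit out_cap). Returns the number of bytes written. */
size_t extract_w_t_text(const char *xml, size_t len, char *out, size_t out_cap)
{
    assert(xml != NULL && out != NULL && out_cap > 0);
    size_t n = 0, i = 0;
    while (i + 5 <= len) {
        /* "<w:t" must be followed by '>' or ' ', which skips <w:tab/>, <w:tbl>, <w:tc> */
        if (memcmp(xml + i, "<w:t", 4) == 0 && (xml[i + 4] == '>' || xml[i + 4] == ' ')) {
            const char *gt = memchr(xml + i, '>', len - i);
            if (gt == NULL) break;
            size_t start = (size_t)(gt - xml) + 1;
            if (gt[-1] == '/') { i = start; continue; }   /* self-closing, no text */
            size_t j = start;
            while (j + 6 <= len && memcmp(xml + j, "</w:t>", 6) != 0) j++;
            if (j + 6 > len) break;                        /* unterminated element */
            for (size_t k = start; k < j && n + 1 < out_cap; k++) out[n++] = xml[k];
            i = j + 6;
        } else {
            i++;
        }
    }
    out[n] = '\0';
    return n;
}
```

For a document containing "Hello, how are you?", the text may be split across several w:t elements, and this function joins them back together.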
