In Python, how do you get the content

Possible Duplicate:
How to find the mime type of a file in python?

I'm using an email processing API (sendgrid.com) that posts all incoming emails to a web request handler in my app. The attachments are posted as attachment0=xyz&attachment1=abc along with other email fields such as 'to', 'cc', 'subject', etc.

I then store these attachments as files in the BlobStore (with App Engine). To serve these files back to the user, the mime_type/content_type must be specified. As I understand it, it is usually dependent on the file type. But it's not clear to me how to get the file type from the passed strings.

Is there a library that figures out the file type from the byte content of a file?

Just to clarify, there is no filename or file extension. Just the file's byte content.


If you had saved the filename when the file was uploaded, you could use the mimetypes.guess_type function to guess the type from its extension. The linked SO question by Alexander is good to read.
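
For reference, a minimal sketch of what that would look like when a filename is available. The filenames here are made up for illustration; mimetypes.guess_type itself is part of the standard library and looks only at the name, never at the file's bytes.

```python
import mimetypes

# guess_type returns a (type, encoding) tuple derived from the extension.
# "report.pdf" and "attachment0" are hypothetical names.
content_type, encoding = mimetypes.guess_type("report.pdf")

# A name without an extension gives no usable guess.
unknown_type, unknown_encoding = mimetypes.guess_type("attachment0")
```

With no extension to go on, the guess comes back empty, which is why this approach does not help with a bare blob.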

Unfortunately, that is not your case. If all you have is a binary blob, I'm afraid you have to apply some custom heuristics. Follow these simple steps:

  • Build a map of known signatures. Examples follow the steps.
  • Read the first 4 bytes of the blob.
  • Do a longest-prefix match against the map you built in step 1. That is, if all 4 bytes match a signature, take it; otherwise try the first 3 bytes, then the first 2, and finally the first 1.
  • For example:

    A ZIP file starts with the two characters PK, a RAR file starts with Rar!, a PDF starts with %PDF, a PNG starts with \x89PNG, and so on.

    This would not identify every file (JPG, for instance, is not covered by the signatures above), but it is a good start to build on.
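
The steps above can be sketched roughly as follows. This is only an illustration: the names SIGNATURES and guess_content_type, the MIME strings chosen for each signature, and the application/octet-stream fallback are my assumptions, not part of any library.

```python
# Hypothetical signature map: leading bytes -> MIME type (assumed values).
SIGNATURES = {
    b"PK": "application/zip",
    b"Rar!": "application/vnd.rar",
    b"%PDF": "application/pdf",
    b"\x89PNG": "image/png",
}

# Assumed fallback for blobs that match no known signature.
DEFAULT_TYPE = "application/octet-stream"


def guess_content_type(blob, default=DEFAULT_TYPE):
    """Guess a MIME type from the first 4 bytes of a byte string."""
    head = blob[:4]
    # Longest match first: try 4 bytes, then 3, then 2, then 1.
    for length in range(len(head), 0, -1):
        mime = SIGNATURES.get(head[:length])
        if mime is not None:
            return mime
    return default


png_type = guess_content_type(b"\x89PNG\r\n\x1a\n")
zip_type = guess_content_type(b"PK\x03\x04")
```

Keep in mind that ZIP-based formats such as .docx also begin with PK, so a sketch like this would report them as plain ZIP files.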

Alternatively, you could use python-magic (https://github.com/ahupp/python-magic), which identifies file types from their content using libmagic.
