Checking file type with django form: 'application/octet

I'm using django validators and python-magic to check the mime type of uploaded documents and accept only pdf, zip and rar files.

Accepted mime-types are: 'application/pdf', 'application/zip', 'multipart/x-zip', 'application/x-zip-compressed', 'application/x-compressed', 'application/rar', 'application/x-rar', 'application/x-rar-compressed', 'compressed/rar'.

The problem is that sometimes pdf files seem to have 'application/octet-stream' as mime-type. 'application/octet-stream' means a generic binary file, so I can't simply add that mime type to the list of accepted types: other files, such as Excel files, would then be accepted too, and I don't want that to happen.

What can I do in this case?

Thanks in advance.


You should not rely on the MIME type provided with the upload, but rather on the MIME type discovered from the first few bytes of the file itself.

This will help eliminate the generic MIME type issue.

The drawback of this approach is that it usually relies on a third-party tool. For example, the file command commonly found on Linux systems works well: run it with -b --mime - and pipe in the first few bytes of your file, and it prints the MIME type.
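As a rough sketch of the file command approach, the snippet below pipes the leading bytes of an upload to file -b --mime - and keeps only the type part of the output. The helper name sniff_mime_with_file is made up for illustration, and the code assumes the file binary is installed and on the PATH.

```python
import shutil
import subprocess


def sniff_mime_with_file(head: bytes) -> str:
    """Ask the `file` command for the MIME type of the given leading bytes.

    Assumption: the `file` binary is installed; RuntimeError is raised if not.
    """
    if shutil.which("file") is None:
        raise RuntimeError("the `file` command is not available")
    result = subprocess.run(
        ["file", "-b", "--mime", "-"],  # "-" makes `file` read from stdin
        input=head,
        capture_output=True,
        check=True,
    )
    # Output looks like "application/pdf; charset=binary"; keep the type only.
    return result.stdout.decode().split(";")[0].strip()
```

In a validator you would rewind the upload and then call something like sniff_mime_with_file(upload.read(2048)); the 2048 is an arbitrary choice, not a requirement of the tool.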

The other option you have is to accept the file, and try to validate it by opening it with a library.

So if pypdf cannot open the file, the built-in zipfile module cannot open it, and rarfile cannot open it either, it's most likely something that you don't want to accept.
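A minimal sketch of that "try to open it" idea, shown only for ZIP because zipfile ships with Python. The names opens_as_zip and is_acceptable are hypothetical; openers for PDF and RAR built on pypdf and rarfile would be written in the same shape and added to the list, but they are left out here since both are third-party packages.

```python
import io
import zipfile


def opens_as_zip(data: bytes) -> bool:
    """Return True if the bytes open as a ZIP archive and pass a CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the first corrupt member name, or None if all are fine.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


def is_acceptable(data: bytes, openers) -> bool:
    """Accept the upload if at least one opener manages to read it."""
    return any(opener(data) for opener in openers)
```

The cost of this route is that the whole upload has to be read, so it fits better as a second check after a cheap header test than as the only one.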


The most foolproof way to tell is to look into the file contents and read the metadata in the file header.

In most formats this header is stored at the beginning of the file, though in some it may be located elsewhere.

python-magic helps you do this, but the trick is to always reset the pointer to the beginning of the file before trying to guess its mime type. Otherwise you will sometimes get 'application/octet-stream', because the reader's pointer has already advanced past the header to a region that contains just an arbitrary stream of bytes.

For example, if you have a django validator function that tries to validate uploaded files for mime types:

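import magic
from django.core.exceptions import ValidationError

def validate_file_type(upload):
    # MIME types this validator accepts.
    allowed_filetypes = [
        'application/pdf', 'image/jpeg', 'image/jpg', 'image/png',
        'application/msword']
    # Rewind first so the header bytes are the ones handed to python-magic.
    upload.seek(0)
    file_type = magic.from_buffer(upload.read(1024), mime=True)
    # Rewind again so whatever reads the upload next starts at the beginning.
    upload.seek(0)
    if file_type not in allowed_filetypes:
        raise ValidationError(
            'Unsupported file')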
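
To see the effect of the pointer position without installing anything, here is a small stand-alone sketch. Note that it swaps python-magic for a hand-rolled comparison against the well-known leading bytes of PDF, ZIP and RAR files; the names SIGNATURES and sniff_signature are made up for this example, and a real project would more likely keep using python-magic.

```python
import io

# Leading byte signatures for the three formats the question wants to accept.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",
    b"Rar!\x1a\x07": "application/x-rar",
}


def sniff_signature(stream) -> str:
    """Guess a MIME type from the first bytes of a seekable binary stream."""
    stream.seek(0)  # rewind first, the header may already have been consumed
    head = stream.read(8)
    stream.seek(0)  # leave the stream rewound for whoever reads it next
    for magic_bytes, mime in SIGNATURES.items():
        if head.startswith(magic_bytes):
            return mime
    return "application/octet-stream"


upload = io.BytesIO(b"%PDF-1.4\n" + b"\x00" * 100)
upload.read(50)  # simulate earlier code that already read part of the file
print(sniff_signature(upload))  # application/pdf
```

Without the first seek(0), the check would look at bytes 50 to 57, which are all zero here, and fall through to 'application/octet-stream'.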