Binary Data in JSON String. Something better than Base64

The JSON format natively doesn't support binary data. The binary data has to be escaped so that it can be placed into a string element (ie zero or more Unicode chars in double quotes using backslash escapes) in JSON.

An obvious method to escape binary data is to use Base64. However, Base64 has a high processing overhead. Also it expands 3 bytes into 4 characters which leads to an increased data size by around 33%.

One use case for this is the v0.8 draft of the CDMI cloud storage API specification. You create data objects via a REST-Webservice using JSON, eg

PUT /MyContainer/BinaryObject HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.dataobject+json
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0
{
    "mimetype" : "application/octet-stream",
    "metadata" : [ ],
    "value" :   "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
    IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
    dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
    dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
    ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=",
}

Are there better ways and standard methods to encode binary data into JSON strings?


There are 94 Unicode characters which can be represented as one byte according to the JSON spec (if your JSON is transmitted as UTF-8). With that in mind, I think the best you can do space-wise is base85 which represents four bytes as five characters. However, this is only a 7% improvement over base64, it's more expensive to compute, and implementations are less common than for base64 so it's probably not a win.

You could also simply map every input byte to the corresponding character in U+0000-U+00FF, then do the minimum encoding required by the JSON standard to pass those characters; the advantage here is that the required decoding is nil beyond builtin functions, but the space efficiency is bad -- a 105% expansion (if all input bytes are equally likely) vs. 25% for base85 or 33% for base64.

Final verdict: base64 wins, in my opinion, on the grounds that it's common, easy, and not bad enough to warrant replacement.


I know this is a nearly 6 year old question but I run into the same problem, and thought I'd share a solution: multipart/form-data.

By sending a multipart form you send first as string your JSON meta-data, and then separately send as raw binary (image(s), wavs, etc) indexed by the Content-Disposition name.

Here's a nice tutorial on how to do this in obj-c, and here is a blog article that explains how to partition the string data with the form boundary, and separate it from the binary data.

The only change you really need to do is on the server side; you will have to capture your meta-data which should reference the POST'ed binary data appropriately (by using a Content-Disposition boundary).

Granted it requires additional work on the server side, but if you are sending many images or large images, this is worth it. Combine this with gzip compression if you want.

IMHO sending base64 encoded data is a hack; the RFC multipart/form-data was created for issues such as this: sending binary data in combination with text or meta-data.


BSON (Binary JSON) may work for you. http://en.wikipedia.org/wiki/BSON

Edit: FYI the .NET library json.net supports reading and writing bson if you are looking for some C# server side love.

链接地址: http://www.djcxy.com/p/3770.html

上一篇: 如何编码内容的文件名参数

下一篇: JSON字符串中的二进制数据。 比Base64更好的东西