Python: write JSON / dictionary objects to a file iteratively (one at a time)

2018-07-02 20:12:17

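I have a large for loop in which I create JSON objects, and I would like to write the object from each iteration to a file. I would like to be able to use the file later in the same fashion (reading one object at a time). My JSON objects contain newlines, so I can't simply dump each object as a line in the file. How can I achieve this?

To make it more concrete, consider the following: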

for _id in collection:
    dict_obj = build_dict(_id)  # build a dictionary object 
    with open('file.json', 'a') as f:
        stream_dump(dict_obj, f)

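stream_dump is the function I'm looking for.

Note that I don't want to build one big list and dump the whole thing with something like json.dump(obj, file). I want to be able to append each object to the file in every iteration.

Thanks.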

You need to subclass JSONEncoder and then wrap the build_dict function in a proxy sequence:

# collections.Sequence was removed in Python 3.10; the ABC lives in collections.abc
import collections.abc
import json


mycollection = [1, 2, 3, 4]


def build_dict(_id):
    # Build one dictionary per id
    d = dict()
    d['my_' + str(_id)] = _id
    return d


class SeqProxy(collections.abc.Sequence):
    """Sequence that applies func to each item of coll when it is accessed."""

    def __init__(self, func, coll, *args, **kwargs):
        # The original called super(SeqProxy, *args, **kwargs), which never ran __init__
        super(SeqProxy, self).__init__(*args, **kwargs)

        self.func = func
        self.coll = coll

    def __len__(self):
        return len(self.coll)

    def __getitem__(self, key):
        return self.func(self.coll[key])


class JsonEncoderProxy(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)


jsonencoder = JsonEncoderProxy()
collproxy = SeqProxy(build_dict, mycollection)


for chunk in jsonencoder.iterencode(collproxy):
    print(chunk)

Output:

[
{
"my_1"
:
1
}
,
{
"my_2"
:
2
}
,
{
"my_3"
:
3
}
,
{
"my_4"
:
4
}
]

To read it back in chunks, use JSONDecoder and pass a callable as object_hook. When you call JSONDecoder.decode(json_string), the hook is called with each newly decoded object (each dict in the list).
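As a rough illustration of the object_hook idea, here is a minimal sketch. The collect function and the decoded list are hypothetical names introduced for this example only. Note that decode() still parses the whole string in a single call; the hook just lets you see each dict as soon as it has been decoded.

```python
import json

decoded = []


def collect(obj):
    # Called once for every JSON object (dict) the decoder finishes parsing
    decoded.append(obj)
    return obj  # the returned value is what ends up in the final result


decoder = json.JSONDecoder(object_hook=collect)
result = decoder.decode('[{"my_1": 1}, {"my_2": 2}]')

for obj in decoded:
    print(obj)
```

If the dicts are nested, the hook also fires for the inner dicts, innermost first, so a real consumer may need to tell top-level records apart from nested ones.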

Since you are generating the file yourself, you can simply write out one JSON object per line:

import json

for _id in collection:
    dict_obj = build_dict(_id)  # build a dictionary object
    with open('file.json', 'a') as f:
        # json.dumps escapes newlines inside strings, so each object stays on one line
        f.write(json.dumps(dict_obj))
        f.write('\n')

Then read them back by iterating over the lines:

with open('file.json', 'r') as f:
    for line in f:
        dict_obj = json.loads(line)

This isn't a great general-purpose solution, but it is a simple one if you are both the producer and the consumer.
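To show that the one-object-per-line approach also copes with values that contain newlines, here is a self-contained sketch. stream_dump takes its name from the question; stream_load, the sample records, and the temporary file path are assumptions made for this example.

```python
import json
import os
import tempfile


def stream_dump(obj, f):
    # json.dumps writes a newline inside a string as the two characters \n,
    # so the serialized object never spans more than one physical line
    f.write(json.dumps(obj))
    f.write('\n')


def stream_load(f):
    # Yield one decoded object per non-empty line
    for line in f:
        if line.strip():
            yield json.loads(line)


path = os.path.join(tempfile.mkdtemp(), 'file.json')
records = [
    {'id': 1, 'text': 'first line\nsecond line'},
    {'id': 2, 'text': 'plain'},
]

# Append each record in its own iteration, as in the question
for rec in records:
    with open(path, 'a') as f:
        stream_dump(rec, f)

with open(path) as f:
    loaded = list(stream_load(f))
```

Reopening the file on every iteration mirrors the question's loop; opening it once outside the loop would work the same way and avoid the repeated open/close.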

Simplest solution

Remove all whitespace characters from your JSON document:

import string

def remove_whitespaces(txt):
    """Remove every whitespace character from txt.

    Caution: this also strips whitespace inside JSON string values.
    """
    for ch in string.whitespace:
        txt = txt.replace(ch, '')
    return txt

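Obviously you could also use json.dumps(json.loads(json_txt)) (which, by the way, also verifies that the text is valid JSON).

Now you can write your documents to a file, one document per line.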
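The re-serializing variant can be sketched as follows. The pretty-printed sample document is made up for illustration. Unlike stripping every whitespace character, a round trip through json.loads and json.dumps leaves spaces inside string values untouched.

```python
import json

# A pretty-printed document that spans several lines (sample data)
pretty = '''{
    "name": "example record",
    "tags": [
        "a",
        "b"
    ]
}'''

# Parsing validates the text; dumping without indent yields a single line
compact = json.dumps(json.loads(pretty))
print(compact)
```

json.loads raises an error (json.JSONDecodeError on Python 3.5+) if the text is not valid JSON, which is the validation side effect mentioned above.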

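Second solution

Create an [AnyStr]IO stream, write a valid document into it (with your documents as members of an enclosing object or list), and then write the stream's contents to a file (or upload them to the cloud).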
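One possible reading of this suggestion, sketched with io.StringIO: the objects are written into an in-memory buffer as elements of a JSON array, and the finished text can then be saved or uploaded. The build_dict body and the id list are borrowed from the earlier answer; the manual bracket and comma handling is an assumption about what the answer intends.

```python
import io
import json


def build_dict(_id):
    # Same toy builder as in the earlier answer
    return {'my_' + str(_id): _id}


buf = io.StringIO()
buf.write('[')
for i, _id in enumerate([1, 2, 3, 4]):
    if i:
        buf.write(', ')  # separator before every element except the first
    json.dump(build_dict(_id), buf)  # serialize one object straight into the buffer
buf.write(']')

document = buf.getvalue()  # a complete, valid JSON array as a single string
print(document)
```

The buffer holds the entire document in memory, so this fits cases where the output is moderate in size; for very large outputs the same writes could go directly to an open file object instead.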
