How to append a dictionary to a pandas dataframe?

I have a set of URLs pointing to JSON files and an empty pandas DataFrame whose columns represent the attributes of the JSON files. Not every JSON file has all the attributes that appear as columns in the DataFrame. I need to create a dictionary from each JSON file and append it to the DataFrame as a new row; where a JSON file has no attribute matching a column, that cell should be left blank.

I managed to create the dictionaries like this:

import urllib2
import json  

url = "https://cws01.worldstores.co.uk/api/product.php?product_sku=ULST:7BIS01CF"
data = urllib2.urlopen(url).read()
data = json.loads(data)

and then I tried to create a for loop as follows:

row = -1
for i in links:
    row = row + 1
    data = urllib2.urlopen(str(i)).read()
    data = json.loads(data)
    for key in data.keys():
        for column in df.columns:
            if str(column) == str(key):
                df.loc[[str(column)],row] = data[str(key)]
            else:
                df.loc[[str(column)],row] = None

where df is the DataFrame and links is the set of URLs.

However, I get the following error:

raise KeyError('%s not in index' % objarr[mask])

KeyError: "['2_seater_depth_mm'] not in index"

where ['2_seater_depth_mm'] is the first column of the pandas dataframe


The code below works for me:

row = -1
for i in links:
    row = row + 1
    data = urllib2.urlopen(str(i)).read()
    data = json.loads(data)
    for key in data.keys():
        df.loc[row,key] = data[key]

You have swapped the order of the arguments to .loc (the row label comes first, then the column label), and you have one pair of [] too many.
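To see why the original indexing fails, here is a minimal sketch with a made-up two-column frame (the column names sku and width_mm are placeholders, not taken from the real API). Putting [column] in the first slot makes .loc search the row index for a label equal to the column name, which is what raises the KeyError:

```python
import pandas as pd

# Hypothetical empty frame; the column names are placeholders.
df = pd.DataFrame(columns=["sku", "width_mm"], dtype=object)

caught = None
try:
    # The first slot holds row labels, so this looks for a row named "sku"
    # in an empty index and fails.
    df.loc[["sku"], 0] = "A1"
except KeyError as exc:
    caught = exc

# Row label first, column label second: this enlarges the frame with row 0.
df.loc[0, "sku"] = "A1"

print(type(caught).__name__)
print(df)
```

The second assignment creates row 0 on the fly, and the column that was not assigned stays empty.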


Assuming that df is empty and has the same columns as the keys of the URL dictionaries, i.e.

list(df)
#[u'alternate_product_code',
# u'availability',
# u'boz',
# ...

len(df)
#0

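then you can use DataFrame.append. It returns a new DataFrame rather than modifying df in place, so assign the result back to df:

for url in links:
    url_data = urllib2.urlopen(str(url)).read()
    url_dict = json.loads(url_data)
    # Wrap each value in a one-element Series so the dict describes a single row
    a_dict = {k: pandas.Series([str(v)], index=[0]) for k, v in url_dict.iteritems()}
    new_df = pandas.DataFrame.from_dict(a_dict)
    # append returns a new DataFrame; keep the result
    df = df.append(new_df, ignore_index=True)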
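Note that DataFrame.append was removed in pandas 2.0, so on a recent pandas the loop above will not run. A rough alternative, assuming the dictionaries fit in memory, is to collect them in a list and build the frame in one go; passing columns= keeps only the wanted attributes and leaves missing ones as NaN. The payload strings and column names below are invented stand-ins for the downloaded JSON:

```python
import json
import pandas as pd

# Hypothetical target columns and JSON payloads (stand-ins for the URL contents).
columns = ["sku", "width_mm", "depth_mm"]
payloads = [
    '{"sku": "A1", "width_mm": 800, "colour": "red"}',  # extra key is dropped
    '{"sku": "B2", "depth_mm": 450}',                   # missing key becomes NaN
]

# Parse everything first, then build the frame once instead of row by row.
records = [json.loads(text) for text in payloads]
df = pd.DataFrame(records, columns=columns)

# A later batch can be added with pandas.concat, the replacement for append.
more = pd.DataFrame([{"sku": "C3", "width_mm": 900, "depth_mm": 500}], columns=columns)
df = pd.concat([df, more], ignore_index=True)

print(df)
```

Building the frame once also avoids copying the whole DataFrame on every iteration, which is what repeated appends do.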

I'm not sure why your code doesn't work, but consider the following edits, which should clean things up if you still want to use it:

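for row, url in enumerate(links):
    data = urllib2.urlopen(str(url)).read()
    data_dict = json.loads(data)
    for key, val in data_dict.items():
        # Only fill columns that already exist in df
        if key in df.columns:
            # .ix was removed in pandas 1.0; .loc does the same label-based assignment
            df.loc[row, key] = val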

I used enumerate to iterate over both the index and the value of each entry in links, so you don't need a separate index counter (row in your code). I also used the dictionary's .items method to iterate over keys and values together. I believe pandas will automatically fill the cells you never assign with NaN.
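As a quick check of that last point, this sketch runs the same kind of loop over two invented dictionaries (no network access; the keys are placeholders) and then inspects the gaps. Cells that were never assigned come out as NaN, and fillna("") is one way to turn them into blank strings if that is what you want:

```python
import pandas as pd

# Hypothetical columns and already-parsed JSON dictionaries.
df = pd.DataFrame(columns=["sku", "width_mm", "depth_mm"], dtype=object)
rows = [
    {"sku": "A1", "width_mm": 800},
    {"sku": "B2", "depth_mm": 450},
]

for row, data in enumerate(rows):
    for key, val in data.items():
        if key in df.columns:
            df.loc[row, key] = val

# Cells never assigned are NaN; count them, then make a blank-string copy.
n_missing = int(df.isna().sum().sum())
blank = df.fillna("")

print(n_missing)
print(blank)
```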
