Getting a single entry out of a CSV file using Python

I've been searching high and low for an answer to what seems to be a really simple question, but I can't find one and was hoping someone could help.

I'm trying to use Python to compare a field within a row of a CSV file against a master list of codes and add to a tally of the times it appears, much like using the COUNTIF() and VLOOKUP() functions in Excel.
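A minimal sketch of that kind of tally, written for Python 3. The `MASTER_CODES` values, the column index and the `tally_codes` name are assumptions made up for illustration, and the inline text stands in for the real file:

```python
import csv
import io
from collections import Counter

# Hypothetical master list of codes; the real list is not shown here.
MASTER_CODES = {"1002821", "572011"}


def tally_codes(lines, column, master_codes):
    """Count how often each master code appears in the given column."""
    counts = Counter()
    for row in csv.reader(lines):
        # Skip short rows and values that are not in the master list.
        if len(row) > column and row[column] in master_codes:
            counts[row[column]] += 1
    return counts


# Two shortened rows standing in for the real file.
sample = io.StringIO(
    "31/07/14 17:44,Standard P,727013,,,1002821\n"
    "31/07/14 17:44,Standard P,727013,,,9999999\n"
)
result = tally_codes(sample, 5, MASTER_CODES)
print(result)
```

A `Counter` behaves roughly like a column of COUNTIF() results: looking up a code that never appeared simply gives 0.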

The idea is that when the relevant file arrives in my directory, I can run this script rather than having to open Excel and do the work manually.

The trouble is that I cannot seem to extract the fields "on their own" to use with if statements or the like.

The code below highlights the direction I've been trying to take.

First I tried the csv module. It successfully retrieves a line from the file, but then I can't get a particular field out using the ".split(",")[n]" method, as it returns single letters rather than entire fields (and I have no idea why).
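One possible cause of the single letters, offered as an assumption because the exact attempt is not shown: indexing the line itself rather than the list that split() returns. A small sketch of the difference, using a shortened row:

```python
# A shortened row standing in for one line of the file.
line = "31/07/14 17:44,Standard P,727013"

# Indexing the string itself yields a single character.
char = line[1]

# Indexing the list returned by split() yields a whole field.
field = line.split(",")[1]

print(repr(char))
print(repr(field))
```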

Even if I could return the field, some of the data fields in the CSV contain commas, so a plain split gives no fixed field numbers, which is a problem for this method. (I tried converting the file to plain-text .txt, which worked but was not "workable".)
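To illustrate the comma problem, here is a small sketch (Python 3) comparing a plain split with `csv.reader` on the tail end of one sample row. The variable names are made up for the example:

```python
import csv
import io

# The last few fields of one sample row, including the quoted name.
line = 'a place here,"surname,  name",0000-6677-009899-09,572011,knockout\n'

# Naive splitting breaks the quoted name into two pieces.
naive_fields = line.strip().split(",")

# csv.reader honours the quotes and keeps the name as one field.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields[1])
print(len(parsed_fields), parsed_fields[1])
```

With `csv.reader` the field numbers stay fixed regardless of commas inside quoted values.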

Next I tried pandas, but I could not manage to get the rows out, only the columns as a whole. So the second print statement, rather than printing line 2, prints the second instance of the whole data set of columns.
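For comparison, a minimal sketch of pulling rows and single cells out of a pandas DataFrame by position with `iloc`. It assumes the file has no header row, so it passes `header=None`; with `header=1`, pandas uses the second line of the file as column names and discards the line before it. The three-row text is placeholder data, not the real file:

```python
import io

import pandas

# Three shortened rows standing in for the real file (no header row).
text = "a,b,c\nd,e,f\ng,h,i\n"

# header=None stops pandas from treating a data row as column names.
frame = pandas.read_csv(io.StringIO(text), header=None)

second_row = frame.iloc[1]   # the whole second row, as a Series
cell = frame.iloc[2, 1]      # third row, second column (both zero-based)

print(second_row.tolist())
print(cell)
```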

What I actually want is to be able to get (say) line 5 column 4 as a string/number as appropriate.
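A rough sketch of that lookup using only the standard library (Python 3). The `get_cell` helper and the placeholder rows are hypothetical, and rows and columns are counted from 1 to match the "line 5 column 4" wording:

```python
import csv
import io
from itertools import islice


def get_cell(lines, row_number, column_number):
    """Return one field as a string, counting rows and columns from 1."""
    reader = csv.reader(lines)
    # Skip ahead to the requested row without loading the whole file.
    row = next(islice(reader, row_number - 1, None))
    return row[column_number - 1]


# Five placeholder rows of four fields each, standing in for the real file.
sample = io.StringIO(
    "".join("r{0}c1,r{0}c2,r{0}c3,r{0}c4\n".format(n) for n in range(1, 6))
)
value = get_cell(sample, 5, 4)
print(value)
```

The csv module always returns strings, so a numeric field would still need an explicit `int(...)` or `float(...)` conversion.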

Any help is greatly appreciated.

import glob
import os

import pandas

os.system('clear')
identifier = []
data_line1 = []
data_line2=[]
data = []
io = []

path = '/home/data'
# Assign the name of the newest CSV file in the directory to `newest`.
newest = max(glob.iglob(os.path.join(path, '*.csv')), key=os.path.getmtime)

for i in range(0,5):
    with open(newest, 'rtU') as file:
        line1 = list(file)[i]
        data_line1.append(line1)

        line2 = pandas.read_csv(newest,sep=",", usecols=(4,5,16),header=1)
        data_line2.append(line2)

print("###################################################################################################")
print(data_line1[4])

print("###################################################################################################")
print(data_line2[1])

A sample data set is below (all rows are identical, sorry; I couldn't use the real data because it isn't mine):

31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout

And here is the output of the above script, showing either the row (for method 1) or the instance of the data group corresponding to the list index (for method 2):

###################################################################################################
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname,  name",0000-6677-009899-09,572011,knockout

###################################################################################################
   Unnamed: 4  1002821  0000-6677-009899-09
0         NaN  1002821  0000-6677-009899-09
1         NaN  1002821  0000-6677-009899-09
2         NaN  1002821  0000-6677-009899-09
3         NaN  1002821  0000-6677-009899-09
4         NaN  1002821  0000-6677-009899-09
5         NaN  1002821  0000-6677-009899-09
6         NaN  1002821  0000-6677-009899-09
7         NaN  1002821  0000-6677-009899-09

I'm desperate for a way to get both pieces of the puzzle to fit.

The real file in question has a heap of entries, so it would be awesome if someone can point me in the direction of what I'm doing wrong :)

PS: I'm a newbie to Python and doing things like this to help me learn, so apologies if the answer is really simple.
