Getting a single entry from a CSV file using Python
I've been searching high and low for an answer to what seems to be a really simple question, but I can't find one and was hoping someone could help.
I'm trying to use Python to compare a field within a row of a CSV file against a master list of codes and add to a tally of the times it appears, like using the COUNTIF() and VLOOKUP() functions in Excel.
The idea is that when the relevant file arrives in my directory, I can run this script rather than having to open Excel and do the work manually.
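A minimal sketch of that kind of tally, assuming the codes sit in one fixed column. The master list, the column index, and the four short rows are made up for illustration, since the real master list and file are not shown in this post.

```python
import csv
import io
from collections import Counter

# Made-up stand-ins: the real master list and data are not part of the post.
MASTER_CODES = {"727013", "727014", "727015"}
SAMPLE_CSV = (
    '31/07/14 17:44,Standard P,727013,"surname, name"\n'
    '31/07/14 17:45,Standard P,727014,"surname, name"\n'
    '31/07/14 17:46,Standard P,727013,"surname, name"\n'
    '31/07/14 17:47,Standard P,999999,"surname, name"\n'
)
CODE_COLUMN = 2  # zero-based index of the field holding the code (assumed)


def tally_codes(lines, master_codes, column):
    """Count how often each master code appears in the given column."""
    # Start every master code at zero so codes that never appear are reported too.
    counts = Counter({code: 0 for code in master_codes})
    for row in csv.reader(lines):
        # Skip short rows and codes that are not in the master list.
        if len(row) > column and row[column] in master_codes:
            counts[row[column]] += 1
    return counts


tally = tally_codes(io.StringIO(SAMPLE_CSV), MASTER_CODES, CODE_COLUMN)
for code in sorted(tally):
    print(code, tally[code])
```

To run it on a real file, pass an open file object instead of the io.StringIO, for example open(newest, newline="").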
The trouble is that I cannot seem to extract the fields "on their own" to use with if statements or the like.
The code below highlights the direction I've been trying to take.
First I tried the csv module. It successfully retrieves a line from the file, but then I can't get a particular field out using the .split(",")[n] method, as it returns single letters rather than entire fields (and I have no idea why).
Even if I could return the field, there are commas within some of the data fields of the CSV, so there are effectively no fixed field numbers, which is a problem for this method. (I tried converting the file to plain .txt, which worked but was not "workable".)
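For comparison, a small sketch of how csv.reader and a plain split(",") treat one row from the sample data in this post. The row is held in a string here only so the example is self-contained; an open file object can be passed to csv.reader in the same way.

```python
import csv
import io

# One row copied from the sample data in this post.
SAMPLE_ROW = (
    '31/07/14 17:44,Standard P,727013,,,1002821,some info in here,'
    'a thing here,35,4.93,172.55,0,another thing here,some stuff here,'
    'a place here,"surname, name",0000-6677-009899-09,572011,knockout\n'
)

# Naive splitting treats the comma inside the quotes as a separator,
# so the quoted name is cut in two and later fields shift by one.
naive_fields = SAMPLE_ROW.rstrip("\n").split(",")

# csv.reader takes an iterable of lines and yields each row as a list
# of strings, keeping a quoted field together.
parsed_fields = next(csv.reader(io.StringIO(SAMPLE_ROW)))

print(len(naive_fields), naive_fields[15])
print(len(parsed_fields), parsed_fields[15])
```

Because each row from csv.reader is already a list, a field is reached with row[n] and no further split is needed.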
Next I tried pandas, but I could not manage to get the rows out, only the columns as a whole. So the second print statement, rather than printing line 2, prints the second copy of the whole set of columns.
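A hedged sketch of pulling a single row or a single cell out of a pandas DataFrame with .iloc, assuming the file has no header row. With header=1, as in the read_csv call in the script below, pandas takes the column names from the second line of the file, which is why data values show up as column headings in the output. The ten identical rows are built from the sample row in this post.

```python
import io

import pandas as pd

# One row copied from the sample data in this post, repeated ten times.
SAMPLE_ROW = (
    '31/07/14 17:44,Standard P,727013,,,1002821,some info in here,'
    'a thing here,35,4.93,172.55,0,another thing here,some stuff here,'
    'a place here,"surname, name",0000-6677-009899-09,572011,knockout\n'
)
SAMPLE_CSV = SAMPLE_ROW * 10

# header=None: no line of the file is used up as column names,
# so columns are simply numbered 0, 1, 2, ...
frame = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None)

fifth_row = frame.iloc[4]   # whole fifth line, as a Series indexed by column number
code = frame.iloc[4, 5]     # fifth line, sixth column (zero-based positions 4 and 5)
name = frame.iloc[4, 15]    # the quoted "surname, name" field stays in one piece

print(frame.shape)
print(code, name)
```

.iloc uses zero-based positions for both rows and columns, so "line 5" is position 4.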
What I actually want is to be able to get (say) line 5 column 4 as a string/number as appropriate.
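One way this could look with only the standard library, as a sketch. The helper names get_cell and convert are hypothetical, lines and columns are counted from 1 to match the wording above, and the data is the sample row from this post repeated ten times.

```python
import csv
import io

# One row copied from the sample data in this post, repeated ten times.
SAMPLE_ROW = (
    '31/07/14 17:44,Standard P,727013,,,1002821,some info in here,'
    'a thing here,35,4.93,172.55,0,another thing here,some stuff here,'
    'a place here,"surname, name",0000-6677-009899-09,572011,knockout\n'
)
SAMPLE_CSV = SAMPLE_ROW * 10


def convert(text):
    """Return an int or float when the text parses as one, else the text itself."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def get_cell(lines, line_number, column_number):
    """Return one field, counting both lines and columns from 1."""
    for current, row in enumerate(csv.reader(lines), start=1):
        if current == line_number:
            return convert(row[column_number - 1])
    raise IndexError("line %d not found" % line_number)


cell_9 = get_cell(io.StringIO(SAMPLE_CSV), 5, 9)    # a whole number in the sample
cell_10 = get_cell(io.StringIO(SAMPLE_CSV), 5, 10)  # a decimal number in the sample
cell_16 = get_cell(io.StringIO(SAMPLE_CSV), 5, 16)  # the quoted name field
print(repr(cell_9), repr(cell_10), repr(cell_16))
```

If the codes can have leading zeros, it is safer to skip convert and compare them as strings, because int("0012") gives 12.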
Any help greatly appreciated
import glob
import os

import pandas

os.system('clear')
identifier = []
data_line1 = []
data_line2 = []
data = []
io = []
path = '/home/data'
# Assign the name of the newest file in the directory to newest.
newest = max(glob.iglob('/home/data/*.csv'), key=os.path.getmtime)
for i in range(0, 5):
    # Method 1: read the file as plain text and keep line i as one string.
    with open(newest, 'r') as file:
        line1 = list(file)[i]
        data_line1.append(line1)
    # Method 2: read three columns with pandas (the whole file, on every pass).
    line2 = pandas.read_csv(newest, sep=",", usecols=(4, 5, 16), header=1)
    data_line2.append(line2)
print("###################################################################################################")
print(data_line1[4])
print("###################################################################################################")
print(data_line2[1])
A sample data set is below (all rows are the same, sorry; I couldn't use real data because it isn't mine):
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
And here is the output of the above script, showing the row (for method 1) and the instance of the data group corresponding to the list index (for method 2):
###################################################################################################
31/07/14 17:44,Standard P,727013,,,1002821,some info in here,a thing here,35,4.93,172.55,0,another thing here,some stuff here,a place here,"surname, name",0000-6677-009899-09,572011,knockout
###################################################################################################
   Unnamed: 4  1002821  0000-6677-009899-09
0         NaN  1002821  0000-6677-009899-09
1         NaN  1002821  0000-6677-009899-09
2         NaN  1002821  0000-6677-009899-09
3         NaN  1002821  0000-6677-009899-09
4         NaN  1002821  0000-6677-009899-09
5         NaN  1002821  0000-6677-009899-09
6         NaN  1002821  0000-6677-009899-09
7         NaN  1002821  0000-6677-009899-09
I'm desperate for a way to get both pieces of the puzzle to fit.
The real file in question has a heap of entries, so it would be awesome if someone could point me in the direction of what I'm doing wrong :)
P.S. I'm a newbie to Python and doing things like this to help me learn, so apologies if the answer is really simple.