python csv copy column
I have a file containing the following:
first_name,last_name,uid,email,dep_code,dep_name
john,smith,jsmith,jsmith@gmail.com,finance,21230
john,king,jking,jjing@gmail.com,human resource,31230
I want to copy the "email" column into a new column "email2", and then replace gmail.com with hotmail.com in email2.
I'm new to Python, so I need help from experts. I tried a few scripts, but if there is a better way to do it, please let me know. The original file contains 60000 rows.
with open('c:Python27scriptscolnewfile.csv', 'rb') as fp_in1, open('c:Python27scriptsfinal.csv', 'wb') as fp_out1:
    writer1 = csv.writer(fp_out1, delimiter=",")
    reader1 = csv.reader(fp_in1, delimiter=",")
    domain = "@hotmail.com"
    for row in reader1:
        if row[2:3] == "uid":
            writer1.append("Email2")
        else:
            writer1.writerow(row+[row[2:3]])
Here is the final script. The only problem is that it does not write the complete output file: it only shows 61409 rows, whereas the input file has 61438 rows.
inFile = 'c:Python27scriptsin-093013.csv'
outFile = 'c:Python27scriptsfinal.csv'

with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fp_in1, delimiter=",")
    for col in reader:
        del col[6:]
        writer.writerow(col)
    headers = next(reader)
    writer.writerow(headers + ['email2'])
    for row in reader:
        if len(row) > 3:
            email = email.split('@', 1)[0] + '@hotmail.com'
            writer.writerow(row + [email])
If you call next() on the reader you get one row at a time; use that to copy over the headers. Copying the email column is easy enough:
import csv

infilename = r'c:Python27scriptscolnewfile.csv'
outfilename = r'c:Python27scriptsfinal.csv'

with open(infilename, 'rb') as fp_in, open(outfilename, 'wb') as fp_out:
    reader = csv.reader(fp_in, delimiter=",")
    headers = next(reader)  # read first row

    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(headers + ['email2'])

    for row in reader:
        if len(row) > 3:
            # make sure there are at least 4 columns
            email = row[3].split('@', 1)[0] + '@hotmail.com'
            writer.writerow(row + [email])
This code splits the email address on the first @ sign, takes the first part of the split and adds @hotmail.com after it:
>>> 'example@gmail.com'.split('@', 1)[0]
'example'
>>> 'example@gmail.com'.split('@', 1)[0] + '@hotmail.com'
'example@hotmail.com'
The above produces:
first_name,last_name,uid,email,dep_code,dep_name,email2
john,smith,jsmith,jsmith@gmail.com,finance,21230,jsmith@hotmail.com
john,king,jking,jjing@gmail.com,human resource,31230,jjing@hotmail.com
for your sample input.
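If the writerow call sits inside the len(row) > 3 guard, rows with fewer than four columns never reach the output, which may be one reason an output file ends up with fewer rows than its input. Below is a rough Python 3 sketch of a variant that keeps every row and only rewrites addresses whose domain is actually gmail.com. The helper names swap_domain and add_email2 are made up for illustration, and the in-memory sample (including the short "no,email" row) stands in for the real file.

```python
import csv
import io


def swap_domain(email, old="gmail.com", new="hotmail.com"):
    """Return email with its domain swapped, but only when the domain is `old`."""
    local, sep, domain = email.partition("@")
    if sep and domain.lower() == old:
        return local + "@" + new
    return email  # other domains (or values without '@') are left alone


def add_email2(fp_in, fp_out, source="email", target="email2"):
    """Copy every row from fp_in to fp_out, appending a `target` column.

    Returns the number of data rows written (header excluded).
    """
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out)
    headers = next(reader)
    idx = headers.index(source)  # locate the column by name, not by position
    writer.writerow(headers + [target])
    written = 0
    for row in reader:
        # Short rows are kept: pad them to the header width and leave
        # the new column empty instead of dropping the row.
        value = swap_domain(row[idx]) if len(row) > idx else ""
        padding = [""] * (len(headers) - len(row))
        writer.writerow(row + padding + [value])
        written += 1
    return written


# Hypothetical sample data; the last row is deliberately too short.
sample = (
    "first_name,last_name,uid,email,dep_code,dep_name\n"
    "john,smith,jsmith,jsmith@gmail.com,finance,21230\n"
    "john,king,jking,jjing@gmail.com,human resource,31230\n"
    "no,email\n"
)
out = io.StringIO()
count = add_email2(io.StringIO(sample, newline=""), out)
print(count)
print(out.getvalue())
```

With real files in Python 3, both files would be opened in text mode with newline="" (rather than 'rb' and 'wb'), as the csv module documentation recommends. Comparing the returned count with the number of records in the input is a quick check that nothing was dropped.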
This can be done very cleanly using pandas. Here it goes:
In [1]: import pandas as pd
In [3]: df = pd.read_csv('your_csv_file.csv')
In [4]: def rename_email(row):
   ...:     return row.email.replace('gmail.com', 'hotmail.com')
   ...:
In [5]: df['email2'] = df.apply(rename_email, axis=1)
In [6]: """axis = 1 or ‘columns’: apply function to each row"""
In [7]: df
Out[7]:
   first_name  last_name     uid             email        dep_code  dep_name              email2
0        john      smith  jsmith  jsmith@gmail.com         finance     21230  jsmith@hotmail.com
1        john       king   jking   jjing@gmail.com  human resource     31230   jjing@hotmail.com
In [8]: df.to_csv('new_update_email_file.csv', index=False)  # index=False keeps the 0, 1, ... row index out of the file
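As an alternative to apply with axis=1, the same column can probably be built with the vectorized string accessor, which avoids calling a Python function once per row. This is only a sketch: the sample data is read from an in-memory string rather than your CSV file, and regex=False is passed so that 'gmail.com' is matched as literal text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file.
sample = io.StringIO(
    "first_name,last_name,uid,email,dep_code,dep_name\n"
    "john,smith,jsmith,jsmith@gmail.com,finance,21230\n"
    "john,king,jking,jjing@gmail.com,human resource,31230\n"
)
df = pd.read_csv(sample)

# Copy the email column and swap the domain in one vectorized step.
# regex=False means the '.' in 'gmail.com' is not treated as a wildcard.
df["email2"] = df["email"].str.replace("gmail.com", "hotmail.com", regex=False)

# index=False leaves the DataFrame's row index out of the written CSV.
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

To write to disk instead, pass a file name to to_csv in place of the StringIO object.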