Dividing a string by type of data inside each line of a file in Python
It's safe to say I'm a noob in programming and Python, so I really need help handling a file.
My file is a .dat file with an integer, a string and a float on each line. I need to compare the first int with other ints and the float with some other value, so I need the values as numbers, not as parts of a string, to perform some mathematical operations.
This is the code I've written so far, but I've looked all over the googles and I can't find a function that does this:
readfile = open('file_being_read.dat').read()
def parsa_lista(file_to_read):
    converted = []
    for line in file_to_read:
        #conversion should happen here and write it to the list named "converted"
        #my google-fu has failed me..
    return converted
print parsa_lista(readfile)
The file looks like the excerpt below, but spans some 600 lines. I'm going about this on a learn-as-I-go basis and couldn't find help on my own, which might have something to do with a gap in my basic knowledge of data types.
This is the output of the list, as printed with "%r":
... 249 LEU 89.81637573242188\n 250 ALA 6.454087734222412\n 251 ILE 42.696006774902344\n 252 VAL 39.9482421875\n 253 LEU 58.06844711303711\n 254 SER 6.285697937011719\n 255 HIS 22.92508316040039\n 256 THR 49.1857795715332\n 257 ASN 15.033650398254395\n 258 SER 12.086835861206055\n 259 VAL 28.70435905456543\n 260 VAL 39.53983688354492\n 261 ASN 18.63718605041504\n 262 PRO 15.275177955627441\n 263 PHE 120.84526062011719\n 264 ILE 26.20943260192871\n 265 TYR 16.6826114654541\n 266 ALA 34.382598876953125\n 267 TYR 179.9381103515625\n 268 ARG 77.62599182128906\n 269 ILE 45.021034240722656\n 270 ARG 133.72328186035156\n ...
Hope you guys can help me. Even some general guidelines on how I should go about splitting strings and comparing values would be much appreciated.
Ignacio's answer is basically completely correct, and he posted it before I even started typing. However, let me explain his two-liner in a little more detail.
Reading a file
First, a critique of your code:
readfile = open('file_being_read.dat').read()
This reads your entire file into one giant string. When you iterate over a string, you get it one character at a time, not one line at a time. Change that line to this instead:
readfile = open('file_being_read.dat')
Now, when you iterate over this file object, you'll be reading the file line-by-line.
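To see the difference for yourself, here is a small sketch. It writes two lines taken from your excerpt to a temporary file (the temporary file is just a stand-in for your real .dat file) and iterates both ways. It is written for Python 3, and it uses with so the file is closed automatically.

```python
import os
import tempfile

# Two lines from the excerpt, standing in for the real .dat file.
sample = "249 LEU 89.81637573242188\n250 ALA 6.454087734222412\n"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Iterating over the result of .read() walks the string character by character.
with open(path) as f:
    chars = [piece for piece in f.read()]

# Iterating over the file object itself yields one line per step.
with open(path) as f:
    lines = [piece for piece in f]

os.remove(path)

print(chars[:3])    # ['2', '4', '9']
print(len(lines))   # 2
```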
Tokenising
You've found that iterating over a file gets you the text line-by-line. Now you need to split each line into those three values.
If the values are separated by whitespace (as in your data file excerpt), Python makes this very easy with the str.split method.
>>> line
'249 LEU 89.81637573242188\n'
>>> line.split()
['249', 'LEU', '89.81637573242188']
Any amount or type (tab, space) of whitespace between these values is fine. In fact, even the trailing newline gets stripped off. So now you have a list of three strings.
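A quick sketch of that behaviour, using a made-up line with mixed tabs and spaces. It also shows what happens if you pass an explicit separator instead, which is usually not what you want for this kind of file.

```python
# Fields separated by a mix of spaces and tabs, with a trailing newline.
line = "249 \t LEU   89.81637573242188\n"

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading and trailing whitespace, including the newline.
vals = line.split()
print(vals)  # ['249', 'LEU', '89.81637573242188']

# With an explicit separator, every single space splits, so empty strings
# and the newline are kept.
print("1  2\n".split(" "))  # ['1', '', '2\n']
```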
Interpreting
Next you need to convert the strings to integers and floating point numbers. Here, use the built-in functions int and float.
>>> vals = line.split()
>>> vals[0]
'249'
>>> int(vals[0])
249
>>> vals[2]
'89.81637573242188'
>>> float(vals[2])
89.816375732421875
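One thing worth knowing: int and float raise ValueError when the text is not a valid number, which matters if your file ever has a header or a damaged line. A minimal sketch, assuming you would rather get None back than have the program stop; the helper name to_int_or_none is made up for this illustration.

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        # int() could not interpret the text as a whole number.
        return None


print(to_int_or_none("249"))      # 249
print(to_int_or_none(" 249\n"))   # 249, surrounding whitespace is tolerated
print(to_int_or_none("LEU"))      # None
print(to_int_or_none("89.8"))     # None, int() rejects a decimal string
print(float("249"))               # 249.0, float() accepts integer text
```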
At this point, you just need to package up these values into a tuple and add that tuple to converted.
>>> datum = int(vals[0]), vals[1], float(vals[2])
>>> datum
(249, 'LEU', 89.816375732421875)
Why a tuple instead of a list? Lists are mutable: you can add and remove elements. Each record here always has exactly three fields, so you probably don't need that.
(You usually see parentheses around a tuple literal. This is one of the few places where operator precedence makes them unnecessary. You can put parentheses around the entire right side of the assignment and it will work just the same.)
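A short sketch of what working with such a tuple looks like. The variable names number, label and value are only illustrative; pick whatever matches the meaning of your columns.

```python
# One record in the shape built above: (int, str, float).
datum = (249, "LEU", 89.81637573242188)

# Unpack the three fields into names in one step.
number, label, value = datum
print(number + 1)      # 250
print(value > 50.0)    # True

# A tuple cannot be modified in place; item assignment raises TypeError.
try:
    datum[0] = 250
except TypeError as err:
    print("tuples are immutable: %s" % err)
```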
Putting it together
def parsa_lista(file_to_read):
    converted = []
    for line in file_to_read:
        vals = line.split()
        datum = int(vals[0]), vals[1], float(vals[2])
        converted.append(datum)
    return converted
The same loop body, condensed to two lines:
vals = line.split()
converted.append((int(vals[0]), vals[1], float(vals[2])))
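Once the values are real numbers, the comparisons you asked about are ordinary Python. Here is a sketch under some assumptions: the three sample lines come from your excerpt, but the cut-off values (250 and 50.0) are invented for illustration, and the function is fed a list of lines instead of an open file, which works because it only needs something it can iterate over. It is written for Python 3.

```python
def parsa_lista(lines):
    """Same parsing as above; accepts any iterable of lines."""
    converted = []
    for line in lines:
        vals = line.split()
        converted.append((int(vals[0]), vals[1], float(vals[2])))
    return converted


# Three lines copied from the excerpt in the question.
sample = [
    "249 LEU 89.81637573242188\n",
    "250 ALA 6.454087734222412\n",
    "251 ILE 42.696006774902344\n",
]
records = parsa_lista(sample)

# Hypothetical criteria: keep entries numbered 250 or higher whose value is below 50.
threshold = 50.0
selected = [(n, name) for n, name, value in records if n >= 250 and value < threshold]
print(selected)  # [(250, 'ALA'), (251, 'ILE')]

# The floats are numbers now, so arithmetic works directly.
total = sum(value for _, _, value in records)
print(total / len(records))
```

With your real file you would pass the open file object instead of the sample list.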