Dividing a string by type of data inside each line of file in python

2018-06-14 08:56:15

It's safe to say I'm a noob in programming and python, so I really need help handling a file.

My file is a .dat file with an integer, a string and a float on each line. I need to compare the first integer with other integers and the float with some other value, so I need the values as numbers rather than as parts of a string in order to perform mathematical operations on them.

This is the code I've done, but I've looked all over the googles and I can't find a function that does this:

readfile = open('file_being_read.dat').read()

def parsa_lista(file_to_read):
    converted = []
    for line in file_to_read:
       #conversion should happen here and write it to the list named "converted" 
       #my google-fu has failed me..
    return converted

print parsa_lista(readfile)

The file looks like the excerpt below, but spans some 600 lines. I'm going about this on a learn-as-I-go basis and couldn't find help on my own, which may be down to my lacking some basic knowledge of data types.

This is the output of the list, as printed with "%r":

...
249 LEU 89.81637573242188\n
250 ALA 6.454087734222412\n
251 ILE 42.696006774902344\n
252 VAL 39.9482421875\n
253 LEU 58.06844711303711\n
254 SER 6.285697937011719\n
255 HIS 22.92508316040039\n
256 THR 49.1857795715332\n
257 ASN 15.033650398254395\n
258 SER 12.086835861206055\n
259 VAL 28.70435905456543\n
260 VAL 39.53983688354492\n
261 ASN 18.63718605041504\n
262 PRO 15.275177955627441\n
263 PHE 120.84526062011719\n
264 ILE 26.20943260192871\n
265 TYR 16.6826114654541\n
266 ALA 34.382598876953125\n
267 TYR 179.9381103515625\n
268 ARG 77.62599182128906\n
269 ILE 45.021034240722656\n
270 ARG 133.72328186035156\n
...

Hope you guys can help me. Even some general guidelines on how to go about splitting strings and comparing values would be much appreciated.

Ignacio's answer is basically completely correct, and he posted it before I even started typing. However, let me explain his two-liner in a little more detail.

Reading a file

First, a critique of your code:

readfile = open('file_being_read.dat').read()

This reads your entire file into one giant string. When you iterate over that string, you get it one character at a time, not one line at a time. Change that line to this instead:

readfile = open('file_being_read.dat')

Now, when you iterate over this file object, you'll be reading the file line-by-line.
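To see the difference for yourself, here is a minimal sketch. It assumes a made-up two-line sample written to a temporary file in place of your real file_being_read.dat; the names sample, chars and lines are only for illustration.

```python
import os
import tempfile

# Hypothetical two-line sample standing in for file_being_read.dat.
sample = "249 LEU 89.81637573242188\n250 ALA 6.454087734222412\n"

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# .read() returns one big string; iterating over it yields single characters.
with open(path) as handle:
    chars = list(handle.read())

# Iterating over the file object itself yields whole lines.
with open(path) as handle:
    lines = list(handle)

os.remove(path)

print(chars[:3])  # ['2', '4', '9']
print(lines[0])
```

Opening the file in a with block is a habit worth picking up: the file is closed for you when the block ends.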

Tokenising

You've found that iterating over a file gets you the text line-by-line. Now you need to split each line into those three values.

If the values are separated by whitespace (like your data file excerpt), Python makes this very easy with the str.split method.

>>> line
'249 LEU 89.81637573242188\n'
>>> line.split()
['249', 'LEU', '89.81637573242188']

Any amount or type (tab, space) of whitespace between these values is fine. In fact, even the trailing newline gets stripped off. So now you have a list of three strings.
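A quick sketch of how forgiving split() is when called with no argument. The sample strings are invented for illustration; only str.split itself is relied on.

```python
# A made-up line mixing a tab, several spaces and a trailing newline.
line = '250\tALA   6.454087734222412\n'
fields = line.split()
print(fields)  # ['250', 'ALA', '6.454087734222412']

# A blank line splits into an empty list, which is easy to test for.
blank = '\n'.split()
print(blank)  # []

# Passing an explicit separator is stricter: repeated spaces
# produce empty strings in the result.
spaced = 'a  b'.split(' ')
print(spaced)  # ['a', '', 'b']
```

So unless your file really uses a single fixed delimiter, the no-argument form is probably the one you want.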

Interpreting

Next you need to convert the strings to integers and floating point numbers. Here, use the built-in functions int and float.

>>> vals[0]
'249'
>>> int(vals[0])
249
>>> vals[2]
'89.81637573242188'
>>> float(vals[2])
89.816375732421875
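Why bother converting at all? Here is a small sketch of what goes wrong if you compare the raw strings. The values are picked only to make the point.

```python
# Strings compare character by character, so '9' sorts after '10'.
as_strings = '9' < '10'
print(as_strings)  # False

# Once converted, the comparison is numeric.
as_numbers = int('9') < int('10')
print(as_numbers)  # True

# Arithmetic also needs real numbers, not strings.
total = float('89.81637573242188') + int('249')
print(total)
```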

At this point, you just need to package up these values into a tuple and add them to converted.

>>> datum = int(vals[0]), vals[1], float(vals[2])
>>> datum
(249, 'LEU', 89.816375732421875)

Why a tuple instead of a list? Lists are mutable: you can add and remove elements. Each record here always has exactly three values, so that probably isn't what you need.

(You usually see parentheses around a tuple literal. This is one of the few times when the order of operations makes them unnecessary. You can put parentheses around the entire right side of the assignment and it will work just the same.)
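A short sketch of both points, tuples with and without parentheses and what immutability means in practice. The names bare, wrapped, number, label and value are just illustrative.

```python
vals = ['249', 'LEU', '89.81637573242188']

# With or without parentheses, the result is the same tuple.
bare = int(vals[0]), vals[1], float(vals[2])
wrapped = (int(vals[0]), vals[1], float(vals[2]))
print(bare == wrapped)  # True

# A tuple can be unpacked into separate names...
number, label, value = bare

# ...but it cannot be modified in place.
try:
    bare[0] = 250
    changed = True
except TypeError as error:
    changed = False
    print("tuples are immutable: %s" % error)
```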

Putting it together

def parsa_lista(file_to_read):
    converted = []
    for line in file_to_read:
        vals = line.split()
        datum = int(vals[0]), vals[1], float(vals[2])
        converted.append(datum)
    return converted

vals = line.split()
converted.append((int(vals[0]), vals[1], float(vals[2])))
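Once you have the list of tuples, the comparisons you asked about become ordinary number comparisons. This is only a sketch under a few assumptions: the data comes from an in-memory sample rather than your real file, the threshold of 40.0 is made up, and skipping lines that don't have exactly three fields is my addition, not something your data necessarily needs.

```python
import io


def parsa_lista(file_to_read):
    converted = []
    for line in file_to_read:
        vals = line.split()
        # Assumption: ignore blank or malformed lines instead of failing.
        if len(vals) != 3:
            continue
        converted.append((int(vals[0]), vals[1], float(vals[2])))
    return converted


# Hypothetical sample standing in for the real file; note the blank line.
sample = io.StringIO(
    u"249 LEU 89.81637573242188\n"
    u"250 ALA 6.454087734222412\n"
    u"\n"
    u"251 ILE 42.696006774902344\n"
)
records = parsa_lista(sample)

# Made-up threshold: collect the first column of every record above it.
threshold = 40.0
above = [number for number, label, value in records if value > threshold]
print(above)  # [249, 251]
```

With the real file you would pass an open file object instead, for example inside a with open('file_being_read.dat') as handle: block.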
