Read specific sequence of lines in Python

2018-06-24 23:44:11

I have a sample file that looks like this:

    @XXXXXXXXX
    VXVXVXVXVX
    +
    ZZZZZZZZZZZ
    @AAAAAA
    YBYBYBYBYBYBYB
    ZZZZZZZZZZZZ
    ...

I want to read only the lines that fall on index 4i+2, where i starts at 0. So I should read the VXVXV... line (4*0+2 = 2) and the YBYB... line (4*1+2 = 6) in the snippet above. I need to count the number of 'V's, 'X's, 'Y's and 'B's and store the counts in a pre-existing dict.

fp = open(fileName, "r")
lines = fp.readlines()
fp.close()

for i in xrange(1, len(lines), 4):
    for c in lines[i]:  # index the list with [], not ()
        if c == 'V':
            some_dict['V'] += 1

Can someone explain how I can avoid running past the end of the list and read only the lines at index 4*i+2 of the lines list?

Can't you just slice the list of lines?

lines = fp.readlines()
interesting_lines = lines[2::4]

Edit for others questioning how it works:

The "full" slice syntax is three parts: start:end:step

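The start is the starting index, or 0 by default. For 4 * i + 2 with i == 0, that is index 2. List indices count from 0, so if the question's numbering counts lines from 1 (the VXVX... line is the second line of the sample), the start should be 1 instead, i.e. lines[1::4].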

The end is the ending index, or len(sequence) by default. Slices go up to but not including the end index.

The step is the increment between chosen items, 1 by default. Normally, a slice like 3:7 would return elements 3,4,5,6 (and not 7). But when you add a step parameter, you can do things like "step by 4".

Doing "step by 4" means start+0, start+4, start+8, start+12, ... which is what the OP wants, so long as the start parameter is chosen correctly.
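A small sketch of how the step slice behaves, using made-up lists rather than the real file. The names `items` and `records` are hypothetical, and the four-line record layout is an assumption modelled on the sample in the question. It also shows why the start value matters: with 0-based indices, a start of 2 picks the third line of each record and a start of 1 picks the second.

```python
# Hypothetical data: the integers 0..11 stand in for line indices.
items = list(range(12))

# start=2, end omitted (defaults to len(items)), step=4
every_fourth_from_2 = items[2::4]  # picks indices 2, 6, 10

# The same selection written as an explicit index loop.
same_by_loop = [items[i] for i in range(2, len(items), 4)]

# Assumed four-line records, modelled on the sample in the question.
records = [
    "@XXXXXXXXX", "VXVXVXVXVX", "+", "ZZZZZZZZZZ",
    "@AAAAAA", "YBYBYBYBYBYBYB", "+", "ZZZZZZZZZZZZZZ",
]

from_index_2 = records[2::4]  # third line of each record
from_index_1 = records[1::4]  # second line of each record

print(every_fourth_from_2)
print(from_index_2)
print(from_index_1)
```

Because the slice never reaches past the end of the list, no bounds check is needed, whatever the length of the list.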

You can do one of the following:

Start xrange at 0, then add 2 to i in the inner loop. The upper bound drops by 2 so that lines[i + 2] stays in range:

for i in xrange(0, len(lines) - 2, 4):
    for c in lines[i + 2]:
        if c == 'V':
            some_dict['V'] += 1

Start xrange at 2, then access i the way your original program does:

for i in xrange(2, len(lines), 4):
    for c in lines[i]:
        if c == 'V':
            some_dict['V'] += 1
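For completeness, here is a sketch of the second variant extended to all four letters from the question. It is written for Python 3, so `range` stands in for `xrange`. The helper `count_letters` and the sample `lines` are hypothetical, and the default start of 2 is simply taken from the loop above (pass `start=1` if "line 2" means the second line of the file).

```python
def count_letters(lines, counts, start=2, step=4):
    """Add the occurrences of each key of `counts` found on the selected lines."""
    # range() stops before len(lines), so lines[i] can never go out of bounds.
    for i in range(start, len(lines), step):
        for c in lines[i]:
            if c in counts:  # only letters that already have an entry
                counts[c] += 1
    return counts


# Hypothetical input with the wanted lines placed at indices 2 and 6.
lines = ["@id1\n", "+\n", "VXVXV\n", "ZZZZ\n", "@id2\n", "+\n", "YBYB\n", "ZZZZ\n"]
some_dict = {"V": 0, "X": 0, "Y": 0, "B": 0}
count_letters(lines, some_dict)
print(some_dict)
```

Testing `c in counts` instead of comparing against each letter in turn keeps the loop body the same no matter how many letters the dict tracks.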

I'm not quite clear on what you're trying to do here. Are you actually trying to read only the lines you want from disk? (In that case you've gone wrong from the start, because readlines() reads the whole file.) Or are you just trying to filter the list of lines to pick out the ones you want?

I'll assume the latter. In that case, the easiest thing to do is to use a list comprehension to filter the lines by index, e.g. something simple like:

indices = [i * 4 + 2 for i, _ in enumerate(lines)]
filtered_lines = [lines[i] for i in indices if i < len(lines)]

And there you go: you've got just the lines you want, with no index errors or anything silly like that. You can then separate out and simplify the rest of your code to do the counting, operating only on the filtered list.
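One possible way to do the counting on the filtered list, sketched with `collections.Counter` from the standard library. The contents of `filtered_lines` and `some_dict` below are hypothetical stand-ins for the real data, and restricting the update to keys already in the dict is an assumption about what "pre-existing dict" means in the question.

```python
from collections import Counter

# Hypothetical stand-ins for the filtered lines and the pre-existing dict.
filtered_lines = ["VXVXVXVXVX\n", "YBYBYBYBYBYBYB\n"]
some_dict = {"V": 0, "X": 0, "Y": 0, "B": 0}

# Count every character on the filtered lines in one pass.
tally = Counter("".join(filtered_lines))

# Copy over only the letters the dict already tracks; a Counter returns 0
# for letters it never saw, so a missing letter cannot raise KeyError.
for letter in some_dict:
    some_dict[letter] += tally[letter]

print(some_dict)
```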

(just slightly edited the first list comp to be a little more idiomatic)
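For the other case raised above, where the aim is to avoid holding the whole file in memory as `readlines()` does, a file object can be iterated lazily and `itertools.islice` can pick every fourth line. This is a sketch under assumptions: `wanted_lines` is a hypothetical helper, the default start of 2 follows the index used in this answer, and the file is still read sequentially from disk, only without storing the skipped lines.

```python
import io
from itertools import islice


def wanted_lines(fp, start=2, step=4):
    """Yield lines start, start+step, ... from an open text file, lazily."""
    # islice consumes the file iterator line by line and keeps only the
    # selected lines, so no full list of lines is ever built.
    for line in islice(fp, start, None, step):
        yield line.rstrip("\n")


# io.StringIO stands in for a real file opened with open(fileName).
fake_file = io.StringIO("a\nb\nc\nd\ne\nf\ng\nh\n")
selected = list(wanted_lines(fake_file))
print(selected)
```

With a real file, the call would sit inside a `with open(fileName) as fp:` block so the file is closed once the loop finishes.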
