Grep: match only specific lines, but keep context
I have a file where I'm looking for a pattern "N" on only the even-numbered lines. When a line matches, I want to keep the context -- the odd-numbered line above it.
I understand how to keep context using -A, -B, and -C, but the pattern "N" may also match the odd-numbered lines. The only solution I can think of is to separate the even and odd lines before running grep, which removes the context.
Is there a way to do this without extracting the line numbers that grep matched and then pulling those specific lines from the file afterwards? I suspect I might be able to do it with awk, but I'm not sure.
I'm trying to optimize code that I believe already works, because the files it will process are huge and a run takes hours.
I'm trying to find any DNA sequences that have "N"s in them and put them in one file, and put any sequences that don't have "N"s in another file. The ID lines can also contain "N"s, however. In the new files, I want each ID line to stay directly above its sequence.
Sample Input:
>100000|NODE_2_length_277_cov_4.245487
ATCTTTTAACCCCAAAAACTCAAGTATGTGAGCCAAGTGAACATAACTGCATAAATATCAGGCTCCAAAATAATCTACTGCTTGTTGTGTAGATATAGAGCACACAATTTCTTTTTTAAAGCCCTCCCTTTCACTCTCTCTATCCCACACCCAGAAAAACTCCTATTTAGAGAAAGCCACACCTATCACTAAGAGCAAACCAACCTTTCAAAAAAAAAAAAAAAACACATTAGGAGCAAACTGTTAGGAGCCATTCAAAACCAAAGGAAATGCCAAGACACACACACACACACACACACACAC
>100001|NODE_1_length_426_cov_11.427230
AAATATATAAAAAACCTGTGTTGTGACAACAGGTTGAGAAGTAATGAGAAAATGGACGAATTAGTTCAGGATGTCTCAAAGCAGATTTCTTTCCACTTAATCTCGATGTCCTACGAAAATGCTGACTTAGGTTGTAGTTTATGTTTCTTAGATTCCAATATTTTAAAATGGCCCTTGAAATTATATTAAAAAGCTCATGAACAAGTGCATAATCAATGATAAATGAATATTTATGGTTGAGATTTGGGAATTATTAATCAATATACCTCTATACTCTTGGCTCTCTTGAAGTTTAATTCAAGTGTATTTAATTAGATTCCTACCCCAAATCAACTTTAAGAAGGCTGCTTTTCTTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCG
Another solution, with fewer keystrokes:
awk '!(NR%2) && /N/ {print p; print}{p=$0}'
The !(NR%2) idiom selects the even-numbered lines. The final block, {p=$0}, saves every line unconditionally as the previous line; this is harmless, because p is printed only when an even-numbered line matches.
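To see the one-liner skip an "N" that appears only in an ID line, it can be run on a small made-up input. The four records below are hypothetical and not taken from the question's data; this is a sketch of the behaviour, not a benchmark.

```shell
# Toy input (hypothetical): ID lines on odd lines, sequences on even lines.
# The first ID contains "N" but its sequence does not, so that pair is skipped.
result=$(printf '%s\n' '>id1_NODE' 'ACGT' '>id2' 'ACNT' |
  awk '!(NR%2) && /N/ {print p; print} {p=$0}')
printf '%s\n' "$result"
```

Only the second pair (>id2 and ACNT) is printed, because the match is tested on even-numbered lines only.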
With awk:
seq 10 |
awk -v pattern='[26]' '
    FNR % 2 == 1 {odd = $0}
    FNR % 2 == 0 && $0 ~ pattern {print odd; print}
'
Output:
1
2
5
6
With your sample input:
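awk '
    # Odd line: remember the ID line.
    FNR % 2 == 1 {odd = $0}
    # Even line: pick the output file depending on whether the sequence contains N.
    FNR % 2 == 0 {
        if (/N/)
            file = FILENAME ".with_N"
        else
            file = FILENAME ".no_N"
        print odd > file
        print > file
    }
' myfile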
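As a quick check of the splitting script, here is a minimal sketch that runs it on a throwaway file. The scratch directory, the file name demo.fa, and the two records are assumptions made up for illustration; the awk program itself is the one shown above.

```shell
# Hypothetical scratch directory and input file, used only for this demonstration.
tmpdir=$(mktemp -d)
printf '%s\n' '>id1_NODE' 'ACGT' '>id2' 'ACNT' > "$tmpdir/demo.fa"

awk '
    FNR % 2 == 1 {odd = $0}
    FNR % 2 == 0 {
        if (/N/)
            file = FILENAME ".with_N"
        else
            file = FILENAME ".no_N"
        print odd > file
        print > file
    }
' "$tmpdir/demo.fa"

# Each output file keeps the ID line directly above its sequence.
with_n=$(cat "$tmpdir/demo.fa.with_N")
no_n=$(cat "$tmpdir/demo.fa.no_N")
printf 'with_N:\n%s\nno_N:\n%s\n' "$with_n" "$no_n"
rm -rf "$tmpdir"
```

Because the output names are built from FILENAME, the two result files land next to the input file. The input is read in a single pass, which matters for files as large as the ones described in the question.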