How to modify text that matches a particular regular expression in Python?
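I need to mark negative contexts in a sentence. The algorithm goes as follows: every word between a negation word (never, not, any word ending in n't, and so on) and the next punctuation mark (. : ; ! ?) gets the suffix _NEG appended.
Now, I have defined a regex to pick out all such occurrences: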
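import re

def replacenegation(text):
    # A negation word (or any word ending in n't), then everything up to the next punctuation mark
    match = re.search(r"((\b(never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b)|\b\w+n't\b)((?![.:;!?]).)*[.:;!?\b]", text)
    if match:
        s = match.group()
        print(s)
        news = ""
        # Split the matched clause into words and drop the negation word itself
        wlist = re.split(r"[.:;!? ]", s)
        wlist = wlist[1:]
        print(wlist)
        for w in wlist:
            if w:
                news = news + " " + w + "_NEG"
        print(news)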
I can detect and replace the matched group. However, I don't know how to rebuild the complete sentence after this operation. Also, when there are multiple matches, match.groups() gives me the wrong output.
For example, if my input sentence is:
I don't like you at all; I should not let you know my happiest secret.
Output should be:
I don't like_NEG you_NEG at_NEG all_NEG ; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG .
How do I do this?
First of all, you had better change the negative lookahead ((?![.:;!?]).)* to a negated character class:
([^.:;!?]*)
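A quick way to convince yourself that the two forms are interchangeable is to run both on the example sentence. The sketch below prefixes each with "not" only so that both searches start at the same place; the one real difference, shown at the end, is that . does not match a newline by default while the negated class does, so the equivalence holds for single-line input.

```python
import re

sentence = "I don't like you at all; I should not let you know my happiest secret."

# Tempered dot: take one character at a time as long as it is not punctuation.
tempered = re.compile(r"not((?![.:;!?]).)*")
# Negated character class: the same condition written as a single class.
negated = re.compile(r"not[^.:;!?]*")

print(tempered.search(sentence).group())  # not let you know my happiest secret
print(negated.search(sentence).group())   # not let you know my happiest secret

# The one difference: "." stops at a newline by default, the negated class does not.
print(repr(re.match(r"((?![.:;!?]).)*", "a\nb").group()))  # 'a'
print(repr(re.match(r"[^.:;!?]*", "a\nb").group()))        # 'a\nb'
```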
Then you need to use a non-capturing group and remove the extra groups around your negative words: because you have wrapped them in 3 capture groups, a word like not is returned 3 times per match. After that you can use re.findall() to find all the matches:
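To see why the nested groups cause trouble with findall(), here is a small sketch using a shortened word list for brevity. When a pattern has several groups, findall() returns one tuple per match with an entry for every group (an empty string for groups that did not participate); with a single group it returns plain strings.

```python
import re

s = "I don't care; I should not go."

# Shortened word list for brevity.
# The question's grouping: the negation word sits inside three nested capture groups.
nested = re.compile(r"((\b(never|no|not)\b)|\b\w+n't\b)")
# One capture group; the alternation inside it is non-capturing (?:...).
flat = re.compile(r"((?:never|no|not)\b|\b\w+n't\b)")

print(nested.findall(s))  # [("don't", '', ''), ('not', 'not', 'not')]
print(flat.findall(s))    # ["don't", 'not']
```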
>>> import re
>>> s = "I don't like you at all; I should not let you know my happiest secret."
>>> regex = re.compile(r"((?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b|\b\w+n't\b)([^.:;!?]*)([.:;!?\b])")
>>> regex.findall(s)
[("don't", ' like you at all', ';'), ('not', ' let you know my happiest secret', '.')]
Or, to replace the words, you can use re.sub() with a lambda function as the replacement:
>>> regex.sub(lambda x: x.group(1) + ' ' + ' '.join([i + '_NEG' for i in x.group(2).split()]) + x.group(3), s)
"I don't like_NEG you_NEG at_NEG all_NEG; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG."
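The same re.sub() call can be written with a named function, which is easier to extend. The sketch below also changes two details of the pattern, as assumptions rather than part of the answer above: in Python's re, \b inside a character class means backspace, not a word boundary, so [.:;!?\b] cannot match at the end of a string with no final punctuation; here the terminator is ([.:;!?]|$) instead. A leading \b is also added to the word list so that words like "piano" are not matched on their trailing "no". The names NEGATIONS, NEG_CLAUSE and mark_negation are illustrative only.

```python
import re

NEGATIONS = ("never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|"
             "couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint")

# Group 1: the negation word; group 2: the rest of the clause;
# group 3: the terminating punctuation, or the empty string at the end of input.
NEG_CLAUSE = re.compile(
    r"(\b(?:" + NEGATIONS + r")\b|\b\w+n't\b)([^.:;!?]*)([.:;!?]|$)"
)

def _mark(match):
    # Suffix every word of the clause body with _NEG; keep the negation word and punctuation.
    marked = " ".join(w + "_NEG" for w in match.group(2).split())
    return match.group(1) + (" " + marked if marked else "") + match.group(3)

def mark_negation(text):
    return NEG_CLAUSE.sub(_mark, text)

print(mark_negation("I don't like you at all; I should not let you know my happiest secret."))
print(mark_negation("I have never been there"))  # I have never been_NEG there_NEG
```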
Note that to keep the punctuation you need to put it in a capture group too. Then, in re.sub(), you can append it to the end of each clause after editing it.
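Because the punctuation is captured separately, it can also be emitted as its own token, which is the format the question asks for (all_NEG ; I ...). The sketch below does this with re.finditer() and string slicing, rebuilding the sentence explicitly instead of through re.sub(); this is an alternative technique, not the answer's own code. It uses the same ([.:;!?]|$) terminator assumption as above, and mark_negation_spaced is a hypothetical name.

```python
import re

NEG_CLAUSE = re.compile(
    r"(\b(?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|"
    r"shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b|\b\w+n't\b)"
    r"([^.:;!?]*)([.:;!?]|$)"
)

def mark_negation_spaced(text):
    """Rebuild the sentence piece by piece from re.finditer() matches."""
    parts = []
    last_end = 0
    for m in NEG_CLAUSE.finditer(text):
        # Copy the untouched text between the previous match and this one.
        parts.append(text[last_end:m.start()])
        marked = [w + "_NEG" for w in m.group(2).split()]
        punct = [m.group(3)] if m.group(3) else []
        # Negation word, marked words, then the punctuation as a separate token.
        parts.append(" ".join([m.group(1)] + marked + punct))
        last_end = m.end()
    parts.append(text[last_end:])  # whatever follows the last match
    return "".join(parts)

print(mark_negation_spaced("I don't like you at all; I should not let you know my happiest secret."))
```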