How to modify text that matches a particular regular expression in Python?

2018-07-02 02:45:58

I need to mark negative contexts in a sentence. The algorithm goes as follows:

Detect a negator (not, never, ain't, don't, etc.)

Detect a clause-ending punctuation mark (. ; : ! ?)

Append _NEG to every word between the two.

Now, I have defined a regex to pick out all such occurrences:

import re

def replacenegation(text):
    # A negator, then everything up to the first clause-ending punctuation mark
    match = re.search(r"((\b(never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b)|\b\w+n't\b)((?![.:;!?]).)*[.:;!?\b]", text)
    if match:
        s = match.group()
        print(s)
        news = ""
        # Split the matched clause into words and drop the negator itself
        wlist = re.split(r"[.:;!? ]", s)
        wlist = wlist[1:]
        print(wlist)
        for w in wlist:
            if w:
                news = news + " " + w + "_NEG"
        print(news)

I can detect and replace the matched group. However, I don't know how to rebuild the complete sentence after this operation. Also, when there are multiple matches, match.groups() gives me the wrong output.

For example, if my input sentence is:

I don't like you at all; I should not let you know my happiest secret.

Output should be:

I don't like_NEG you_NEG at_NEG all_NEG ; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG .

How do I do this?

First of all, you would do better to replace the negative look-ahead ((?![.:;!?]).)* with a negated character class:

([^.:;!?]*)
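
As a rough check of that substitution, the sketch below runs both forms on a made-up sentence. The names tempered and negated and the sample text are illustrative assumptions, not part of the question. On single-line input both capture the same span; they differ only when the text contains a newline, because "." does not match a newline by default while the negated class does.

```python
import re

# Hypothetical sample text, used only for this comparison.
text = "I should not let you know my secret. Next sentence"

# Look-ahead form: consume one character at a time while it is not punctuation.
tempered = re.compile(r"not((?:(?![.:;!?]).)*)")
# Negated character class: the same idea expressed directly.
negated = re.compile(r"not([^.:;!?]*)")

tempered_span = tempered.search(text).group(1)
negated_span = negated.search(text).group(1)

print(repr(tempered_span))  # ' let you know my secret'
print(repr(negated_span))   # ' let you know my secret'
```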

Then you need to use a non-capturing group for the list of negative words and remove the extra groups. Your pattern wraps those words in three nested capture groups, so a negator such as not is returned three times for each match. After that you can use re.findall() to find all the matches:

>>> import re
>>> s = "I don't like you at all; I should not let you know my happiest secret."
>>> regex = re.compile(r"((?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b|\b\w+n't\b)([^.:;!?]*)([.:;!?\b])")
>>> regex.findall(s)
[("don't", ' like you at all', ';'), ('not', ' let you know my happiest secret', '.')]
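
To see why the extra groups matter, here is a small sketch with a deliberately shortened, hypothetical word list (only not and never). The names nested and flat are assumptions made for the illustration. re.findall() returns one tuple entry per capture group, so nested groups repeat the same word, while a non-capturing (?:...) group leaves a single entry.

```python
import re

sample = "I should not go."

# Three nested capture groups around the same word, as in the question's pattern.
nested = re.compile(r"((\b(not|never)\b))")
# One capture group; the alternation itself is non-capturing.
flat = re.compile(r"(\b(?:not|never)\b)")

nested_result = nested.findall(sample)
flat_result = flat.findall(sample)

print(nested_result)  # [('not', 'not', 'not')]
print(flat_result)    # ['not']
```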

Or, to replace the words, you can use re.sub() with a lambda function as the replacement:

>>> regex.sub(lambda x:x.group(1)+' '+' '.join([i+'_NEG' for i in x.group(2).split()])+x.group(3) ,s)
"I don't like_NEG you_NEG at_NEG all_NEG; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG."

Note that to keep the punctuation you need to put it in a capture group too. You can then append it to the end of each rewritten clause inside the re.sub() replacement.
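
Putting the pieces together, one possible way to package this as a reusable function is sketched below. The names NEGATION, _tag and mark_negation are hypothetical. The sketch also departs from the pattern above in three ways, all of which are assumptions on my part: it adds a leading \b so that a negator is not matched inside a longer word (for example the "no" at the end of "piano"), it drops \b from the punctuation class (inside a character class \b means a backspace character), and it leaves a clause untouched when no words follow the negator.

```python
import re

NEGATION = re.compile(
    r"(\b(?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|"
    r"couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b"
    r"|\b\w+n't\b)"   # group 1: the negator
    r"([^.:;!?]*)"    # group 2: everything up to the clause end
    r"([.:;!?])"      # group 3: the clause-ending punctuation
)

def _tag(match):
    """Rebuild one matched clause with _NEG appended to each word after the negator."""
    words = match.group(2).split()
    if not words:
        # Nothing follows the negator, so return the clause unchanged.
        return match.group(0)
    tagged = " ".join(word + "_NEG" for word in words)
    return match.group(1) + " " + tagged + match.group(3)

def mark_negation(text):
    """Return text with every negated clause tagged; the rest is left as it was."""
    return NEGATION.sub(_tag, text)

sentence = "I don't like you at all; I should not let you know my happiest secret."
print(mark_negation(sentence))
```

Because re.sub() only rewrites the matched spans, the text outside each negated clause is carried over unchanged, which is what rebuilds the complete sentence.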
