How to modify text that matches a particular regular expression in Python?

I need to mark negative contexts in a sentence. The algorithm goes as follows:

  • Detect a negator (not/never/ain't/don't, etc.)
  • Detect clause-ending punctuation (.;:!?)
  • Add _NEG to all the words between the two.

Now, I have defined a regex to pick out all such occurrences:

    import re

    def replacenegation(text):
        match = re.search(r"((\b(never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b)|\b\w+n't\b)((?![.:;!?]).)*[.:;!?\b]", text)
        if match:
            s = match.group()
            print(s)
            news = ""
            # Split the matched clause on punctuation and spaces
            wlist = re.split(r"[.:;!? ]", s)
            # Drop the first token, which is the negator itself
            wlist = wlist[1:]
            print(wlist)
            for w in wlist:
                if w:
                    news = news + " " + w + "_NEG"
            print(news)
    

I can detect and replace the matched group. However, I don't know how to recreate the complete sentence after this operation. Also, for multiple matches, match.groups() gives me the wrong output.

For example, if my input sentence is:

    I don't like you at all; I should not let you know my happiest secret.
    

The output should be:

    I don't like_NEG you_NEG at_NEG all_NEG ; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG .
    

How do I do this?


First of all, you would do better to replace the negative look-ahead ((?![.:;!?]).)* with a negated character class:

    ([^.:;!?]*)
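
As a rough illustration of that substitution, the sketch below runs both forms against the same text. The variable names and the sample string are made up for this example.

```python
import re

# Two ways to say "everything up to the next clause-ending punctuation mark".
tempered = re.compile(r"(?:(?![.:;!?]).)*")  # look-ahead checked before every character
negated = re.compile(r"[^.:;!?]*")           # negated character class

text = "like you at all; I should"

print(tempered.match(text).group())  # like you at all
print(negated.match(text).group())   # like you at all
```

One difference worth keeping in mind: without re.DOTALL the dot does not match a newline, while the negated class does, so the two forms can diverge on multi-line input.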
    

Then you need to use a non-capturing group and drop the extra groups around your negative words. Because you have wrapped them in three capture groups, each match returns the negative word (for example, not) three times. After that you can use re.findall() to find all the matches:

    >>> import re
    >>> s = "I don't like you at all; I should not let you know my happiest secret."
    >>> regex = re.compile(r"((?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b|\b\w+n't\b)([^.:;!?]*)([.:;!?\b])")
    >>> regex.findall(s)
    [("don't", ' like you at all', ';'), ('not', ' let you know my happiest secret', '.')]
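
To see why the number of capture groups matters, here is a small sketch with a deliberately shortened pattern (the two-word negator list and the sample sentence are assumptions for illustration only). It also shows that a repeated capture group keeps only its last iteration, which is probably part of why match.groups() looked wrong.

```python
import re

s = "I should not go."

# Nested capture groups: findall returns one tuple entry per group,
# so the negator shows up several times.
nested = re.compile(r"((\b(not|never)\b))([^.:;!?]*)")
print(nested.findall(s))  # [('not', 'not', 'not', ' go')]

# Non-capturing (?:...) groups still group the alternation but are not reported.
flat = re.compile(r"\b(?:not|never)\b([^.:;!?]*)")
print(flat.findall(s))    # [' go']

# A capture group under * only remembers the last character it matched.
repeated = re.search(r"\bnot\b((?![.:;!?]).)*", s)
print(repeated.group(1))  # o
```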
    

Or, to replace the words, you can use re.sub() with a lambda function as the replacement:

    >>> regex.sub(lambda x:x.group(1)+' '+' '.join([i+'_NEG' for i in x.group(2).split()])+x.group(3) ,s)
    "I don't like_NEG you_NEG at_NEG all_NEG; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG."
    

Note that to keep the punctuation you need to put it in a capture group too. You can then append it to the end of the rewritten clause inside re.sub().
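
Putting the pieces together, one possible shape for a reusable function is sketched below. It is not the original answer's code. The names NEGATORS, NEG_SCOPE and mark_negation are hypothetical, the negator list is shortened (the full list from the question can be substituted), and two details are assumptions added here: a word boundary before the negator list, so that a word such as "casino" is not read as ending in "no", and an end-of-text alternative for clauses with no closing punctuation.

```python
import re

# Hypothetical, shortened negator list for illustration; extend as needed.
NEGATORS = r"never|no|nothing|nowhere|noone|none|not|cant|dont|isnt|aint"

# Group 1: the negator, group 2: the words in its scope, group 3: the closing
# punctuation (or the end of the text, an assumption not covered in the answer).
NEG_SCOPE = re.compile(
    r"\b((?:" + NEGATORS + r")\b|\w+n't\b)([^.:;!?]*)([.:;!?]|$)"
)


def mark_negation(text):
    """Append _NEG to every word between a negator and the clause end."""
    def tag(match):
        words = [w + "_NEG" for w in match.group(2).split()]
        return " ".join([match.group(1)] + words) + match.group(3)
    return NEG_SCOPE.sub(tag, text)


print(mark_negation("I don't like you at all; I should not let you know my happiest secret."))
# I don't like_NEG you_NEG at_NEG all_NEG; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG.
```

Because re.sub() replaces every non-overlapping match and copies the text between matches unchanged, the full sentence is rebuilt automatically. The pattern is case-sensitive as written; passing re.IGNORECASE to re.compile() would also catch a sentence-initial "Never".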
