Non greedy (reluctant) regex matching in sed?

2018-06-16 15:40:37

I'm trying to use sed to clean up lines of URLs and extract just the domain.

So from:

http://www.suepearson.co.uk/product/174/71/3816/

I want:

http://www.suepearson.co.uk/

(either with or without the trailing slash, it doesn't matter)

I have tried:

sed 's|\(http://.*?/\).*|\1|'

and (escaping the non-greedy quantifier)

sed 's|\(http://.*\?/\).*|\1|'

but I cannot get the non-greedy quantifier to work; the pattern always ends up matching the whole string.

Neither basic nor extended POSIX/GNU regex syntax recognizes the non-greedy quantifier; you need a later regex flavor. Fortunately, Perl regex is easy to get in this context:

perl -pe 's|(http://.*?/).*|$1|'

Try [^/]* instead of .*? :

sed 's|\(http://[^/]*/\).*|\1|g'
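To see the negated character class at work, here is a small sketch using the URL from the question. The variable names url and sed_domain are illustrative, and the g flag is left out because the pattern consumes the whole line anyway.

```shell
# Sample URL taken from the question.
url='http://www.suepearson.co.uk/product/174/71/3816/'

# [^/]* cannot cross a "/", so the group ends at the first slash after the host.
# This is plain basic regex syntax: \( \) delimit the group and \1 refers to it.
sed_domain=$(printf '%s\n' "$url" | sed 's|\(http://[^/]*/\).*|\1|')

printf '%s\n' "$sed_domain"
```

The result should be http://www.suepearson.co.uk/ because the greedy [^/]* is stopped by the character class itself rather than by a lazy quantifier.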

With sed, I usually implement a non-greedy search by matching anything except the separator, up to the separator:

echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'

Output:

http://www.suon.co.uk

Here is how it works:

-n suppresses sed's default output, so nothing is printed unless asked for

s/<pattern>/<replace>/p searches for the pattern, replaces the match, and prints the result

; is used as the delimiter of the s command instead of / so the slashes in the URL need no escaping: s;<pattern>;<replace>;p

\( ... \) remembers whatever matches between the escaped parentheses, later accessible as \1 , \2 ...

http:// matches literally

[] matches any one of the characters listed inside it; [ab/] would mean either a or b or /

a leading ^ inside [] means "not", so the expression matches any character except the ones listed

so [^/] means any character except /

* repeats the previous item zero or more times, so [^/]* means a run of characters other than /

so far, sed -n 's;\(http://[^/]*\) means: match http:// followed by any characters except / , and remember what was found

we want to stop at the end of the domain, which is the next / , so add a / after the group: sed -n 's;\(http://[^/]*\)/' ; the rest of the line after the domain must be matched too, so add .*

now group 1 ( \1 ) holds the domain, so replace the whole matched line with the contents of group 1 and print it: sed -n 's;\(http://[^/]*\)/.*;\1;p'
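One effect of combining -n with the p flag is easy to miss: lines on which the substitution does not happen are not printed at all. A minimal sketch, assuming a made-up second input line that is not a URL (the variable name matched is illustrative):

```shell
# Two input lines: one URL and one line the pattern cannot match.
matched=$(printf '%s\n' 'http://www.suon.co.uk/product/1/7/3/' 'not a url' |
  sed -n 's;\(http://[^/]*\)/.*;\1;p')

# Only the line where the substitution succeeded is printed.
printf '%s\n' "$matched"
```

Here only http://www.suon.co.uk should come out. Dropping -n and p would instead pass the non-matching line through unchanged.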

If you want to include the slash after the domain as well, move it inside the remembered group:

echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'

Output:

http://www.suon.co.uk/
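Since the question is about cleaning up whole lines of URLs, the same idea can be wrapped in a small filter. This is only a sketch: the function name strip_path and the example.com style hosts are made up, and accepting https as well as http (via the basic regex interval \{0,1\}) is an assumption that goes beyond the question.

```shell
# strip_path: hypothetical helper that reads URLs on stdin, one per line,
# and cuts everything from the first "/" after the host onward.
strip_path() {
  # s\{0,1\} makes the "s" optional in basic regex syntax.
  # Lines without a "/" after the host do not match and pass through unchanged.
  sed 's|^\(https\{0,1\}://[^/]*\)/.*|\1|'
}

cleaned=$(printf '%s\n' \
  'http://a.example/x/y' \
  'https://b.example/' \
  'http://c.example' | strip_path)

printf '%s\n' "$cleaned"
```

No -n or p is used here, so every input line is kept, which suits a cleanup pass over a file where some URLs may already be bare domains.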
