Non-greedy (reluctant) regex matching in sed?

I'm trying to use sed to clean up lines of URLs to extract just the domain.

So from:

http://www.suepearson.co.uk/product/174/71/3816/

I want:

http://www.suepearson.co.uk/

(either with or without the trailing slash, it doesn't matter)

I have tried:

 sed 's|\(http://.*?/\).*|\1|'

and (escaping the non greedy quantifier)

sed 's|\(http://.*\?/\).*|\1|'

but I cannot seem to get the non-greedy quantifier to work, so it always ends up matching the whole string.


Neither basic nor extended POSIX/GNU regex syntax recognizes the non-greedy quantifier; you need a more capable regex engine. Fortunately, Perl regex is pretty easy to get in this context:

perl -pe 's|(http://.*?/).*|\1|'

Try [^/]* instead of .*? :

sed 's|\(http://[^/]*/\).*|\1|g'
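
Since the question is about whole lines of URLs, here is a rough sketch of the same substitution run over several lines at once. strip_path is a made-up helper name and the second URL is invented test input; the g flag is left out because the pattern already consumes the rest of each line.

```shell
# strip_path: hypothetical helper, reads URLs on stdin (one per line)
# and keeps only scheme + host + the first slash.
strip_path() {
  sed 's|\(http://[^/]*/\).*|\1|'
}

# Two sample lines; the second one is invented for illustration.
urls='http://www.suepearson.co.uk/product/174/71/3816/
http://example.com/a/b?c=d'

out=$(printf '%s\n' "$urls" | strip_path)
printf '%s\n' "$out"
```

Each input line is handled independently, so this should work the same on a file: strip_path < urls.txt (file name assumed).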

With sed, I usually implement a non-greedy search by matching anything except the separator, up to the separator:

echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'

Output:

http://www.suon.co.uk

this is:

  • -n: don't print lines by default
  • search, match the pattern, replace and print: s/<pattern>/<replace>/p
  • use ; as the s command separator instead of / to make it easier to type, so s;<pattern>;<replace>;p
  • remember the match between escaped brackets \( ... \) , later accessible with \1 , \2 ...
  • match http://
  • followed by anything in brackets [] , [ab/] would mean either a or b or /
  • a leading ^ in [] means not , so followed by anything but the things in the []
  • so [^/] means any character except /
  • * repeats the previous item zero or more times, so [^/]* means any run of characters other than /
  • so far, sed -n 's;\(http://[^/]* means search for and remember http:// followed by any characters except /
  • we want to search until the end of the domain, so stop on the next / by closing the group and adding another / at the end: sed -n 's;\(http://[^/]*\)/' but we also want to match the rest of the line after the domain, so add .*
  • now the match remembered in group 1 ( \1 ) is the domain, so replace the matched line with what was saved in group 1 and print: sed -n 's;\(http://[^/]*\)/.*;\1;p'
  • If you want to include the slash after the domain as well, then add one more slash inside the group to remember:

    echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'
    

    output:

    http://www.suon.co.uk/
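
The steps above can also be written with extended regex syntax, which some people find easier to read. This is only a sketch: it assumes a sed that accepts -E (GNU and BSD sed both do), domain_of is a hypothetical helper name, and the optional s for https is an addition that was not in the original question.

```shell
# domain_of: hypothetical helper. With -E, ( ) group without backslashes
# and ? means "zero or one of the previous item", so https? covers both schemes.
domain_of() {
  sed -E 's;^(https?://[^/]*).*;\1;'
}

with_path=$(printf '%s\n' 'https://www.suon.co.uk/product/1/7/3/' | domain_of)
bare_host=$(printf '%s\n' 'http://www.suon.co.uk' | domain_of)

printf '%s\n%s\n' "$with_path" "$bare_host"
```

Because the trailing / is not required here, a URL with no path at all passes through unchanged instead of failing to match.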
    