Non-greedy (reluctant) regex matching in sed?
I'm trying to use sed to clean up lines of URLs to extract just the domain.
So from:
http://www.suepearson.co.uk/product/174/71/3816/
I want:
http://www.suepearson.co.uk/
(either with or without the trailing slash, it doesn't matter)
I have tried:
sed 's|\(http://.*?/\).*|\1|'
and (escaping the non-greedy quantifier)
sed 's|\(http://.*\?/\).*|\1|'
but I cannot seem to get the non-greedy quantifier to work, so it always ends up matching the whole string.
Neither basic nor extended POSIX/GNU regex recognizes the non-greedy quantifier; you need a later regex flavor. Fortunately, Perl regex for this context is pretty easy to get:
perl -pe 's|(http://.*?/).*|\1|'
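To make the difference concrete, here is a minimal sketch that runs the question's URL through a plain greedy sed pattern and through the Perl one-liner. The function names greedy_sed and nongreedy_perl are made up for illustration, and the sketch assumes perl is installed.

```shell
#!/bin/sh
url='http://www.suepearson.co.uk/product/174/71/3816/'

# Greedy: .* runs on to the LAST slash in the line, so group 1 ends up
# holding the whole URL and the substitution changes nothing.
greedy_sed() {
    printf '%s\n' "$1" | sed 's|\(http://.*/\).*|\1|'
}

# Non-greedy: Perl's .*? stops at the FIRST slash after http://
nongreedy_perl() {
    printf '%s\n' "$1" | perl -pe 's|(http://.*?/).*|\1|'
}

greedy_sed "$url"       # prints http://www.suepearson.co.uk/product/174/71/3816/
nongreedy_perl "$url"   # prints http://www.suepearson.co.uk/
```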
Try [^/]* instead of .*?:
sed 's|\(http://[^/]*/\).*|\1|g'
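As a rough sketch of the same idea wrapped in a helper: the function name extract_domain, the https handling, and the example.com URL are assumptions added for illustration, not part of the original answer. It uses sed -E (extended regex, available in both GNU and BSD sed) so the group needs no backslashes.

```shell
#!/bin/sh
# Keep everything up to and including the first slash after the scheme.
# [^/]* cannot cross a slash, so it behaves like a non-greedy match here.
extract_domain() {
    printf '%s\n' "$1" | sed -E 's|^(https?://[^/]*/).*|\1|'
}

extract_domain 'http://www.suepearson.co.uk/product/174/71/3816/'
# prints http://www.suepearson.co.uk/
extract_domain 'https://example.com/a/b?c=d'
# prints https://example.com/
```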
With sed, I usually implement non-greedy search by searching for anything except the separator until the separator:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'
Output:
http://www.suon.co.uk
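This breaks down as follows:
Don't print lines by default: -n
Search, match the pattern, replace, and print: s/<pattern>/<replace>/p
Use ; as the search command separator instead of / to make it easier to type, so s;<pattern>;<replace>;p
Remember the match between the escaped parentheses \( ... \), later accessible with \1, \2, ...
Match the literal text http://
Followed by any one character listed in brackets []; [ab/] would mean either a or b or /
A ^ as the first character in [] means not, so followed by anything but the characters in the []
So [^/] means any character except /
The * repeats the previous item zero or more times, so [^/]* means any run of characters except /
So far, sed -n 's;\(http://[^/]*\) means: search for and remember http:// followed by any characters except /
We want the match to stop at the end of the domain, which is the next /, so add another / at the end: sed -n 's;\(http://[^/]*\)/'
But we want to match the rest of the line after the domain, so add .*
Now the match remembered in group 1 (\1) is the domain, so replace the matched line with the text saved in group 1 and print: sed -n 's;\(http://[^/]*\)/.*;\1;p'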
If you want to include the slash after the domain as well, then move that slash inside the group to remember:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'
output:
http://www.suon.co.uk/
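Because of -n together with the p flag, lines that do not match are dropped entirely rather than passed through. A small sketch of that behaviour on several lines of input follows; the helper name domains_only and the extra input lines are made up for illustration.

```shell
#!/bin/sh
# Print only the domain part of lines that contain an http:// URL with a path;
# with -n and the p flag, lines where the substitution fails are not printed.
domains_only() {
    sed -n 's;\(http://[^/]*\)/.*;\1;p'
}

printf '%s\n' \
    'http://www.suon.co.uk/product/1/7/3/' \
    'not a url' \
    'http://example.com/index.html' | domains_only
# prints:
# http://www.suon.co.uk
# http://example.com
```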