Using grep to find all emails
How do I properly construct a regular expression for the Linux "grep" program to find all email addresses in, say, the /etc directory? Currently, my command is the following:
grep -srhw "[[:alnum:]]*@[[:alnum:]]*" /etc
It works OK: I see some of the emails. But when I modify it to catch one or more characters before and after the "@" sign ...
grep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc
.. it stops working entirely.
Also, it doesn't catch emails of the form "Name.LastName@site.com".
Help!
Here is another example
grep -Eiorh '([[:alnum:]_.-]+@[[:alnum:]_.-]+?\.[[:alpha:].]{2,6})' "$@" * | sort | uniq > emails.txt
This variant works with 3 level domains.
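As a rough check of the third-level-domain claim, here is a minimal sketch using a simplified variant of the pattern (plain + instead of +?, and letters only in the final part). The sample text and the variable names are made up for illustration, and GNU grep is assumed.

```shell
# Simplified variant (an assumption, not the exact pattern above):
# local part, @, domain labels, a dot, then a 2-6 letter top-level domain.
pattern='[[:alnum:]_.-]+@[[:alnum:]_.-]+\.[[:alpha:]]{2,6}'

# Hypothetical sample text; only the first line has a dotted domain.
found=$(printf '%s\n' \
    'write to Name.LastName@mail.site.com today' \
    'ping root@localhost' |
    grep -Eio "$pattern")

echo "$found"
```

The second line is skipped because this pattern insists on at least one dot after the @ sign, so local addresses such as root@localhost would need a looser pattern.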
In its default (basic) mode, grep requires several regular expression special characters to be escaped before they take on their special meaning, including +. You'll want to do one of these two:
grep -srhw "[[:alnum:]]\+@[[:alnum:]]\+" /etc
egrep -srhw "[[:alnum:]]+@[[:alnum:]]+" /etc
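The difference between the two syntaxes can be seen on a small throwaway file. This is only a sketch: the file contents and variable names are invented for the demonstration, and the \+ form relies on GNU grep's extension to basic syntax.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'contact: root@localhost' 'no address here' > "$sample"

# Basic syntax, unescaped +: the plus is a literal character, so nothing matches.
# (grep exits non-zero when there is no match, hence the "|| true".)
bre_plain=$(grep -c "[[:alnum:]]+@[[:alnum:]]+" "$sample" || true)

# Basic syntax with \+ (GNU extension): one or more of the preceding item.
bre_escaped=$(grep -c "[[:alnum:]]\+@[[:alnum:]]\+" "$sample" || true)

# Extended syntax (grep -E, same as egrep): + needs no escaping.
ere=$(grep -Ec "[[:alnum:]]+@[[:alnum:]]+" "$sample" || true)

echo "unescaped: $bre_plain, escaped: $bre_escaped, extended: $ere"
rm -f "$sample"
```

The counts are the number of matching lines, so the unescaped basic form should report no lines while the other two report the one line that contains an address.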
I modified your regex to include punctuation (like .-_ etc) by changing it to
egrep -ho "[[:graph:]]+@[[:graph:]]+"
This is still pretty clean and matches... well, almost anything with an @ in it, of course. It also matches third-level domains and addresses with '%' or '+' in them. See http://www.delorie.com/gnu/docs/grep/grep_8.html for good documentation on the character classes used.
In my example, the addresses were surrounded by white space, making matching quite easy. If you grep through a mail server log for example, you can add < > to make it match only the addresses:
egrep -ho "<[[:graph:]]+@[[:graph:]]+>"
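For log files, the bracketed matches can then be cleaned up and de-duplicated. The following is a sketch under assumptions: extract_bracketed is a hypothetical helper name, the log lines are invented, and each address is assumed to be separated from the next by white space, as in the example above.

```shell
# Hypothetical helper: read log text on stdin and print each distinct
# <address> once, with the surrounding angle brackets removed.
extract_bracketed() {
    grep -Eo "<[[:graph:]]+@[[:graph:]]+>" | tr -d '<>' | sort -u
}

# Invented log lines in a mail-server-like format.
addresses=$(printf '%s\n' \
    'from=<alice@example.com> size=120' \
    'to=<bob@example.org> status=sent' \
    'from=<alice@example.com> size=98' |
    extract_bracketed)

echo "$addresses"
```

Because [[:graph:]] also matches < and >, two bracketed addresses with no white space between them would be returned as a single match; a stricter character class would be needed for such logs.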
@thomas, @glowcoder and @oedo are all right. The RFC that defines what an email address can look like is quite a fun read. (I used GNU grep 2.9, as included in Ubuntu, for the examples above.)
Also check out zpea's version below, it should make for a less trigger-happy matcher.