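What is the best regular expression to check if a string is a valid URL?

How can I check whether a given string is a valid URL address?

My knowledge of regular expressions is basic and doesn't allow me to choose among the hundreds of regular expressions I've already seen on the web.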


I wrote my URL (actually IRI, internationalized) patterns to comply with RFC 3987 (http://www.faqs.org/rfcs/rfc3987.html). These are in PCRE syntax.

For absolute IRIs (internationalized):

/^[a-z](?:[-a-z0-9+.])*:(?://(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:])*@)?(?:[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9._~!$&'()*+,;=:]+)]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{
FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=@])*)(?::[0-9]*)?(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|(?!(?:%[0-9a-f
][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])))(?:?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[x{E000}-x{F8FF}x{F0000}-x{FFFFD}x{100000}-x{10FFFD}/?])*)?(?:#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[/?])*)?$/i

To also allow relative IRIs:

/^(?:[a-z](?:[-a-z0-9+.])*:(?://(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:])*@)?(?:[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9._~!$&'()*+,;=:]+)]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}
-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=@])*)(?::[0-9]*)?(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|(?!(?:%[0-9
a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])))(?:?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[x{E000}-x{F8FF}x{F0000}-x{FFFFD}x{100000}-x{10FFFD}/?])*)?(?:#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[/?])*)?|(?://(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:])*@)?(?:[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,
4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9._~!$&'()*+,;=:]+)]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=@])*)(?::[0-9]*)?(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@
]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=@])+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])))(?:?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[x{E000}-x{F8FF}x{F0000}-x{FFFFD}x{100000}-x{10FFFD}/?])*)?(?:#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x
{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[/?])*)?)$/i

Here is how they were compiled (in PHP):

<?php

/* Regex convenience functions (character class, non-capturing group) */
function cc($str, $suffix = '', $negate = false) {
    return '[' . ($negate ? '^' : '') . $str . ']' . $suffix;
}
function ncg($str, $suffix = '') {
    return '(?:' . $str . ')' . $suffix;
}

/* Preserved from RFC3986 */

$ALPHA = 'a-z';
$DIGIT = '0-9';
$HEXDIG = $DIGIT . 'a-f';

$sub_delims = '!$&\'()*+,;=';
$gen_delims = ':/?#[]@';
$reserved = $gen_delims . $sub_delims;
$unreserved = '-' . $ALPHA . $DIGIT . '._~';

$pct_encoded = '%' . cc($HEXDIG) . cc($HEXDIG);

$dec_octet = ncg(implode('|', array(
    cc($DIGIT),
    cc('1-9') . cc($DIGIT),
    '1' . cc($DIGIT) . cc($DIGIT),
    '2' . cc('0-4') . cc($DIGIT),
    '25' . cc('0-5')
)));

$IPv4address = $dec_octet . ncg('\.' . $dec_octet, '{3}');

$h16 = cc($HEXDIG, '{1,4}');
$ls32 = ncg($h16 . ':' . $h16 . '|' . $IPv4address);

$IPv6address = ncg(implode('|', array(
    ncg($h16 . ':', '{6}') . $ls32,
    '::' . ncg($h16 . ':', '{5}') . $ls32,
    ncg($h16, '?') . '::' . ncg($h16 . ':', '{4}') . $ls32,
    ncg($h16 . ':' . $h16, '?') . '::' . ncg($h16 . ':', '{3}') . $ls32,
    ncg(ncg($h16 . ':', '{0,2}') . $h16, '?') . '::' . ncg($h16 . ':', '{2}') . $ls32,
    ncg(ncg($h16 . ':', '{0,3}') . $h16, '?') . '::' . $h16 . ':' . $ls32,
    ncg(ncg($h16 . ':', '{0,4}') . $h16, '?') . '::' . $ls32,
    ncg(ncg($h16 . ':', '{0,5}') . $h16, '?') . '::' . $h16,
    ncg(ncg($h16 . ':', '{0,6}') . $h16, '?') . '::',
)));

$IPvFuture = 'v' . cc($HEXDIG, '+') . cc($unreserved . $sub_delims . ':', '+');

$IP_literal = '\[' . ncg(implode('|', array($IPv6address, $IPvFuture))) . '\]';

$port = cc($DIGIT, '*');

$scheme = cc($ALPHA) . ncg(cc('-' . $ALPHA . $DIGIT . '+.'), '*');

/* New or changed in RFC3987 */

$iprivate = '\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}';

$ucschar = '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}' .
    '\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}' .
    '\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}' .
    '\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}' .
    '\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}' .
    '\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}';

$iunreserved = '-' . $ALPHA . $DIGIT . '._~' . $ucschar;

$ipchar = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':@'));

$ifragment = ncg($ipchar . '|' . cc('\/?'), '*');

$iquery = ncg($ipchar . '|' . cc($iprivate . '\/?'), '*');

$isegment_nz_nc = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '+');
$isegment_nz = ncg($ipchar, '+');
$isegment = ncg($ipchar, '*');

$ipath_empty = '(?!' . $ipchar . ')';
$ipath_rootless = ncg($isegment_nz) . ncg('\/' . $isegment, '*');
$ipath_noscheme = ncg($isegment_nz_nc) . ncg('\/' . $isegment, '*');
$ipath_absolute = '\/' . ncg($ipath_rootless, '?'); // Spec says isegment-nz *( "/" isegment )
$ipath_abempty = ncg('\/' . $isegment, '*');

$ipath = ncg(implode('|', array(
    $ipath_abempty,
    $ipath_absolute,
    $ipath_noscheme,
    $ipath_rootless,
    $ipath_empty
)));

$ireg_name = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '*');

$ihost = ncg(implode('|', array($IP_literal, $IPv4address, $ireg_name)));
$iuserinfo = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':'), '*');
$iauthority = ncg($iuserinfo . '@', '?') . $ihost . ncg(':' . $port, '?');

$irelative_part = ncg(implode('|', array(
    '\/\/' . $iauthority . $ipath_abempty . '',
    '' . $ipath_absolute . '',
    '' . $ipath_noscheme . '',
    '' . $ipath_empty . ''
)));

$irelative_ref = $irelative_part . ncg('\?' . $iquery, '?') . ncg('#' . $ifragment, '?');

$ihier_part = ncg(implode('|', array(
    '\/\/' . $iauthority . $ipath_abempty . '',
    '' . $ipath_absolute . '',
    '' . $ipath_rootless . '',
    '' . $ipath_empty . ''
)));

$absolute_IRI = $scheme . ':' . $ihier_part . ncg('\?' . $iquery, '?');

$IRI = $scheme . ':' . $ihier_part . ncg('\?' . $iquery, '?') . ncg('#' . $ifragment, '?');

$IRI_reference = ncg($IRI . '|' . $irelative_ref);
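The same build-it-from-the-grammar idea can be tried on a small piece before committing to the whole IRI pattern. The following is only a rough sketch in Python (not the PHP above): `cc` and `ncg` mirror the helper functions, while `ipv4_address`, `IPV4_RE` and `is_ipv4` are names made up for this illustration. It assembles just the IPv4address rule from dec-octet and checks a few inputs.

```python
import re


def cc(chars, suffix=""):
    """Wrap chars in a character class, e.g. cc("0-9") gives "[0-9]"."""
    return "[" + chars + "]" + suffix


def ncg(pattern, suffix=""):
    """Wrap pattern in a non-capturing group, with an optional quantifier."""
    return "(?:" + pattern + ")" + suffix


DIGIT = "0-9"

# dec-octet = 0-9 / 10-99 / 100-199 / 200-249 / 250-255
dec_octet = ncg("|".join([
    cc(DIGIT),
    cc("1-9") + cc(DIGIT),
    "1" + cc(DIGIT) + cc(DIGIT),
    "2" + cc("0-4") + cc(DIGIT),
    "25" + cc("0-5"),
]))

# IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
# The dot is escaped so that it matches a literal "." only.
ipv4_address = dec_octet + ncg(r"\." + dec_octet, "{3}")

IPV4_RE = re.compile(ipv4_address)


def is_ipv4(text):
    """Return True if the whole string matches the IPv4address rule."""
    return IPV4_RE.fullmatch(text) is not None
```

Checking small rules like this one in isolation makes it easier to notice a missing escape (an unescaped `.` here would accept `1a2b3c4`) before the fragments are concatenated into the full pattern.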

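Edit, 7 March 2011: Because of the way PHP handles backslashes in quoted strings, these are unusable by default. You'll need to double-escape backslashes, except where the backslash has a special meaning in the regex. You can do that this way: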

$escape_backslash = '/(?<!)(?![[]^$.|*+()QEnrtaefvdwsDWSbAZzB1-9GX]|x{[0-9a-f]{1,4}}|c[A-Z]|)/';
$absolute_IRI = preg_replace($escape_backslash, '\', $absolute_IRI);
$IRI = preg_replace($escape_backslash, '\', $IRI);
$IRI_reference = preg_replace($escape_backslash, '\', $IRI_reference);

I just wrote a blog post with a good solution for recognizing URLs in the most commonly used formats, such as:

  • www.google.com
  • http://www.google.com
  • mailto:somebody@google.com
  • somebody@google.com
  • www.url-with-querystring.com/?url=has-querystring

The regular expression used is:

    /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/
    

However, I suggest you go to http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the to see a working example.


What platform? If you're using .NET, use System.Uri.TryCreate rather than a regular expression.

For example:

    static bool IsValidUrl(string urlString)
    {
        Uri uri;
        return Uri.TryCreate(urlString, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp
             || uri.Scheme == Uri.UriSchemeHttps
             || uri.Scheme == Uri.UriSchemeFtp
             || uri.Scheme == Uri.UriSchemeMailto
                /*...*/);
    }
    
    // In test fixture...
    
    [Test]
    void IsValidUrl_Test()
    {
        Assert.True(IsValidUrl("http://www.example.com"));
        Assert.False(IsValidUrl("javascript:alert('xss')"));
        Assert.False(IsValidUrl(""));
        Assert.False(IsValidUrl(null));
    }
    

(Thanks to @Yoshi for the tip about javascript:)
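The same parse-then-allow-list approach carries over to other platforms. As a rough illustration only, here is what it might look like in Python with the standard library's `urllib.parse.urlsplit`. The names `ALLOWED_SCHEMES` and `is_valid_url` are invented for this sketch, and `urlsplit` is more lenient than `Uri.TryCreate`, so the two will not agree on every input.

```python
from urllib.parse import urlsplit

# Assumed allow-list, mirroring the schemes checked in the C# example.
ALLOWED_SCHEMES = {"http", "https", "ftp", "mailto"}


def is_valid_url(url_string):
    """Return True if url_string parses and uses an allowed scheme."""
    if not isinstance(url_string, str) or not url_string:
        return False
    try:
        parts = urlsplit(url_string)
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs,
        # such as an unbalanced IPv6 bracket in the host.
        return False
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    if parts.scheme == "mailto":
        # mailto: has no authority part; the address is in the path.
        return bool(parts.path)
    # The other allowed schemes need a host.
    return bool(parts.netloc)
```

Rejecting everything outside the allow-list is what keeps `javascript:` URLs out, which a purely syntactic check would accept.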
