什么是最好的正则表达式来检查一个字符串是否是一个有效的URL?
如何检查给定的字符串是否为有效的URL地址?
我对正则表达式的了解是基本的,并且不允许我从我已经在网上看到的数百个正则表达式中进行选择。
我写了我的URL(实际上是IRI,国际化)模式,以符合RFC 3987(http://www.faqs.org/rfcs/rfc3987.html)。 这些都是PCRE语法。
对于绝对IRI(国际化):
/^[a-z](?:[-a-z0-9+.])*:(?://(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:])*@)?(?:[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9._~!$&'()*+,;=:]+)]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=@])*)(?::[0-9]*)?(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])))(?:?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[x{E000}-x{F8FF}x{F0000}-x{FFFFD}x{100000}-x{10FFFD}/?])*)?(?:#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[/?])*)?$/i
还要允许相关IRI:
/^(?:[a-z](?:[-a-z0-9+.])*:(?://(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:])*@)?(?:[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9._~!$&'()*+,;=:]+)]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=@])*)(?::[0-9]*)?(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])))(?:?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[x{E000}-x{F8FF}x{F0000}-x{FFFFD}x{100000}-x{10FFFD}/?])*)?(?:#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[/?])*)?|(?://(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:])*@)?(?:[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9._~!$&'()*+,;=:]+)]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=@])*)(?::[0-9]*)?(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=@])+)(?:/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])))(?:?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[x{E000}-x{F8FF}x{F0000}-x{FFFFD}x{100000}-x{10FFFD}/?])*)?(?:#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9._~x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}!$&'()*+,;=:@])|[/?])*)?)$/i
它们是如何编译的(使用PHP):
<?php
/* Regex convenience functions (character class, non-capturing group) */
function cc($str, $suffix = '', $negate = false) {
return '[' . ($negate ? '^' : '') . $str . ']' . $suffix;
}
function ncg($str, $suffix = '') {
return '(?:' . $str . ')' . $suffix;
}
/* Preserved from RFC3986 */
$ALPHA = 'a-z';
$DIGIT = '0-9';
$HEXDIG = $DIGIT . 'a-f';
$sub_delims = '!$&'()*+,;=';
$gen_delims = ':/?#[]@';
$reserved = $gen_delims . $sub_delims;
$unreserved = '-' . $ALPHA . $DIGIT . '._~';
$pct_encoded = '%' . cc($HEXDIG) . cc($HEXDIG);
$dec_octet = ncg(implode('|', array(
cc($DIGIT),
cc('1-9') . cc($DIGIT),
'1' . cc($DIGIT) . cc($DIGIT),
'2' . cc('0-4') . cc($DIGIT),
'25' . cc('0-5')
)));
$IPv4address = $dec_octet . ncg('.' . $dec_octet, '{3}');
$h16 = cc($HEXDIG, '{1,4}');
$ls32 = ncg($h16 . ':' . $h16 . '|' . $IPv4address);
$IPv6address = ncg(implode('|', array(
ncg($h16 . ':', '{6}') . $ls32,
'::' . ncg($h16 . ':', '{5}') . $ls32,
ncg($h16, '?') . '::' . ncg($h16 . ':', '{4}') . $ls32,
ncg($h16 . ':' . $h16, '?') . '::' . ncg($h16 . ':', '{3}') . $ls32,
ncg(ncg($h16 . ':', '{0,2}') . $h16, '?') . '::' . ncg($h16 . ':', '{2}') . $ls32,
ncg(ncg($h16 . ':', '{0,3}') . $h16, '?') . '::' . $h16 . ':' . $ls32,
ncg(ncg($h16 . ':', '{0,4}') . $h16, '?') . '::' . $ls32,
ncg(ncg($h16 . ':', '{0,5}') . $h16, '?') . '::' . $h16,
ncg(ncg($h16 . ':', '{0,6}') . $h16, '?') . '::',
)));
$IPvFuture = 'v' . cc($HEXDIG, '+') . cc($unreserved . $sub_delims . ':', '+');
$IP_literal = '[' . ncg(implode('|', array($IPv6address, $IPvFuture))) . ']';
$port = cc($DIGIT, '*');
$scheme = cc($ALPHA) . ncg(cc('-' . $ALPHA . $DIGIT . '+.'), '*');
/* New or changed in RFC3987 */
$iprivate = 'x{E000}-x{F8FF}x{F0000}-x{FFFFD}x{100000}-x{10FFFD}';
$ucschar = 'x{A0}-x{D7FF}x{F900}-x{FDCF}x{FDF0}-x{FFEF}' .
'x{10000}-x{1FFFD}x{20000}-x{2FFFD}x{30000}-x{3FFFD}' .
'x{40000}-x{4FFFD}x{50000}-x{5FFFD}x{60000}-x{6FFFD}' .
'x{70000}-x{7FFFD}x{80000}-x{8FFFD}x{90000}-x{9FFFD}' .
'x{A0000}-x{AFFFD}x{B0000}-x{BFFFD}x{C0000}-x{CFFFD}' .
'x{D0000}-x{DFFFD}x{E1000}-x{EFFFD}';
$iunreserved = '-' . $ALPHA . $DIGIT . '._~' . $ucschar;
$ipchar = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':@'));
$ifragment = ncg($ipchar . '|' . cc('/?'), '*');
$iquery = ncg($ipchar . '|' . cc($iprivate . '/?'), '*');
$isegment_nz_nc = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '+');
$isegment_nz = ncg($ipchar, '+');
$isegment = ncg($ipchar, '*');
$ipath_empty = '(?!' . $ipchar . ')';
$ipath_rootless = ncg($isegment_nz) . ncg('/' . $isegment, '*');
$ipath_noscheme = ncg($isegment_nz_nc) . ncg('/' . $isegment, '*');
$ipath_absolute = '/' . ncg($ipath_rootless, '?'); // Spec says isegment-nz *( "/" isegment )
$ipath_abempty = ncg('/' . $isegment, '*');
$ipath = ncg(implode('|', array(
$ipath_abempty,
$ipath_absolute,
$ipath_noscheme,
$ipath_rootless,
$ipath_empty
))) . ')';
$ireg_name = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '*');
$ihost = ncg(implode('|', array($IP_literal, $IPv4address, $ireg_name)));
$iuserinfo = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':'), '*');
$iauthority = ncg($iuserinfo . '@', '?') . $ihost . ncg(':' . $port, '?');
$irelative_part = ncg(implode('|', array(
'//' . $iauthority . $ipath_abempty . '',
'' . $ipath_absolute . '',
'' . $ipath_noscheme . '',
'' . $ipath_empty . ''
)));
$irelative_ref = $irelative_part . ncg('?' . $iquery, '?') . ncg('#' . $ifragment, '?');
$ihier_part = ncg(implode('|', array(
'//' . $iauthority . $ipath_abempty . '',
'' . $ipath_absolute . '',
'' . $ipath_rootless . '',
'' . $ipath_empty . ''
)));
$absolute_IRI = $scheme . ':' . $ihier_part . ncg('?' . $iquery, '?');
$IRI = $scheme . ':' . $ihier_part . ncg('?' . $iquery, '?') . ncg('#' . $ifragment, '?');
$IRI_reference = ncg($IRI . '|' . $irelative_ref);
编辑2011年3月7日:由于PHP在引用字符串中处理反斜杠的方式,因此默认情况下它们不可用。 除了反斜杠在正则表达式中有特殊含义的地方,你需要双重转义反斜杠。 你可以这样做:
$escape_backslash = '/(?<!)(?![[]^$.|*+()QEnrtaefvdwsDWSbAZzB1-9GX]|x{[0-9a-f]{1,4}}|c[A-Z]|)/';
$absolute_IRI = preg_replace($escape_backslash, '\', $absolute_IRI);
$IRI = preg_replace($escape_backslash, '\', $IRI);
$IRI_reference = preg_replace($escape_backslash, '\', $IRI_reference);
我刚写了一篇博客文章,提供了一个很好的解决方案,用于识别大多数使用的格式中的URL,例如:
www.google.com
http://www.google.com
mailto:somebody@google.com
somebody@google.com
www.url-with-querystring.com/?url=has-querystring
使用的正则表达式是:
/((([A-Za-z]{3,9}:(?://)?)(?:[-;:&=+$,w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=+$,w]+@)[A-Za-z0-9.-]+)((?:/[+~%/.w-_]*)???(?:[-+=&;%@.w_]*)#?(?:[w]*))?)/
但是,我建议你去http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the查看工作示例。
什么平台? 如果使用.NET,请使用System.Uri.TryCreate
,而不是正则表达式。
例如:
static bool IsValidUrl(string urlString)
{
Uri uri;
return Uri.TryCreate(urlString, UriKind.Absolute, out uri)
&& (uri.Scheme == Uri.UriSchemeHttp
|| uri.Scheme == Uri.UriSchemeHttps
|| uri.Scheme == Uri.UriSchemeFtp
|| uri.Scheme == Uri.UriSchemeMailto
/*...*/);
}
// In test fixture...
[Test]
void IsValidUrl_Test()
{
Assert.True(IsValidUrl("http://www.example.com"));
Assert.False(IsValidUrl("javascript:alert('xss')"));
Assert.False(IsValidUrl(""));
Assert.False(IsValidUrl(null));
}
(感谢@Yoshi关于javascript:
的提示:)
上一篇: What is the best regular expression to check if a string is a valid URL?