Getting parts of a URL (Regex)

2018-06-27 11:07:45

Given the URL (single line):
http://test.example.com/dir/subdir/file.html

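How can I extract the following parts using regular expressions:

The subdomain (test)

The domain (example.com)

The path without the file (/dir/subdir/)

The file (file.html)

The path with the file (/dir/subdir/file.html)

The URL without the path (http://test.example.com)

(add any other parts that you think would be useful)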

The regex should work correctly even if I enter the following URL:
http://example.example.com/example/example/example.html

Thanks.

A single regex to parse and break up a full URL, including query parameters and anchors, e.g.

https://www.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash

^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$

Regex positions:

url: RegExp['$&'],
protocol: RegExp.$2,
host: RegExp.$3,
path: RegExp.$4,
file: RegExp.$6,
query: RegExp.$7,
hash: RegExp.$8

You can then easily parse the host further (it is '.'-delimited).
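As a rough illustration of how those positions might be read in JavaScript, here is a minimal sketch. It uses the array returned by `exec` instead of the static `RegExp.$n` properties, and the pattern is written with explicit backslash escapes. The names `urlPattern` and `splitUrl` are made up for this example and are not part of the answer.

```javascript
// The answer's pattern, written as a JavaScript regex literal.
const urlPattern = /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$/;

// Hypothetical helper: maps the capture groups to the names used above.
function splitUrl(url) {
  const m = urlPattern.exec(url);
  if (m === null) return null; // the pattern did not match at all
  return {
    url: m[0],
    protocol: m[2],
    host: m[3],
    path: m[4],
    file: m[6],
    query: m[7],
    hash: m[8],
    // Further parsing of the host, which is '.'-delimited.
    hostLabels: m[3].split('.'),
  };
}

const parts = splitUrl('http://test.example.com/dir/subdir/file.html');
console.log(parts.host, parts.path, parts.file, parts.hostLabels);
```

One caveat shows up when this sketch is run on the longer example URL: group 7, `(.*)?`, is greedy, so it also swallows the `#hash` part and group 8 stays `undefined`. If the fragment is needed separately, it would have to be split off from the query afterwards.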

What I would do is use something like this:

/*
    ^(.*:)//([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$
*/
proto $1
host $2
port $3
the-rest $4

Then parse 'the rest' further, as specifically as you need. Doing it all in one regex is, well, a bit crazy.
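A two-stage approach along these lines could look like the sketch below. The first pattern is the coarse one from this answer, written as a JavaScript literal. The second pattern, `restPattern`, and the function name `parseInStages` are assumptions added for illustration, since the answer does not say how 'the rest' should be split.

```javascript
// Stage 1: the coarse pattern from the answer (proto, host, port, the rest).
const coarse = /^(.*:)\/\/([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$/;

// Stage 2 (assumed, not from the answer): path, query and fragment.
const restPattern = /^([^?#]*)(\?[^#]*)?(#.*)?$/;

function parseInStages(url) {
  const m = coarse.exec(url);
  if (m === null) return null;
  const [, proto, host, port = '', rest] = m;

  const r = restPattern.exec(rest);
  if (r === null) return null;
  const [, path, query = '', hash = ''] = r;

  // Split the path at its last slash into directory and file name.
  const lastSlash = path.lastIndexOf('/');
  return {
    proto,
    host,
    port: port.slice(1), // drop the leading ':'; empty when no port is given
    directory: path.slice(0, lastSlash + 1),
    file: path.slice(lastSlash + 1),
    query,
    hash,
  };
}

console.log(parseInStages('http://www.example.com:123/foo/bar.html?fox=trot#foo'));
```

Keeping each stage small makes it easier to tighten one part, such as the host, without touching the others.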

I realize I'm late to the party, but there is a simple way to let the browser parse a URL for you without a regex:

var a = document.createElement('a');
a.href = 'http://www.example.com:123/foo/bar.html?fox=trot#foo';

['href','protocol','host','hostname','port','pathname','search','hash'].forEach(function(k) {
    console.log(k+':', a[k]);
});

/*//Output:
href: http://www.example.com:123/foo/bar.html?fox=trot#foo
protocol: http:
host: www.example.com:123
hostname: www.example.com
port: 123
pathname: /foo/bar.html
search: ?fox=trot
hash: #foo
*/
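The same idea also works without a DOM element through the standard `URL` constructor, which is available in current browsers and in Node.js. The sketch below uses it to derive the parts listed in the question. The function name `describeUrl` is made up for this example, and treating the last two host labels as the domain is a naive assumption that breaks for hosts such as example.co.uk.

```javascript
// Hypothetical helper: derives the parts asked for in the question.
function describeUrl(input) {
  const u = new URL(input); // throws a TypeError if the input is not a valid absolute URL
  const labels = u.hostname.split('.');
  const lastSlash = u.pathname.lastIndexOf('/');
  return {
    // Naive assumption: the domain is the last two labels of the host name.
    subdomain: labels.slice(0, -2).join('.'),
    domain: labels.slice(-2).join('.'),
    directory: u.pathname.slice(0, lastSlash + 1), // path without the file
    file: u.pathname.slice(lastSlash + 1),
    path: u.pathname, // path with the file
    origin: u.origin, // URL without the path
  };
}

console.log(describeUrl('http://test.example.com/dir/subdir/file.html'));
```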

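I'm a few years late to the party, but I'm surprised no one has mentioned that the Uniform Resource Identifier specification has a section on parsing URIs with a regular expression. The regular expression, written by Berners-Lee et al., is: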

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9

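The numbers in the second line above are only there to assist readability; they indicate the reference point for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to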

http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:

$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related

For what it's worth, I found that I had to escape the forward slashes in JavaScript:

^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
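For illustration, the slash-escaped form can be used as a regex literal roughly as follows. This is a minimal sketch: the names `rfc3986` and `parseUri` are made up for this example, and the property names follow the component names used in the URI specification.

```javascript
// The specification's pattern with forward slashes escaped for a regex literal.
const rfc3986 = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: maps $2, $4, $5, $7 and $9 to named components.
function parseUri(uri) {
  // Every part of the pattern is optional, so exec returns a match for any string.
  const m = rfc3986.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7], // undefined when the URI has no '?' part
    fragment: m[9], // undefined when the URI has no '#' part
  };
}

console.log(parseUri('http://www.ics.uci.edu/pub/ietf/uri/#Related'));
```

Because the pattern accepts any input, it splits a URI into components but does not validate it.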
