Building Regular Expression (RegEx) to extract text of HTML tag
<a href="javascript:ProcessQuery\('report_drilldown',[0-9]+\)">([^<]*)</a>
This won't really solve the problem, but it may just barely scrape by. In particular, it's very brittle: the slightest change to the markup and it won't match. If report_drilldown isn't meant to be a fixed value, replace it with [^']*, and/or capture both it and the number if you need them.
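As a rough sketch of how that pattern behaves, here it is in Python with the literal parentheses of the ProcessQuery(...) call escaped. The sample markup, the names LINK_RE, LOOSE_RE, link_texts and link_details, and the choice of Python are all assumptions made for illustration; the same pattern should carry over to other regex engines with little change.

```python
import re

# The pattern from the answer, with the literal parentheses escaped.
# Group 1 is the link text.
LINK_RE = re.compile(
    r"""<a href="javascript:ProcessQuery\('report_drilldown',[0-9]+\)">([^<]*)</a>"""
)

# Looser variant: the first argument is no longer fixed, and both it
# and the number are captured along with the link text.
LOOSE_RE = re.compile(
    r"""<a href="javascript:ProcessQuery\('([^']*)',([0-9]+)\)">([^<]*)</a>"""
)

# Hypothetical sample markup, made up for this example.
html = """<td><a href="javascript:ProcessQuery('report_drilldown',1604)">North Region</a></td>"""


def link_texts(markup):
    """Return the text of every link that matches the strict pattern."""
    return LINK_RE.findall(markup)


def link_details(markup):
    """Return (first_argument, number, text) tuples for the looser pattern."""
    return LOOSE_RE.findall(markup)


print(link_texts(html))
print(link_details(html))
```

The brittleness is easy to reproduce: adding any attribute before href, or changing the quoting, makes the strict pattern return nothing.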
If you need something that really parses HTML, then it's a bit of a nightmare if you have to deal with tag soup. If you were using Python, I'd suggest BeautifulSoup, but I don't know of anything similar for C#. (Does anyone know of a similar tag soup parsing library for C#?)
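To give an idea of what the parser route looks like, here is a minimal sketch. It uses Python's standard-library html.parser rather than BeautifulSoup, purely so the example runs without installing anything; it is a stand-in, not a recommendation over BeautifulSoup. The class LinkTextParser, the function extract_link_texts and the href prefix are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text of <a> tags whose href starts with a given prefix."""

    def __init__(self, href_prefix):
        super().__init__()
        self.href_prefix = href_prefix
        self.texts = []
        # Holds text chunks while inside a matching <a>, otherwise None.
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(self.href_prefix):
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.texts.append("".join(self._buffer))
            self._buffer = None


def extract_link_texts(markup, href_prefix="javascript:ProcessQuery("):
    """Return the text of each link whose href begins with href_prefix."""
    parser = LinkTextParser(href_prefix)
    parser.feed(markup)
    parser.close()
    return parser.texts
```

Unlike the regex, this does not care about attribute order, tag case, or extra markup inside the link; text from nested tags is simply concatenated.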
The answer is... DON'T!
Use a library, such as this one
I agree regex might not be the best way to parse this, but using a backreference it's easily done:
<(?<tag>\w*)(?:.*)>(?<text>.*)</\k<tag>>
Where tag and text are named capture groups.
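For illustration, here is a sketch of the same idea in Python, where named groups are written (?P<tag>...) and the backreference is (?P=tag). This is a tightened variant rather than a literal port: it assumes [^>]* for the attributes and a non-greedy .*? for the text, because greedy .* can swallow inner tags and leave the text group empty. TAG_RE and tag_texts are hypothetical names.

```python
import re

# Named groups "tag" and "text"; (?P=tag) requires the closing tag
# to repeat whatever name the opening tag captured.
TAG_RE = re.compile(
    r"<(?P<tag>\w+)[^>]*>(?P<text>.*?)</(?P=tag)>",
    re.DOTALL,  # let the text span multiple lines
)


def tag_texts(markup):
    """Return (tag_name, inner_text) for each matched open/close pair."""
    return [(m.group("tag"), m.group("text")) for m in TAG_RE.finditer(markup)]


print(tag_texts('<p class="x">hello</p><span>world</span>'))
```

The backreference only guarantees that the closing tag name matches the opening one. It still cannot cope with a tag nested inside another tag of the same name, which is one reason the other answers recommend a parser.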
hat-tip: expresso library