Replace character UNLESS surrounded by specific tag
First, yes, I know that regex should never be used to parse HTML. In this situation, however, I'm taking a long string of text (the output of var_dump(), actually) and using several regexes to transform it into XHTML, so I know exactly which tags I will be dealing with. The last two regexes in my sequence look for the curly braces and transform them into pieces of XHTML. This works great EXCEPT when the curly braces are contained in a string variable, which an earlier regex wraps in <var></var> tags.
So, currently, I'm using:
/\s*{\s*/u
What I need to do is adjust this to ignore any curly brace anywhere within the <var></var> tags.
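To make the problem concrete, here is a minimal sketch in Python rather than PHP (the pattern itself is unchanged). The sample string and the "[" placeholder, which stands in for the real XHTML fragment, are made up for illustration.

```python
import re

# The current pattern: optional whitespace, an opening brace, optional whitespace.
NAIVE_BRACE = re.compile(r"\s*\{\s*")

# Hypothetical input: one brace outside <var>, one inside a string value.
sample = 'a { b <var>c { d</var>'

# "[" is only a placeholder for the XHTML that would really be emitted.
result = NAIVE_BRACE.sub("[", sample)
print(result)  # a[b <var>c[d</var>
```

Both braces are rewritten, including the one inside <var></var>, which is the behaviour I want to avoid.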
I've tried using:
/\s*{\s*(?!(?<!<var>)[^{]*<\/var>)/u
but that isn't quite right. I have not yet pinpointed the conditions under which it fails, so I may be close with this regex or I may be way off. Hence the need for the SO expertise. Thank you.
Also, if this is simply not possible, there are other hacks I can use: for example, base64_encode() the string, stick it in the <var></var> tags, and then, as a last step, base64_decode() anything surrounded by <var></var> tags. I'd prefer to find a usable regex and, more importantly, I'm simply curious whether it's possible.
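For reference, the base64 fallback could look roughly like the sketch below. It is written in Python rather than PHP, and the function names, the sample input, and the "[" placeholder are all hypothetical. It assumes <var> tags are never nested.

```python
import base64
import re

# Non-greedy match of everything between one <var> and the next </var>.
VAR_BODY = re.compile(r"<var>(.*?)</var>", re.DOTALL)

def protect(match):
    # Base64 output contains no braces or whitespace, so it is safe to scan.
    encoded = base64.b64encode(match.group(1).encode("utf-8")).decode("ascii")
    return "<var>" + encoded + "</var>"

def restore(match):
    decoded = base64.b64decode(match.group(1)).decode("utf-8")
    return "<var>" + decoded + "</var>"

def replace_braces_outside_var(text, replacement):
    shielded = VAR_BODY.sub(protect, text)
    # A function replacement keeps `replacement` literal (no backslash handling).
    replaced = re.sub(r"\s*\{\s*", lambda _m: replacement, shielded)
    return VAR_BODY.sub(restore, replaced)

print(replace_braces_outside_var('a { b <var>c { d</var> e { f', '['))
# a[b <var>c { d</var> e[f
```

The brace inside <var></var> survives because it is not visible as a brace while the brace regex runs.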
This might work:
\s*{\s*(?:(?!(?:.*?</var>))|(?=[^<]+<var>))
Pretty much, I rephrased the question: instead of not matching curly braces within <var>, I only match curly braces that can be proved to be outside of <var>. A curly brace is treated as outside of a <var> if either of the following holds:
(?!(?:.*?</var>))
a negative lookahead that ensures we don't hit a closing </var> tag further along (on the same line, since . does not match a newline by default), or
(?=[^<]+<var>)
a positive lookahead that ensures the next tag we reach is an opening <var> tag.
It will definitely fail with nested <var> tags, but it seems to work with the test case I used. You can run it on RegExr and tell me what you think.
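If you would rather check it in code than on RegExr, here is a small sketch using Python's re module instead of PHP. The brace is escaped as \{ purely for safety, and the sample string and "[" placeholder are made up; this is one simple case, not a proof that the pattern is correct in general.

```python
import re

# The pattern above; \{ is just an explicitly escaped literal brace.
OUTSIDE_VAR_BRACE = re.compile(r"\s*\{\s*(?:(?!(?:.*?</var>))|(?=[^<]+<var>))")

# Hypothetical input: a brace before <var>, one inside it, one after it.
sample = 'a { b <var>c { d</var> e { f'

# "[" stands in for whatever XHTML fragment would really be inserted.
result = OUTSIDE_VAR_BRACE.sub("[", sample)
print(result)  # a[b <var>c { d</var> e[f
```

The first brace matches through the positive lookahead, the last through the negative lookahead, and the one inside <var></var> satisfies neither, so it is left alone.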