Regex to remove special/invisible characters
The problem is to remove some strange characters from a domain name, but keep special Unicode characters such as accented letters (German, Danish or Polish). For example: radisson-blu.es. You can't see it, but there's an additional character between the two s's. (Try copying it into Notepad to see it.)
I've seen many posts about similar problems, but each solution either doesn't remove that special character, or removes it along with other special characters I need to keep.
Replace matches of the regular expression [^\w\s.,!@#$%^&*()=+~`-] with an empty string.
The character you're (not) seeing there is U+00AD Soft Hyphen. You can reference it in a regular expression using \u00ad, e.g.:
Regex.Replace(str, @"\u00ad", "");
But for a single-character replacement you could also use string.Replace.
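To make the two options concrete, here is a minimal sketch of both. The class and method names (SoftHyphenDemo, RemoveWithRegex, RemoveWithReplace) are made up for illustration and are not from the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SoftHyphenDemo
{
    // Regex route: the verbatim string hands the text \u00ad to the regex
    // engine, which interprets it as the code point U+00AD.
    public static string RemoveWithRegex(string input)
    {
        return Regex.Replace(input, @"\u00ad", "");
    }

    // Plain string route: here the C# compiler turns "\u00AD" into the
    // soft hyphen character before Replace ever sees it.
    public static string RemoveWithReplace(string input)
    {
        return input.Replace("\u00AD", "");
    }
}
```

Both methods leave every other character alone, including accented letters and the ordinary hyphen, so a name such as "radis\u00ADson-blu.es" should come back as "radisson-blu.es".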
'\xAD' is a soft hyphen (the code point's name is "SOFT HYPHEN"). According to the Unicode code point database, its category is "Cf" (or "Format"), so it can be matched with the regex @"\p{Cf}".
Strangely, Microsoft Visual C# 2010 Express says that it doesn't match @"\p{Cf}", but instead matches @"\p{Pd}" ("Dash Punctuation"), the same category as the normal hyphen.
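Since the category reported for U+00AD seems to depend on the runtime, one way to sidestep the question is to put both the Cf category and the literal code point into a single character class. This is only a sketch under that assumption; InvisibleCleaner and Clean are hypothetical names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvisibleCleaner
{
    // Matches any character in the Unicode "Format" (Cf) category, plus
    // U+00AD explicitly in case the runtime files it under Pd instead.
    private static readonly Regex Invisible = new Regex(@"[\p{Cf}\u00AD]");

    // Strips the matched characters; letters (accented or not), digits,
    // dots and the ordinary hyphen U+002D are not in the class and stay.
    public static string Clean(string input)
    {
        return Invisible.Replace(input, "");
    }
}
```

Matching @"\p{Pd}" on its own would not be a good substitute, because it would also strip the ordinary hyphen from names like radisson-blu.es. Note that \p{Cf} covers other format characters too, such as the zero-width joiner U+200D, which may or may not be what you want for your input.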