Split by delimiter without removing it from the string
I want to use a regex to split a long string into separate lines. A line can include any possible Unicode character. A line ends on a dot ("." - one or more) or on a newline ("\n").
Example:
This string will be the input:
"line1. line2.. line3... line4.... line5..... line6
\n
line7"
The output:
If I understand what you're asking for, you might try a pattern like this:
(?<=\.)(?!\.)|\n
This will split the string at any position that is preceded by a . but not followed by a ., or on a \n character.
Note that this pattern preserves any whitespace after the dots, for example:
// A regular (non-verbatim) string is used for the input so that \n is a real newline.
var input = "line1. line2.. line3... line4.... line5..... line6\nline7";
var output = Regex.Split(input, @"(?<=\.)(?!\.)|\n");
Produces
line1.
line2..
line3...
line4....
line5.....
line6
line7
If you'd like to get rid of the whitespace simply change this to:
(?<=\.)(?!\.)\s*|\n
But if you know that the dots will always be followed by whitespace, you can simplify this to:
(?<=\.)\s+|\n
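As a minimal sketch of the whitespace-trimming variant, the following wraps the pattern (?<=\.)(?!\.)\s*|\n in a small helper. The class name DotSplitDemo and the method name SplitLines are hypothetical, chosen only for illustration; the input is the sample string from the question, assuming the \n in it is a real newline.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotSplitDemo
{
    // Hypothetical helper (not from the original answer).
    // Splits after the last dot of each run of dots, swallowing any
    // whitespace that follows, and also splits on newline characters.
    public static string[] SplitLines(string input)
    {
        return Regex.Split(input, @"(?<=\.)(?!\.)\s*|\n");
    }

    public static void Main()
    {
        // Regular string literal, so \n is an actual newline character.
        var input = "line1. line2.. line3... line4.... line5..... line6\nline7";

        // Brackets make any leftover leading or trailing whitespace visible.
        foreach (var line in SplitLines(input))
        {
            Console.WriteLine("[" + line + "]");
        }
    }
}
```

Because the first alternative can match an empty string, an input that ends in a dot produces a trailing empty entry; filter it out if that matters for your case.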
Try this:
String result = Regex.Replace(subject, @"""?(\w+([.]+)?)(?:[\n ]|[""\n]$)+", "\"$1\"\n");
/*
"line1."
"line2.."
"line3..."
"line4...."
"line5....."
"line6"
"line7"
*/
Regex Explanation
"?(\w+([.]+)?)(?:[\n ]|["\n]$)+
Match the character “"” literally «"?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regular expression below and capture its match into backreference number 1 «(\w+([.]+)?)»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the regular expression below and capture its match into backreference number 2 «([.]+)?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
      Match the character “.” «[.]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below «(?:[\n ]|["\n]$)+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match either the regular expression below (attempting the next alternative only if this one fails) «[\n ]»
      Match a single character present in the list below «[\n ]»
         A line feed character «\n»
         The character “ ” « »
   Or match regular expression number 2 below (the entire group fails if this one fails to match) «["\n]$»
      Match a single character present in the list below «["\n]»
         The character “"” «"»
         A line feed character «\n»
      Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
If you want to keep all dots intact and the dots will always be followed by a whitespace character, then this could be your regex:
String result = Regex.Replace(t, @"\.\s", ".\n");
This produces a single string. You haven't stated whether you want several strings or one as the result.
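If separate strings are wanted after all, one possible follow-up is to split the replaced text on the newline character. This is only a sketch: the class name ReplaceThenSplitDemo and the method name ToLines are hypothetical, and it assumes every dot run that ends a line is followed by exactly one whitespace character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceThenSplitDemo
{
    // Hypothetical helper (not from the original answer).
    public static string[] ToLines(string t)
    {
        // Turn "dot + one whitespace character" into "dot + newline".
        // The replacement is a regular string so \n is a real newline.
        string joined = Regex.Replace(t, @"\.\s", ".\n");

        // Every line is now newline-terminated, so a plain split suffices.
        return joined.Split('\n');
    }

    public static void Main()
    {
        var input = "line1. line2.. line3... line4.... line5..... line6\nline7";
        foreach (var line in ToLines(input))
        {
            Console.WriteLine(line);
        }
    }
}
```

Compared with Regex.Split and lookarounds, this two-step version is easier to read, but it does not handle dots that are followed directly by more text.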