How to replace each Capture of a Group individually?
I have a regular expression which uses GroupCollections in its capture to capture a group of item IDs (which can be comma separated, also accounting for the final one being preceded by the word 'and'):
(\bItem #(?<ITEMID>\d+))|(,\s?(?<ITEMID>\d+))|(,?\sand\s(?<ITEMID>\d+))
Is there an easy way using C#'s Regex class to replace the ITEMID numbers with a URL? Right now, I have the following:
foreach (Match match in matches)
{
    var group = match.Groups["ITEMID"];
    var address = String.Format(UnformattedAddress, group.Value);
    CustomReplace(ref myString, group.Value, address,
        group.Index, (group.Index + group.Length));
}

public static int CustomReplace(ref string source, string org, string replace,
    int start, int max)
{
    if (start < 0) throw new System.ArgumentOutOfRangeException("start");
    if (max <= 0) return 0;
    start = source.IndexOf(org, start);
    if (start < 0) return 0;
    var sb = new StringBuilder(source, 0, start, source.Length);
    var found = 0;
    while (max-- > 0)
    {
        var index = source.IndexOf(org, start);
        if (index < 0) break;
        sb.Append(source, start, index - start).Append(replace);
        start = index + org.Length;
        found++;
    }
    sb.Append(source, start, source.Length - start);
    source = sb.ToString();
    return found;
}
I found the CustomReplace method online as an easy way to replace one string with another inside a source string. The problem is that I'm sure there is an easier way, probably using the Regex class to replace the GroupCollections as necessary. I just can't figure out what that is. Thanks!
Example text:
Hello the items you are looking for are Item #25, 38, and 45. They total 100 dollars.
25, 38, and 45 should be replaced with the URL strings I am creating (this is an HTML string).
Your pattern works for your input, but it does have a bug. Specifically, it will match any number in your input that appears after a comma or the word " and ".
I went ahead and rewrote your pattern to avoid this issue. To achieve this I am actually using two regex patterns. It's possible to pull this off using one pattern, but it's fairly complicated and less readable than the approach I opted to share.
The main pattern is: \bItem #\d+(?:,? \d+)*(?:,? and \d+)?
No capturing groups are used here since I am only interested in matching the items. The (?: ... ) bit is a non-capturing group. The (?:,? \d+)* part matches any additional comma-separated values in the middle portion of the list.
Once items are matched, I use Regex.Replace to format the items, then reconstruct the string to swap out the original items with the formatted items.
Here's an example with a couple of different inputs:
string[] inputs =
{
    "Hello the items you are looking for are Item #25, 38, 22, and 45. They total 100 dollars.",
    "... Item #25, 38 and 45. Other numbers 100, 20, and 30 untouched.",
    "Item #25, and 45",
    "Item #25 and 45",
    "Item #25"
};
string pattern = @"\bItem #\d+(?:,? \d+)*(?:,? and \d+)?";
string digitPattern = @"(\d+)";
// $1 refers to the first (and only) group in digitPattern
string replacement = @"<a href=""http://url/$1.html"">$1</a>";
foreach (var input in inputs)
{
    Match m = Regex.Match(input, pattern);
    string formatted = Regex.Replace(m.Value, digitPattern, replacement);
    var builder = new StringBuilder(input)
        .Remove(m.Index, m.Length)
        .Insert(m.Index, formatted);
    Console.WriteLine(builder.ToString());
}
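The sample above uses Regex.Match, so it only rewrites the first item list in each input. If an input may contain several lists, one possible variation is to nest the two replacements so that no StringBuilder reconstruction is needed. This is only a sketch: the ItemLinker class and LinkItems method are hypothetical names introduced for illustration, and the URL format is the same placeholder used above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class ItemLinker
{
    // The two patterns from the answer, with the regex escapes written out.
    private const string ListPattern = @"\bItem #\d+(?:,? \d+)*(?:,? and \d+)?";
    private const string DigitPattern = @"(\d+)";
    private const string Replacement = @"<a href=""http://url/$1.html"">$1</a>";

    public static string LinkItems(string input)
    {
        // The outer Replace finds every item list in the input; the callback
        // rewrites only the numbers inside the matched list.
        return Regex.Replace(input, ListPattern,
            m => Regex.Replace(m.Value, DigitPattern, Replacement));
    }

    public static void Main()
    {
        Console.WriteLine(LinkItems(
            "Item #25, 38 and 45. Also Item #7. Other numbers 100, 20, and 30."));
    }
}
```

Numbers outside an item list, such as "100, 20, and 30", are left alone because the inner replacement only ever sees the text matched by the outer pattern.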
In case you need to use an existing method to format the URL, instead of using a regex replacement pattern, you could use the Regex.Replace overload that accepts a MatchEvaluator. This can be achieved using a lambda and is nicer than the tedious approach shown in the MSDN documentation.
For example, let's assume you have a FormatItem method that accepts a string and returns a formatted string:
public string FormatItem(string item)
{
    return String.Format("-- {0} --", item);
}
To use FormatItem you would replace the Regex.Replace call in the earlier code sample with the following:
string formatted = Regex.Replace(m.Value, digitPattern,
d => FormatItem(d.Value));
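For reference, here is how the pieces might fit together in one self-contained program. It is a sketch under a few assumptions: the EvaluatorDemo class and Rewrite method are hypothetical names, FormatItem is made static so it can be called from a static context, and a guard for inputs without any item list is added.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only to make the snippet runnable.
public static class EvaluatorDemo
{
    // FormatItem from the answer, made static for this sketch.
    public static string FormatItem(string item)
    {
        return String.Format("-- {0} --", item);
    }

    public static string Rewrite(string input)
    {
        Match m = Regex.Match(input, @"\bItem #\d+(?:,? \d+)*(?:,? and \d+)?");
        if (!m.Success) return input; // no item list, nothing to rewrite

        // The lambda is converted to a MatchEvaluator and is called once per number.
        string formatted = Regex.Replace(m.Value, @"\d+", d => FormatItem(d.Value));

        // Splice the formatted list back in place of the original one.
        return input.Substring(0, m.Index) + formatted
             + input.Substring(m.Index + m.Length);
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("Item #25, 38 and 45 cost 100 dollars."));
        // prints: Item #-- 25 --, -- 38 -- and -- 45 -- cost 100 dollars.
    }
}
```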
Here is an example of the syntax needed; it also shows that you can drop back into C# in the replacement via a callback:
How does MatchEvaluator in Regex.Replace work?
You seem to be coming at this from two directions at once. On the one hand, you've got a regex with three capturing groups in it, so you expect the solution to involve a GroupCollection. On the other hand, all three groups have the same name, so maybe you have to treat them as individual captures of the same group, i.e. a CaptureCollection. In reality, you probably don't need either of them. Here's your regex (after a little aesthetic tweaking):
string source = @"Total cost for Item #25, 38, and 45 is 100 dollars.";
Regex regex1 = new Regex(
    @"\bItem #(?<ITEMID>\d+)|,\s*(?<ITEMID>\d+)|,?\s+and\s+(?<ITEMID>\d+)",
    RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
foreach (Match m in regex1.Matches(source)) {
    Console.WriteLine(m.Groups["ITEMID"].Value);
}
It outputs 25, 38, 45 as expected. Each alternative has its own copy of the capturing group, but only one of them will participate in each match. This is a notable feature of the .NET regex flavor; some of the others provide special settings or group constructs that permit you to reuse group names, but none of them make it as easy as .NET does. However, you don't really need it in this case; you can just merge the alternatives, like this:
@"(\bItem #|,\s*|,?\s+and\s+)(?<ITEMID>\d+)"
There is a problem with your regex, though, which is revealed if you change the source string to this:
@"Total cost for Item #25, 38, and 45 is 1,500 dollars and 42 cents."
The output is now 25, 38, 45, 500, 42. To prevent those false positives, you need to make sure that each match that doesn't start with Item # starts where the last match ended. For that you can use \G:
@"(\bItem #|\G,?\s+and\s+|\G,\s*)(?<ITEMID>\d+)"
(I also swapped the order of the last two alternatives for efficiency's sake.) Putting all that together, we have just another regex substitution.
string source =
    @"Total cost for Item #25, 38, and 45 is 1,500 dollars and 42 cents.";
Regex regex2 = new Regex(
    @"(?<TEXT>\bItem #|\G,?\s+and\s+|\G,\s*)(?<ITEMID>\d+)",
    RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
string result = regex2.Replace(source,
    @"${TEXT}<a href='URL_${ITEMID}'>${ITEMID}</a>");
Console.WriteLine(result);
No explicit use of GroupCollections or CaptureCollections needed, and unless your replacement is much more complicated than this, probably no need for MatchEvaluator either.
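To see the effect of \G in isolation, the following sketch lists the captured IDs for the merged pattern with and without the anchor, using the same source string. The AnchorDemo class and ItemIds method are hypothetical names added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only to compare the two patterns.
public static class AnchorDemo
{
    // Collects the ITEMID capture of every match of the given pattern.
    public static List<string> ItemIds(string pattern, string source)
    {
        var ids = new List<string>();
        var regex = new Regex(pattern,
            RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
        foreach (Match m in regex.Matches(source))
        {
            ids.Add(m.Groups["ITEMID"].Value);
        }
        return ids;
    }

    public static void Main()
    {
        string source =
            "Total cost for Item #25, 38, and 45 is 1,500 dollars and 42 cents.";
        string loose = @"(\bItem #|,\s*|,?\s+and\s+)(?<ITEMID>\d+)";
        string anchored = @"(\bItem #|\G,?\s+and\s+|\G,\s*)(?<ITEMID>\d+)";

        // Without \G, any number after a comma or "and" is picked up.
        Console.WriteLine(string.Join(", ", ItemIds(loose, source)));    // 25, 38, 45, 500, 42
        // With \G, a continuation must start exactly where the previous match ended.
        Console.WriteLine(string.Join(", ", ItemIds(anchored, source))); // 25, 38, 45
    }
}
```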