How to use a regular expression in an iPhone app to separate a string by , (comma)

I have to read a .csv file that has three columns. While parsing it, I get each line as a string in this format: Christopher Bass,"Cry the Beloved Country Final Essay",cbass@cgs.k12.va.us. I want to store the three column values in an array, so I used the componentsSeparatedByString:@"," method. It successfully returns an array with three components:

  • Christopher Bass
  • Cry the Beloved Country Final Essay
  • cbass@cgs.k12.va.us

But when a column value already contains a comma, like this: Christopher Bass,"Cry, the Beloved Country Final Essay",cbass@cgs.k12.va.us, it separates the string into four components because there is a , (comma) after Cry:

  • Christopher Bass
  • Cry
  • the Beloved Country Final Essay
  • cbass@cgs.k12.va.us

So how can I handle this using a regular expression? I have the "RegexKitLite" classes, but which regular expression should I use? Please help!

    Thanks-


    Any regular expression will probably run into the same problem. What you need is to sanitize your entries, either by escaping the commas inside them or by wrapping every string in quotes, like this: "My string". Otherwise the problem will keep coming back. Good luck.

    For your example you would probably need to do something like:

    "Christopher Bass","Cry, the Beloved Country Final Essay","cbass@cgs.k12.va.us"
    

    That way you could split with a regexp, or even with the same NSString method, by using "," (quote, comma, quote) as the separator instead of a bare comma.
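
    As a minimal sketch of that idea, assuming every field on the line is wrapped in double quotes and no field contains the sequence "," itself, the line can be split with plain Foundation calls. The function name SplitFullyQuotedLine is hypothetical and only used for illustration:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation or RegexKitLite).
// Assumes EVERY field is wrapped in double quotes, e.g. "a","b, c","d".
static NSArray<NSString *> *SplitFullyQuotedLine(NSString *line) {
    // If the line is not wrapped in quotes, return it unchanged as one field.
    if (line.length < 2 || ![line hasPrefix:@"\""] || ![line hasSuffix:@"\""]) {
        return @[line];
    }
    // Drop the opening quote of the first field and the closing quote of the last.
    NSString *inner = [line substringWithRange:NSMakeRange(1, line.length - 2)];
    // Between two fields the separator is now the three characters ","
    return [inner componentsSeparatedByString:@"\",\""];
}
```

    Called on the quoted line above, this should give three components, with the comma after Cry kept inside the second one.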

    Not related at all, but the importance of sanitizing strings: http://xkcd.com/327/ hehehe.


    How about this:

    componentsSeparatedByRegex:@",\"|\","
    

    This should split your string wherever " and , appear together in either order, resulting in a three-member array. This of course assumes that the second element in the string is always enclosed in quotation marks, and that the characters " and , never appear consecutively within the three components.

    If either of these assumptions is incorrect, other methods to identify string components may be used, but it should be made clear that no generic solution exists. If the three component strings can contain " and , anywhere, not even a limited solution is possible in such cases:

    Doe, John,""Why Unescaped Strings Suck", And Other Development Horror Stories",Doe, John <john.doe@dev.null>
    

    Hopefully there is nothing like the above in your CSV data. If there is, the data is basically unusable, and you should look into a better CSV exporter.
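
    If the exporter does follow the usual CSV convention (fields containing commas are quoted, and a quote inside a quoted field is written as ""), a different technique from the regex above is to walk the line character by character and track whether you are inside quotes. The following is only a sketch under that assumption; ParseCSVLine is a hypothetical name, and the code handles a single line without embedded newlines:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation or RegexKitLite).
// Assumes conventional CSV quoting: "a, b" is one field, "" inside quotes is a literal quote.
static NSArray<NSString *> *ParseCSVLine(NSString *line) {
    NSMutableArray<NSString *> *fields = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    BOOL inQuotes = NO;
    NSUInteger length = line.length;

    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [line characterAtIndex:i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < length && [line characterAtIndex:i + 1] == '"') {
                    // Doubled quote inside a quoted field: keep one literal quote.
                    [current appendString:@"\""];
                    i++;
                } else {
                    // Closing quote of the field.
                    inQuotes = NO;
                }
            } else {
                // Commas are ordinary characters while inside quotes.
                [current appendFormat:@"%C", c];
            }
        } else if (c == '"') {
            // Opening quote: the quote itself is not part of the value.
            inQuotes = YES;
        } else if (c == ',') {
            // Unquoted comma ends the current field (which may be empty).
            [fields addObject:[current copy]];
            [current setString:@""];
        } else {
            [current appendFormat:@"%C", c];
        }
    }
    // The last field has no trailing comma.
    [fields addObject:[current copy]];
    return fields;
}
```

    For the line from the question this should yield three components, and an input such as val,,ue keeps its empty middle field. It cannot rescue data like the Doe, John example above, where the quoting itself is ambiguous.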


    The regex you're searching for is: "(.*)"[ ^,]*|([^,]*),

    Read as pseudocode: (('"' && string_1 && '"' && 0-n spaces) || string_2 except comma) && comma

    // Quotes inside Objective-C string literals must be escaped with a backslash.
    NSString *str = @"Christopher Bass,\"Cry, the Beloved Country ,Final Essay\",cbass@cgs.k12.va.us,som";
    NSString *regEx = @"\"(.*)\"[ ^,]*|([^,]*),";
    // componentsSeparatedByRegex: comes from RegexKitLite, not from Foundation.
    NSMutableArray *split = [[str componentsSeparatedByRegex:regEx] mutableCopy];
    [split removeObject:@""]; // both capture groups are always added, even when one of them is empty
    NSLog(@"%@", split);
    
    // OUTPUT:
    2012-02-07 17:42:18.778 tmpapp[92170:c03] (
        "Christopher Bass",
        "Cry, the Beloved Country ,Final Essay",
        "cbass@cgs.k12.va.us",
        som
    )
    

    RegexKitLite adds the text of both capture groups to the array, so you end up with empty strings in it. removeObject:@"" deletes those, but if you need to keep genuinely empty values (e.g. your source has val,,ue), you have to change the code to the following:

    str = [str stringByReplacingOccurrencesOfRegex:regEx withString:@"$1$2∏"];
    NSArray *split = [str componentsSeparatedByString:@"∏"];
    

    $1 and $2 are the two capture groups mentioned above; ∏ is, in this case, a character that will most likely never appear in normal text (and is easy to remember: option-shift-p).
