Java CSV parser with string separator (multi

Is there any Java open source library that supports multi-character (ie, String with length > 1) separators (delimiters) for CSV?

By definition, CSV means Comma-Separated Values: data with a single character (',') as the delimiter. However, many other single-character alternatives exist (eg, tab), so CSV effectively stands for "Character-Separated Values" data (essentially, DSV: Delimiter-Separated Values data).

The main Java open source libraries for CSV (eg, OpenCSV) support virtually any character as the delimiter, but not string (multi-character) delimiters. So, for data separated by strings like "|||", the only option is to preprocess the input and transform the string into a single-character delimiter. From then on, the data can be parsed as single-character separated values.

It would therefore be nice if there were a library that supported string separators natively, so that no preprocessing was necessary. CSV would then stand for "CharSequence-Separated Values" data. :-)


This is a good question. The problem was not obvious to me until I looked at the javadocs and realised that opencsv only supports a character as a separator, not a string.

Here are a couple of suggested workarounds (the examples are in Groovy and can be converted to Java).

Ignore implicit intermediary fields

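Continue to use OpenCSV with '|' as the separator, but ignore the empty fields. Each extra '|' in the delimiter produces one empty field, so the real values land at every second index for "||" and at every third index for "|||". Obviously this is a cheat, but it will work fine for parsing well-behaved data.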

    CSVParser csv = new CSVParser((char)'|')

    String[] result = csv.parseLine('J||Project report||"F, G, I"||1')

    assert result[0] == "J"
    assert result[2] == "Project report"
    assert result[4] == "F, G, I"
    assert result[6] == "1"

or

    CSVParser csv = new CSVParser((char)'|')

    String[] result = csv.parseLine('J|||Project report|||"F, G, I"|||1')

    assert result[0] == "J"
    assert result[3] == "Project report"
    assert result[6] == "F, G, I"
    assert result[9] == "1"

Roll your own

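Use Groovy's tokenize method, which behaves like Java's StringTokenizer: every character of the argument is treated as a delimiter, and empty tokens are dropped.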

    def result = 'J|||Project report|||"F, G, I"|||1'.tokenize('|||')

    assert result[0] == "J"
    assert result[1] == "Project report"
    assert result[2] == '"F, G, I"'
    assert result[3] == "1"

The disadvantage of this approach is that you lose the ability to handle quote characters or escaped separators.
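If the quote handling matters, a small hand-written splitter can recover it. The following is only a minimal sketch in plain Java, not part of any library: the class name `MultiCharSplitter` and its `split` method are made up for illustration, and it assumes double quotes as the quote character, a doubled quote (`""`) as the escape inside a quoted field, and one record per line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class MultiCharSplitter {

    /**
     * Splits a line on a multi-character delimiter. Text inside double
     * quotes is taken literally, so a delimiter inside quotes does not
     * split the field. The surrounding quotes are removed, and a doubled
     * quote inside a quoted field yields a single quote character.
     */
    public static List<String> split(String line, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');   // escaped quote inside a quoted field
                    i += 2;
                } else {
                    inQuotes = !inQuotes;  // opening or closing quote
                    i++;
                }
            } else if (!inQuotes && line.startsWith(delimiter, i)) {
                fields.add(current.toString());  // delimiter found outside quotes
                current.setLength(0);
                i += delimiter.length();
            } else {
                current.append(c);
                i++;
            }
        }
        fields.add(current.toString());  // last field
        return fields;
    }

    public static void main(String[] args) {
        for (String field : split("J|||Project report|||\"F, G, I\"|||1", "|||")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Unlike the tokenizer approach, this keeps empty fields and does not split on a delimiter that appears inside a quoted value. It does not handle quoted fields that span several lines.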

Update

Instead of pre-processing the data and altering its content, why not combine both of the above approaches in a two-step process:

  • Use the "roll your own" approach to first validate the data. Split each line and prove that it contains the requisite number of fields.
  • Use the "field ignoring" approach to parse the validated data, secure in the knowledge that the correct number of fields has been specified.

Not very efficient, but possibly easier than writing your own CSV parser :-)
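The validation step could look roughly like this in plain Java. This is a sketch under assumptions: the class name `FieldCountCheck` and the method `hasFieldCount` are hypothetical, and, like the tokenizer example, the split is not quote-aware, so it only suits data where the delimiter never appears inside a quoted value.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class FieldCountCheck {

    /**
     * Returns true if the line contains exactly the expected number of
     * fields when split on the given multi-character delimiter.
     */
    public static boolean hasFieldCount(String line, String delimiter, int expected) {
        // Pattern.quote makes the delimiter literal ('|' is a regex metacharacter);
        // the limit of -1 keeps trailing empty fields in the count.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        return fields.length == expected;
    }

    public static void main(String[] args) {
        String line = "J|||Project report|||\"F, G, I\"|||1";
        System.out.println(hasFieldCount(line, "|||", 4));
    }
}
```

Lines that pass this check can then be handed to the single-character parser, reading only every third field.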


Try opencsv.

It does everything you need, including (and especially) handling embedded delimiters within quoted values (eg "a,b", "c" parses as ["a,b", "c"] )

I've used it successfully and I liked it.

Edited:

Since opencsv handles only single-character separators, you could work around this thus:

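    // Sketch: assumes an opencsv version that still offers the CSVReader(Reader, char) constructor
    String input; // assumed to hold the raw text to parse
    char someCharNotInInput = '|';
    String delimiter = "abc"; // or whatever
    // replace() treats the delimiter literally (replaceAll() would read it as a regex); keep the result
    input = input.replace(delimiter, String.valueOf(someCharNotInInput));
    CSVReader reader = new CSVReader(new StringReader(input), someCharNotInInput); // etc
    // Put it back into each value read
    value = value.replace(String.valueOf(someCharNotInInput), delimiter); // in case it was inside a quoted value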
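The workaround depends on picking a character that really does not occur in the input. One way to check this, sketched in plain Java with hypothetical names (`PlaceholderChar`, `findUnusedChar`, `toSingleCharDelimiter`), is to scan the input against a list of candidate characters before replacing the delimiter:

```java
// Hypothetical helper, for illustration only.
public class PlaceholderChar {

    /** Returns the first candidate character that does not occur in the input. */
    public static char findUnusedChar(String input, String candidates) {
        for (char c : candidates.toCharArray()) {
            if (input.indexOf(c) < 0) {
                return c;
            }
        }
        throw new IllegalArgumentException("every candidate character occurs in the input");
    }

    /** Replaces each literal occurrence of the delimiter with the placeholder character. */
    public static String toSingleCharDelimiter(String input, String delimiter, char placeholder) {
        return input.replace(delimiter, String.valueOf(placeholder));
    }

    public static void main(String[] args) {
        String input = "J|||Project report|||1";
        // '|' and ';' are tried first, then the ASCII unit separator
        char placeholder = findUnusedChar(input, "|;\u001F");
        System.out.println(toSingleCharDelimiter(input, "|||", placeholder).split(String.valueOf(placeholder)).length);
    }
}
```

The result of `toSingleCharDelimiter` can then be passed to a single-character parser with `placeholder` as the separator.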
    