Java CSV parser with string separator (multi
Is there any Java open source library that supports multi-character (ie, String with length > 1) separators (delimiters) for CSV?
By definition, CSV = Comma-Separated Values data with a single character (',') as the delimiter. However, many other single-character alternatives exist (eg, tab), making CSV stand for "Character-Separated Values" data (essentially, DSV: Delimiter-Separated Values data).
The main Java open source libraries for CSV (eg, OpenCSV) support virtually any character as the delimiter, but not string (multi-character) delimiters. So, for data separated with strings like "|||" there is no option other than preprocessing the input to transform the string into a single-character delimiter. From then on, the data can be parsed as single-character separated values.
It would therefore be nice if there were a library that supported string separators natively, so that no preprocessing was necessary. This would mean that CSV would then stand for "CharSequence-Separated Values" data. :-)
This is a good question. The problem was not obvious to me until I looked at the javadocs and realised that opencsv only supports a character as a separator, not a string.
Here are a couple of suggested workarounds (the examples are in Groovy but can be converted to Java).
Ignore implicit intermediary fields
Continue to use OpenCSV, but ignore the empty fields. Obviously this is a cheat, but it will work fine for parsing well-behaved data.
CSVParser csv = new CSVParser((char)'|')
String[] result = csv.parseLine('J||Project report||"F, G, I"||1')
assert result[0] == "J"
assert result[2] == "Project report"
assert result[4] == "F, G, I"
assert result[6] == "1"
or
CSVParser csv = new CSVParser((char)'|')
String[] result = csv.parseLine('J|||Project report|||"F, G, I"|||1')
assert result[0] == "J"
assert result[3] == "Project report"
assert result[6] == "F, G, I"
assert result[9] == "1"
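In Java, the same trick amounts to keeping every n-th field, where n is the length of the string delimiter. The sketch below is plain Java and only illustrates the index arithmetic; the class and method names are made up for this example, and the hard-coded array stands in for whatever the single-character parser returns.

```java
import java.util.ArrayList;
import java.util.List;

class EveryNthField {

    // Keeps fields 0, n, 2n, ... where n is the length of the string delimiter.
    // 'fields' stands in for the output of a single-character parser that was
    // pointed at one character of a repeated delimiter such as "|||".
    static List<String> keepRealFields(String[] fields, int delimiterLength) {
        List<String> real = new ArrayList<>();
        for (int i = 0; i < fields.length; i += delimiterLength) {
            real.add(fields[i]);
        }
        return real;
    }

    public static void main(String[] args) {
        // Hypothetical parser output for: J|||Project report|||"F, G, I"|||1
        String[] parsed = {"J", "", "", "Project report", "", "", "F, G, I", "", "", "1"};
        for (String field : keepRealFields(parsed, 3)) {
            System.out.println(field);
        }
    }
}
```

This only holds while no field contains the '|' character itself, which is what "well-behaved data" means here.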
Roll your own
Use Groovy's tokenize method, which is backed by the Java StringTokenizer.
def result = 'J|||Project report|||"F, G, I"|||1'.tokenize('|||')
assert result[0] == "J"
assert result[1] == "Project report"
assert result[2] == '"F, G, I"'
assert result[3] == "1"
The disadvantage of this approach is that you lose the ability to ignore quote characters or to escape separators.
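A rough Java counterpart is to split on the whole delimiter string with String.split. Note one difference: a StringTokenizer treats its argument as a set of delimiter characters and drops empty tokens, so it would also split on a lone '|', whereas the sketch below only splits on the full "|||". The class and method names are invented for illustration.

```java
import java.util.regex.Pattern;

class LiteralSplit {

    // Splits on the whole delimiter string. Pattern.quote stops "|" from being
    // read as regex alternation; the -1 limit keeps trailing empty fields.
    static String[] splitOn(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String[] fields = splitOn("J|||Project report|||\"F, G, I\"|||1", "|||");
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```

The third field still carries its surrounding quote characters, which is the loss of quote handling described above.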
Update
Instead of pre-processing the data and altering its content, why not combine both of the above approaches in a two-step process:
Not very efficient, but possibly easier than writing your own CSV parser :-)
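One possible reading of the two steps, sketched in plain Java: first split the line on the literal string delimiter, then clean each token the way a CSV parser would treat a single field. The unquote helper is a hypothetical stand-in for handing each token to a CSV library, and all names here are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TwoStepParse {

    // Step 1: split the line on the literal multi-character delimiter.
    // Step 2: clean each token as if it were a single CSV field.
    static List<String> parse(String line, String delimiter) {
        List<String> fields = new ArrayList<>();
        for (String token : line.split(Pattern.quote(delimiter), -1)) {
            fields.add(unquote(token));
        }
        return fields;
    }

    // Hypothetical stand-in for the per-field work of a CSV library:
    // strips one pair of surrounding quotes and collapses "" into ".
    static String unquote(String token) {
        if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
            return token.substring(1, token.length() - 1).replace("\"\"", "\"");
        }
        return token;
    }

    public static void main(String[] args) {
        for (String field : parse("J|||Project report|||\"F, G, I\"|||1", "|||")) {
            System.out.println(field);
        }
    }
}
```

A delimiter that appears inside a quoted field is still split in step 1, so this only suits data where "|||" never occurs within a value.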
Try opencsv.
It does everything you need, including (and especially) handling embedded delimiters within quoted values (eg, "a,b", "c" parses as ["a,b", "c"]).
I've used it successfully and I liked it.
Edited:
Since opencsv handles only single-character separators, you could work around this thus:
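import java.io.StringReader;
import com.opencsv.CSVReader; // au.com.bytecode.opencsv.CSVReader in opencsv 2.x

String input = "1abc\"x, y\"abc3"; // sample data only
char someCharNotInInput = '|';
String delimiter = "abc"; // or whatever
// replace() treats the delimiter literally (replaceAll() would read it as a regex); strings are immutable, so keep the result
input = input.replace(delimiter, String.valueOf(someCharNotInInput));
// CSVReader needs a Reader; this is the older constructor, recent releases set the separator via CSVParserBuilder and CSVReaderBuilder
CSVReader reader = new CSVReader(new StringReader(input), someCharNotInInput); // etc
// Put it back into each value read, in case the delimiter string appeared inside a quoted value
value = value.replace(String.valueOf(someCharNotInInput), delimiter);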