Case sensitive order using Java Collator

I am trying to understand how case-sensitive ordering is really supposed to work with the Java Collator.

In this example, the following strings are sorted in the French locale at each collation strength (I have added a few extra strings to the data set for illustrative purposes):

[Äbc, äbc, Àbc, àbc, Abc, abc, ABC] - Original Data
[Äbc, äbc, Àbc, àbc, Abc, abc, ABC] Primary
[Abc, abc, ABC, Àbc, àbc, Äbc, äbc] Secondary
[abc, Abc, ABC, àbc, Àbc, äbc, Äbc] Tertiary

Case only kicks in at the Tertiary collation strength:
[CACHE, cache, Cache, da, DA, Da] - Original Data
[CACHE, cache, Cache, da, DA, Da] Primary
[CACHE, cache, Cache, da, DA, Da] Secondary
[cache, Cache, CACHE, da, Da, DA] Tertiary
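
The code that produced the listings above is not shown in the question, so here is a minimal sketch of how they could be reproduced. The class name `CollatorStrengthDemo` and the helper `sortWith` are made up for illustration; only `Collator.getInstance`, `setStrength` and the strength constants come from the JDK.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorStrengthDemo {

    // Hypothetical helper: returns a sorted copy of the input, using a
    // French collator set to the given strength.
    static List<String> sortWith(List<String> data, int strength) {
        Collator collator = Collator.getInstance(Locale.FRANCE);
        collator.setStrength(strength);
        List<String> copy = new ArrayList<>(data);
        // List.sort is stable, so strings the collator considers equal
        // keep their original relative order.
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("CACHE", "cache", "Cache", "da", "DA", "Da");
        System.out.println(data + " - Original Data");
        System.out.println(sortWith(data, Collator.PRIMARY) + " Primary");
        System.out.println(sortWith(data, Collator.SECONDARY) + " Secondary");
        System.out.println(sortWith(data, Collator.TERTIARY) + " Tertiary");
    }
}
```

Because the sort is stable, strings that compare as equal at a given strength simply stay where they were, which is why the Primary and Secondary rows look unsorted.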

But the result I was really expecting was this:

[abc, àbc, äbc, Abc, ABC, Àbc, Äbc] Tertiary
[cache, da, Cache, CACHE, Da, DA] Tertiary

In other words, I would like all lowercase strings to go first (sorted alphabetically), followed by the uppercase ones (or vice versa). Is this not a reasonable expectation?


Interestingly, the Android Javadoc is somewhat more helpful than the Oracle one. In particular:

A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.

Also worth noting: the order you get is what you would expect in the French locale. According to the Wikipedia article on "ordre alphabétique":

En première analyse, les caractères accentués, de même que les majuscules, ont le même rang alphabétique que le caractère fondamental.
Si plusieurs mots ont le même rang alphabétique, on tâche de les distinguer entre eux grâce aux majuscules et aux accents (pour le e, on a l'ordre e, é, è, ê, ë)

In English (the parenthetical example is my addition):

The first step consists of ranking letters regardless of their accentuation or case (i.e. a, A and à rank the same). If several words have the same rank after the first step, case and accentuation are taken into account.

In other words, c (lowercase) and D (uppercase) can always be ordered at Primary strength, and Tertiary strength won't change that order.

So in your example, you will always have cache before da, regardless of case and accents. Case only makes a difference when the primary letter is the same (c (lowercase) vs. C (uppercase), for example).
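
To make that concrete, here is a small sketch comparing pairs of strings directly. `TertiaryTieBreakDemo` and `compareAt` are illustrative names, not part of any API; the point is only that a primary difference decides the result before case is ever looked at.

```java
import java.text.Collator;
import java.util.Locale;

public class TertiaryTieBreakDemo {

    // Hypothetical helper: sign of the comparison of a and b under a
    // French collator at the given strength (-1, 0 or 1).
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.FRANCE);
        collator.setStrength(strength);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        // c vs. d is a primary difference: it decides the order at every strength.
        System.out.println(compareAt(Collator.PRIMARY, "cache", "Da"));   // -1
        System.out.println(compareAt(Collator.TERTIARY, "cache", "Da"));  // -1

        // Same letters, different case: equal until TERTIARY strength.
        System.out.println(compareAt(Collator.PRIMARY, "cache", "Cache"));  // 0
        System.out.println(compareAt(Collator.TERTIARY, "cache", "Cache")); // -1
    }
}
```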


The sample code is working as intended. You can use custom collation rules to get the desired output.

RuleBasedCollator is the only subclass of Collator in the JDK. Your call to Collator.getInstance(Locale.FRANCE) returns an instance of RuleBasedCollator.

You could create your own instance using

RuleBasedCollator myCollator = new RuleBasedCollator(rules);

The format for rules is given in the javadoc.
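
As a rough sketch of what such rules could look like for the "all lowercase first" order from the question: the rule string below makes every letter a primary difference, with a to z ranked before A to Z. This is only one possible approach, it covers unaccented ASCII letters only, and the class and method names (`LowercaseFirstCollator`, `build`, `sort`) are invented for the example. Accented letters would need their own entries in the rule string, which I have not tried here.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LowercaseFirstCollator {

    // Builds rules of the form "< a < b ... < z < A < B ... < Z", so that
    // every lowercase letter ranks (at primary level) before every
    // uppercase letter. Assumption: ASCII letters only.
    static RuleBasedCollator build() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(' ');
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            rules.append("< ").append(c).append(' ');
        }
        try {
            return new RuleBasedCollator(rules.toString().trim());
        } catch (ParseException e) {
            // The constructor throws a checked exception on malformed rules.
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Returns a sorted copy of the input using the custom collator.
    static List<String> sort(List<String> data) {
        List<String> copy = new ArrayList<>(data);
        copy.sort(build());
        return copy;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("CACHE", "cache", "Cache", "da", "DA", "Da");
        System.out.println(sort(data));
    }
}
```

Note that this is no longer a linguistic sort: comparison is letter by letter, so a word like "dA" would land between "da" and "Cache" rather than with the uppercase words.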

Hope it helps.


Another option: if you need to customize the rules of a locale, you can try using RuleBasedCollator:

    // Start from the rules of the default US collator
    RuleBasedCollator collTemp = (RuleBasedCollator) Collator.getInstance(Locale.US);
    String usRules = collTemp.getRules();

    // Remove dashes rule from US locale (dashes come after letters)
    usRules = usRules.replace(",'-'", "");

    // Create a collator with the customized rules
    // (this constructor throws the checked java.text.ParseException)
    RuleBasedCollator coll = new RuleBasedCollator(usRules);

    // Sort the collection with the custom collator
    // (lines is the List<String> to sort, defined elsewhere)
    Collections.sort(lines, coll);