Why is String.equalsIgnoreCase is so slow

2018-06-13 21:18:08

I encountered a question in interview to write a method to check for similar words irrespective of character cases.

I answered it by using the difference of ASCII value for each pair of characters. But at home, when I went through the actual implementation of it in String.class, I get disturbed - Why is it implemented that way!

I tried to draw a comparison between inbuilt and my custom method, this way-

public class EqualsIgnoreCase {

    public static void main(String[] args) {
        String str1 = "Srimant @$ Sahu 959s";
        String str2 = "sriMaNt @$ sAhu 959s";

        System.out.println("Avg millisecs with inbuilt () - " + averageOfTenForInbuilt(str1, str2));
        System.out.println("nAvg millisecs with custom () - " + averageOfTenForCustom(str1, str2));
    }

    public static int averageOfTenForInbuilt(String str1, String str2) {
        int avg = 0;
        for (int itr = 0; itr < 10; itr++) {
            long start1 = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                str1.equalsIgnoreCase(str2);
            }
            avg += System.currentTimeMillis() - start1;
        }
        return avg / 10;
    }

    public static int averageOfTenForCustom(String str1, String str2) {
        int avg = 0;
        for (int itr = 0; itr < 10; itr++) {
            long start2 = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                isEqualsIgnoreCase(str1, str2);
            }
            avg += System.currentTimeMillis() - start2;
        }
        return avg / 10;
    }

    public static boolean isEqualsIgnoreCase(String str1, String str2) {
        int length = str1.length();
        if (str2.length() != length) {
            return false;
        }

        for (int i = 0; i < length; i++) {
            char ch1 = str1.charAt(i);
            char ch2 = str2.charAt(i);

            int val = Math.abs(ch1 - ch2);
            if (val != 0) {
                if (isInAlphabetsRange(ch1, ch2)) {
                    if (val != 32) {
                        return false;
                    }
                } else {
                    return false;
                }
            }
        }
        return true;
    }

    public static boolean isInAlphabetsRange(char ch1, char ch2) {
        return (((ch1 <= 122 && ch1 >= 97) || (ch1 <= 90 && ch1 >= 65)) && ((ch2 <= 122 && ch2 >= 97) || (ch2 <= 90 && ch2 >= 65)));
    }

}

Output-

Avg millisecs with inbuilt () - 14

Avg millisecs with custom () - 5

I found that the inbuilt method is hitting efficiency, as because of lots of checks and method calls. Is there any specific reasons behind such an implementation? Or Am I missing something in my logic?

Any suggestions, will be heartily appreciated!

Your routine only handles ASCII characters. The system one handles all unicode characters.

Consider following example:

public class Test {

    public static void main(String[] args) {
        System.out.println((int) 'ě'); // => 283
        System.out.println((int) 'Ě'); // => 282 
    }

}

Your method is incorrect in many ways. For instance, it considers "!" equal to "B", "B" equal to "1", but "!" not equal to "1" (so it isn't transitive as we would expect an equals method to be).

Yes, it is quite easy to write an incorrect implementation for that method that is both faster and simpler. A fair challenge would be to write a correct one, ie that correctly handles all arguments the JDK implementation does.

You might also wish to look at How do I write a correct micro-benchmark in Java? to get more reliable performance measurements.

This might not be the only reason, but the fact that your solution doesn't actually work for all possible strings is definitely a factor.

There are some (annoying) locales for which two characters might have the same upper case but not the same lower case. For this reason, in order to work (most of the time, see Turkish), the canonical implementation must compare the strings char-for-char in both their lower and upper cases.

Your implementation is probably perfect 99% of the time, especially if you only have to deal with the english locale, but the core library implementation can unfortunately not make such assumptions.

链接地址: http://www.djcxy.com/p/39568.html

上一篇: 使用immutable.js覆盖Object.assign或spread运算符的优点

下一篇: 为什么String.equalsIgnoreCase太慢了