Some changes on Soundex Algorithm

This algorithm is set to run over the first word or till it fills the four encoded strings. For instance, the result of the input "Horrible Great" is: H612. It neglects the second word, or in other words it takes only the first letter from the second word to fill the encoded string.

I would like to change it by taking the first word and find its encoded string and THEN take the second word and find its encoded string; the output should be "H614 G600". Kindly i would like to know if there's a way to do that by doing some changing to **this code.
Thank you so much :)

    private string Soundex(string data)
    {
        StringBuilder result = new StringBuilder();
        if (data != null && data.Length > 0)
        {
            string previousCode = "", currentCode = "", currentLetter = "";
            result.Append(data.Substring(0, 1));
            for (int i = 1; i < data.Length; i++)
            {
                currentLetter = data.Substring(i,1).ToLower();
                currentCode = "";

                if ("bfpv".IndexOf(currentLetter) > -1)
                    currentCode = "1";
                else if ("cgjkqsxz".IndexOf(currentLetter) > -1)
                    currentCode = "2";
                else if ("dt".IndexOf(currentLetter) > -1)
                    currentCode = "3";
                else if (currentLetter == "l")
                    currentCode = "4";
                else if ("mn".IndexOf(currentLetter) > -1)
                    currentCode = "5";
                else if (currentLetter == "r")
                    currentCode = "6";

                if (currentCode != previousCode)
                    result.Append(currentCode);

                if (result.Length == 4) break;

                if (currentCode != "")
                    previousCode = currentCode;
            }
        }

        if (result.Length < 4)
            result.Append(new String('0', 4 - result.Length));

        return result.ToString().ToUpper();
    }

Sure, here is the solution I came up with. I wrapped the existing algorithm with another method that splits the strings and calls the original method. To use this, you would call SoundexByWord("Horrible Great") instead of calling Soundex("Horrible Great") and get the output of "H614 G630".

private string SoundexByWord(string data)
{
    var soundexes = new List<string>();
    foreach(var str in data.Split(' ')){
        soundexes.Add(Soundex(str));
    }
#if Net35OrLower
    // string.Join in .Net 3.5 and before require the second parameter to be an array.
    return string.Join(" ", soundexes.ToArray());
#endif
    // string.Join in .Net 4 has an overload that takes IEnumerable<string>
    return string.Join(" ", soundexes);
}

yes - first parse the string into an array of words (after you pick a separator)

then do this on each word

then assemble the results in some acceptable way and return.


The implementation in the question is correct but creates excess garbage with string operations. Here's a Char-array based implementation that's faster and creates very little garbage. It's designed as an extension method, and it handles phrases (words separated by spaces) as well:

    public static String Soundex( this String input )
    {
        var words = input.Split( ' ' );
        var result = new String[ words.Length ];
        for( var i = 0; i < words.Length; i++ )
            result[ i ] = words[ i ].SoundexWord();

        return String.Join( ",", result );
    }

    private static String SoundexWord( this String input )
    {
        var result = new Char[ 4 ] { '0', '0', '0', '0' };
        var inputArray = input.ToUpper().ToCharArray();

        if( inputArray.Length > 0 )
        {
            var previousCode = ' ';
            var resultIndex = 0;

            result[ resultIndex ] = inputArray[ 0 ];

            for( var i = 1; i < inputArray.Length; i++ )
            {
                var currentLetter = inputArray[ i ];
                var currentCode = ' ';

                if( "BFPV".IndexOf( currentLetter ) > -1 )
                    currentCode = '1';
                else if( "CGJKQSXZ".IndexOf( currentLetter ) > -1 )
                    currentCode = '2';
                else if( "DT".IndexOf( currentLetter ) > -1 )
                    currentCode = '3';
                else if( currentLetter == 'L' )
                    currentCode = '4';
                else if( "MN".IndexOf( currentLetter ) > -1 )
                    currentCode = '5';
                else if( currentLetter == 'R' )
                    currentCode = '6';

                if( currentCode != ' ' && currentCode != previousCode )
                    result[ ++resultIndex ] = currentCode;

                if( resultIndex == 3 ) break;

                if( currentCode != ' ' )
                    previousCode = currentCode;
            }
        }

        return new String( result );
    }
链接地址: http://www.djcxy.com/p/55232.html

上一篇: Prolog findall实现

下一篇: Soundex算法的一些变化