Understanding a linear runtime implementation of anagram detection

I am learning about the analysis of algorithms (Python 2.7.6). I am reading a book (Problem Solving with Algorithms and Data Structures) where Python is the language used for implementations. In Chapter 2, the author introduces algorithm analysis in a very clear and understandable way, and uses an anagram detection program as a template to compare implementations with different runtimes (quadratic, log-linear, linear). The linear, and most efficient, implementation is as follows (comments added by me):

def anagram_test2(s1, s2):
    """ Checks if two strings are anagrams of each other
        Runs with O(n) linear complexity """

    if (not s1) or (not s2):
        raise TypeError("Invalid input: input must be a string")

    # Initialize two lists of counters
    c1 = [0] * 26
    c2 = [0] * 26

    # Iterate over each string;
    # when a char is encountered,
    # increment the counter at
    # its corresponding position
    for i in range(len(s1)):
        pos = ord(s1[i]) - ord("a")
        c1[pos] += 1

    for i in range(len(s2)):
        pos = ord(s2[i]) - ord("a")
        c2[pos] += 1

    j = 0
    hit = True
    while j < 26 and hit:
        if c1[j] == c2[j]:
            j += 1
        else:
            hit = False

    return hit

My question is: Can the code block following the two for loops not be replaced by the simpler:

if c1 == c2:
    return True
else:
    return False

where no explicit iteration is necessary (as opposed to using the while statement)? Is there some computational/programmatic reason for preferring the first method over the second? I ran this version on various string combinations and it behaves exactly the same.

And a more general question: The author kind of implies that nested iterations cause quadratic runtime whereas non-nested iterations cause linear/logarithmic/log linear runtime. Is there a distinct set of rules for determining an algorithm's runtime? For example, how does one distinguish between linear/log linear/logarithmic algorithms given a program without nested iterations? In the example immediately before the one I posted above, the author used a sort and compare implementation where there are no nested loops but admits that the sort method has its own cost, which is either log linear or quadratic.


Yes, all this code does is check whether two lists are equal, and to do that you can simply write return c1 == c2
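A minimal sketch of that simplification, keeping the book's counting scheme (the function name anagram_test3 and the assumption of lowercase ASCII input are mine):

```python
def anagram_test3(s1, s2):
    # Same counting scheme as the book's version, but the final
    # comparison is delegated to Python's built-in list equality.
    if (not s1) or (not s2):
        raise TypeError("Invalid input: input must be a string")

    c1 = [0] * 26
    c2 = [0] * 26
    for ch in s1:
        c1[ord(ch) - ord("a")] += 1
    for ch in s2:
        c2[ord(ch) - ord("a")] += 1

    # Still O(n) overall: the list comparison is at most 26 steps,
    # i.e. a constant amount of work.
    return c1 == c2
```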

"Is there a distinct set of rules for determining an algorithm's runtime?" Figuring out the runtime of an algorithm is a complex procedure, but there are a couple of shortcuts. Some of them are:

  • k nested loops that each run from a constant to n, incrementing by a constant, will run in O(n^k)
  • a loop that runs from a constant to n, where the counter is multiplied by a constant each iteration, will run in O(log(n))
  • if you have a recursion (which happens in a lot of divide-and-conquer algorithms), you can analyze the complexity using the Master theorem. For really hard recurrences there is the Akra–Bazzi method.
PS: not related to your question, but the whole function can be substituted with just a Counter.

    from collections import Counter
    def isAnagram(s1, s2):
        return Counter(s1) == Counter(s2)
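The first two loop shortcuts above can be illustrated with small sketches (the function names pairs and halvings are illustrative, not from the book):

```python
def pairs(n):
    # Two nested loops, each running up to n: the body
    # executes n * n times, so this is O(n^2).
    count = 0
    for i in range(n):
        for j in range(n):
            count += 1
    return count

def halvings(n):
    # The counter is multiplied by a constant (2) each
    # iteration, so the body runs about log2(n) times: O(log n).
    steps = 0
    i = 1
    while i < n:
        i *= 2
        steps += 1
    return steps
```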
    

    In Python you can do c1 == c2, but a loop is required in other languages, and that is what the author is trying to show.

    In Python, that single line of code does an implicit loop over each index to check for equality anyway.


    It looks like the author is trying to spell out the algorithm in terms of O(1) operations (which makes sense when trying to figure out the overall runtime complexity).

    c1 == c2 (where c1 and c2 are lists) hides a fair bit of complexity; it actually works more like len(c1) == len(c2) and all(ch1 == ch2 for ch1, ch2 in zip(c1, c2)). He has shown the basic operations involved in the comparison.
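That hidden work can be spelled out explicitly, mirroring what the book's while loop does by hand (a sketch of list equality, not CPython's actual implementation):

```python
def lists_equal(a, b):
    # Roughly what c1 == c2 does under the hood: a length
    # check followed by an element-by-element scan that
    # stops at the first mismatch.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True
```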
