Hashing same character multiple times
I'm doing a programming challenge and I'm going crazy with one of the challenges. In the challenge, I need to compute the MD5 of a string. The string is given in the following form:
n[c]
: Where n
is a number and c
is a character. For example: b3[a2[c]]
=> baccaccacc
Everything went ok until I was given the following string:
1[2[3[4[5[6[7[8[9[10[11[12[13[a]]]]]]]]]]]]]
This strings turns into a string with 6227020800 a
's. This string is more than 6GB, so it's nearly impossible to compute it in practical time. So, here is my question:
Are there any properties of MD5 that I can use here?
I know that there has to be a form to make it in short time, and I suspect it has to be related to the fact that all the string has is the same character repeated multiple times.
You probably have created a (recursive) function to produce the result as a single value. Instead you should use a generator to produce the result as a stream of bytes. These you can then feed byte by byte into your MD5 hash routine. The size of the stream does not matter this way, it will just have an impact on the computation time, not on the memory used.
Here's an example using a single-pass parser:
import re, sys, md5
def p(s, pos, callBack):
while pos < len(s):
m = re.match(r'(d+)[', s[pos:])
if m: # repetition?
number = m.group(1)
for i in range(int(number)):
endPos = p(s, pos+len(number)+1, callBack)
pos = endPos
elif s[pos] == ']':
return pos + 1
else:
callBack(s[pos])
pos += 1
return pos + 1
digest = md5.new()
def feed(s):
digest.update(s)
sys.stdout.write(s)
sys.stdout.flush()
end = p(sys.argv[1], 0, feed)
print
print "MD5:", digest.hexdigest()
print "finished parsing input at pos", end
All hash functions are designed to work with byte streams, so you should not first generate the whole string, and after that hash it - you should write generator, which produces chunks of string data, and feed it to MD5 context. And, MD5 uses 64-byte (or char) buffer so it would be a good idea to feed 64-byte chunks of data to the context.
Take advantage of the good properties of hashes:
import hashlib
cruncher = hashlib.md5()
chunk = 'a' * 100
for i in xrange(100000):
cruncher.update(chunk)
print cruncher.hexdigest()
Tweak the number of rounds (x = 10000) and the length of the chunk (y = 100) so that x * y = 13!. The point is that your are feeding the algorithm with chunks of your string (each one x characters long), one after the other, for y times.
链接地址: http://www.djcxy.com/p/13148.html上一篇: UITableView使用UIRefreshControl拖动距离
下一篇: 散列相同的字符多次