Is there a better way to create permutations of string w bounded by length?
I need to create permutations of all the strings in a list. My list can grow to be 300+ items long, so I began to look for ways to optimize the permutations
function from itertools
.
Unfortunately, the length of the list is a must-have. Shrinking it is possible, but for my purpose, the longer the list, the better.
I understand just how many permutations are possible and that the number grows exponentially depending on the permutation length - this is also essential to my purpose.
So far, I've been able to jury-rig a way to only output permutations that are within certain length boundaries:
def bounded_permutations(iterable, min_length, max_length, outfilename, r=None):
outfile = open(outfilename,'w+')
outfile.truncate()
pool = tuple(iterable)
n = len(pool)
r = n if r is None else r
if r > n: return
indices = range(n)
cycles = range(n, n-r, -1)
if min_length <= len(str(''.join((tuple(pool[i] for i in indices[:r]))))) <= max_length:
outfile.writelines((str(''.join(tuple(pool[i] for i in indices[:r])))) + 'n')
while n:
for i in reversed(range(r)):
cycles[i] -= 1
if cycles[i] == 0:
indices[i:] = indices[i+1:] + indices[i:i+1]
cycles[i] = n - i
else:
j = cycles[i]
indices[i], indices[-j] = indices[-j], indices[i]
if min_length <= len(str(''.join((tuple(pool[i] for i in indices[:r]))))) <= max_length:
outfile.writelines((str(''.join(tuple(pool[i] for i in indices[:r])))) + 'n')
break
else: return
This is simply the code from the itertools library, with a length check and outfile.write inserted. I used to output to a list, because I wanted to sort the lines by length, but for optimization's sake, I just wrote them to a file so they wouldn't be stored.
I see two places where this is inefficient, but I haven't been able to improve it:
It seems to me that the permutation function 'bothers' to generate all the lines, but does not return lines that don't meet the boundaries. I've been attempting to come up with a method that doesn't even 'bother' to generate the lines of improper length.
The condition check contains the actual generation of the tuple, so the tuple needs to be generated twice if it is valid. In addition, the length check requires execution of the len
, str
, join
, and tuple
functions as well as a complete iteration of the items in the pool. This can't be the fastest way, but I haven't figured out how to determine the length of the completed line before joining and converting.
Any ideas?
链接地址: http://www.djcxy.com/p/24324.html上一篇: Haskell中的排列实现