Generate large number of unique random float32 numbers
I need to generate a binary file containing only unique random numbers, with single precision. The purpose is then to calculate the entropy of this file and use it with other datasets entropy to calculate a ratio entropy_file/entropy_randUnique. This value is named "randomness".
I can do this in python with double-precision numbers and inserting them into set()
, using struct.pack
like so:
numbers = set()
while len(numbers) < size:
numbers.add(struct.pack(precision,random.random()))
for num in numbers:
file.write(num)
but when I change to single precision I can't just change the pack method (that will produce a lot of the same numbers and the while will never end), and I can't generate single precision numbers with random
. I've looked into numpy
but the generator works the same way from what I understood. How can I get 370914252 (this is my biggest test case) unique float32 inside a binary file, even if they're not random, I think that a shuffled sequence would suffice..
Your best bet is to generate random 32-bit integers then convert them to floating point. You'll need to reject bit representations of infinity and NAN as you generate the numbers.
You can generate your set
from the integer values rather than the floating point ones, then do the conversion on output. Rather than using a set, you could use a bit map to detect which integer values have already been used; that's more likely to fit in memory, especially given the largest sample size you indicate.
def random_unique_floats(n):
used = bytearray(0 for i in xrange(2**32 // 8))
count = 0
while count < n:
bits = random.getrandbits(32)
value = struct.unpack('f', struct.pack('I', bits))[0]
if not math.isinf(value) and not math.isnan(value):
index = bits // 8
mask = 0x01 << (bits & 0x07)
if used[index] & mask == 0:
yield value
used[index] |= mask
count += 1
for num in random_unique_floats(size):
file.write(struct.pack('f', num))
Note that as your number of samples approaches the number of possible floating-point values, the run time will go up exponentially.
链接地址: http://www.djcxy.com/p/17388.html上一篇: Python OCR:将扫描图像转换为文本进行处理
下一篇: 生成大量独特的随机float32数字