Quality of PRNG when not using all bits at once

I'm currently using xorshift128+ in my project, which I know passes Big Crush and is thought to produce rather high-quality random numbers for its speed. However, it produces 64-bit numbers, and the vast majority of the random numbers I require are small integers (between 0 and, say, 100 or so).

Now, using % to reduce the 64-bit random number to the desired range results in a non-uniform distribution (some values occur more often than others), and in the case of powers of 2 it simply throws away most of the bits. The method where you generate numbers until something is in-range results in a more uniform distribution, but it's somewhat problematic with small numbers, and it feels silly to generate more bits when I already have far more than necessary to begin with.
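
To make the bias concrete, here's roughly what the % reduction looks like (prng() standing in for my xorshift128+ step; the helper name is just for illustration). Since 2^64 % 100 = 16, sixteen of the hundred possible results each occur one more time than the rest over the full 64-bit input space:

#include <stdint.h>

uint64_t prng(void); // xorshift128+

// Biased reduction: 2^64 % 100 == 16, so the results 0..15 each
// occur one more time than 16..99 across all possible prng() outputs.
uint64_t biased_0_to_99(void) {
    return prng() % 100;
}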

Hence, I implemented a system that takes the minimum number of bits necessary: it finds the smallest power of 2 covering the range, e.g. for a range of 0-95 it takes 7 bits (2^7 = 128), and keeps generating 7-bit values until one falls in range. The acceptance probability is always over 50%, since otherwise I could simply use one bit fewer.
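
Roughly what I mean, as a simplified sketch (it masks a fresh 64-bit draw each time rather than pulling just the needed bits from a buffer, which is what my actual code does; the helper name is made up):

#include <stdint.h>

uint64_t prng(void); // xorshift128+

// Uniform value in 0..max-1: mask down to the smallest power of two
// covering the range, then reject the few out-of-range results.
uint64_t masked_prng(uint64_t max) {
    // Smallest all-ones mask covering max-1, e.g. max = 96 -> mask = 127 (7 bits).
    uint64_t mask = max - 1;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;

    uint64_t r;
    do {
        r = prng() & mask;
    } while (r >= max);

    return r;
}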

Anyway, the system is in place, rudimentary statistical tests suggest it's working as expected, and it runs blazing fast. However, I have been unable to run TestU01 on the modified system (there doesn't seem to be built-in support for dynamic bit sizes), and the original papers have been a bit too dense for me to get through.

Basically, I'm wondering whether passing Big Crush both forwards and backwards, as xorshift128+ is purported to do, strongly suggests that every individual bit is satisfactorily random and that using them separately should be fine, or whether I could be setting myself up for trouble. I'd also welcome pointers to any test suites that would let me empirically verify the statistical quality of my generator.


The method where you generate numbers until something is in-range results in a more uniform distribution, but it's somewhat problematic with small numbers [...].

That's true for a naive implementation like the following C99 example, which draws full 64-bit values and rejects everything outside the range; for a small max it will loop practically forever:

#include <stdint.h>

uint64_t prng(void);

// Returns a random number between 0 and max-1.
uint64_t bounded_prng(uint64_t max) {
    uint64_t r;

    // Rejection sampling: for a small max, almost every 64-bit value
    // is out of range, so this loop takes practically forever.
    do {
        r = prng();
    } while (r >= max);

    return r;
}

But there's another efficient algorithm. You can expand the threshold by using the largest multiple of max that fits in a uint64_t, which is 2^64 - (2^64 % max). If the PRNG returns a value below this threshold, return the value modulo max; otherwise get another random value.

uint64_t bounded_prng(uint64_t max) {
    // Compute modulus: 2^64 % max = (2^64 - max) % max = -max % max.
    uint64_t mod = -max % max;

    // Compute threshold: 2^64 - mod = -mod.
    // If max is a power of two, mod is 0 and the threshold wraps to 0;
    // in that case every value is acceptable, so skip the rejection test.
    uint64_t threshold = -mod;

    uint64_t r;

    do {
        r = prng();
    } while (mod != 0 && r >= threshold);

    return r % max;
}

Now the probability that a random value is rejected is guaranteed to be smaller than 50%, resulting in a very efficient algorithm, just like your bit-masking approach. For small bounds, the probability that prng is called more than once is vanishingly small (for max around 100 it is below 10^-17). But if you know the bound beforehand, your bit-masking solution might still win out, since it needs no division at all.

You can simplify this a little by rejecting the values below 2^64 % max instead; this also handles the power-of-two case without a special test:

uint64_t bounded_prng(uint64_t max) {
    // Compute threshold: 2^64 % max = (2^64 - max) % max = -max % max.
    uint64_t threshold = -max % max;

    uint64_t r;

    // Discarding the values below the threshold leaves 2^64 - (2^64 % max)
    // accepted values, a multiple of max, so r % max is unbiased.
    // If max is a power of two, the threshold is 0 and nothing is rejected.
    do {
        r = prng();
    } while (r < threshold);

    return r % max;
}
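
For the 0-95 range from the question you'd simply call either variant with a bound of 96. A quick sanity-check sketch of how that might look (assuming prng() above is your xorshift128+ step):

#include <stdint.h>
#include <stdio.h>

uint64_t bounded_prng(uint64_t max);

int main(void) {
    uint64_t counts[96] = {0};

    // Draw a million values in 0..95 and tally them; the buckets
    // should all end up close to 1000000 / 96.
    for (int i = 0; i < 1000000; i++)
        counts[bounded_prng(96)]++;

    for (int i = 0; i < 96; i++)
        printf("%2d: %llu\n", i, (unsigned long long) counts[i]);

    return 0;
}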

Basically, I'm wondering if passing Big Crush both forwards and backwards, as xorshift128+ is purported to do, strongly suggests that every individual bit is satisfactorily random and using them separately should be fine, or if I could be setting myself up for trouble.

If you want to compute bounded random values, you only need to make sure you eliminate any bias; then the quality of the bounded values matches that of the original PRNG. If your PRNG passes Big Crush, you will certainly get high-quality bounded random numbers. Your approach and any of the ones I showed are fine.


What you are doing is the most common way to generate a uniformly distributed integer, and Big Crush is a really good test suite. It certainly checks that the bits are individually random-like, along with a whole bunch of other things you'd never think of.
