random number generator for cluster environment
How can I generate independent pseudo-random numbers on a cluster, for Monte Carlo simulation for example? I can have many compute nodes (eg 100), and I need to generate millions of numbers on each node. I need a warranty that a PRN sequence on one node will not overlap the PRN sequence on another node.
You should never use potentially overlapping random streams obtained from the same original stream. If you have not tested the resulting interleaved stream, you have no idea of its statistic quality.
Fortunately, Mersenne Twister (MT) will help you in your distribution task. Using its dedicated algorithm, called Dynamic Creator (DC hereafter), you can create independent random number generators that will produce highly independent random streams.
Each stream will be created on the node that will be using it. Basically, think of DC as a constructor in object oriented paradigm that creates different instances of MT. Each different instance is designed to produce highly independent random sequences.
You can find DC here: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html
It's quite straightforward to use and you'll be able to fix different parameters such as the number of different MT instances you want to obtain or the period of these MTs. Depending on its input parameter, DC will runtime will change.
In addition of the README coming along with DC, take a look at the file example/new_example2.c
in the DC archive. It shows example of calls to get independent sequences given a different input identifier , which is basically what you have to identify cluster jobs.
Finally, if you intend to learn more about how to use PRNGs in parallel or distributed environments, I suggest you read this scientific articles:
Practical distribution of random streams for stochastic High Performance Computing , David RC Hill, in International Conference on High Performance Computing and Simulation (HPCS), 2010
Okay, answer #2 ;-)
I am going to say ... keep it simple. Just use a "short" seed to prime the MT (imagine that this seed is 232 bits for lack of better restriction). This assumes that the the short seed generates "sufficiently distributed" MT starting states (eg init_genrand
in the code in my other answer, hopefully). This doesn't guarantee an equally distributed starting state but rather goes for "good enough", see below.
Each node will use it's own sequence of seeds which are pre-selected (either a list of random seeds which is transmitted or a formula like number_nodes * node_number * iteration
). The important thing is that the initial "short" seed will never be re-used across nodes.
Each node will then use a MT PRNG initialized with this seed n
times where n <<< MT_period / max_value_of_short_seed
(TT800 is 2800-1 and MT19937 is 219937-1, so n
can still be a very large number). After n
times, the node moves onto the next seed in the chosen list.
While I do not (nor can I) provide a "guarantee" that no node will ever have a duplicate sequence at the same time (or at all), here is what AMD says about Using Different Seends: (Obviously the initial seeding algorithm plays a role).
Of the four methods for creating multiple streams described here, this is the least satisfactory ... For example, sequences generated from different starting points may overlap if the initial values are not far enough apart. The potential for overlapping sequences is reduced if the period of the generator being used is large. Although there is no guarantee of the independence of the sequences, due to its extremely large period, using the Mersenne Twister with random starting values is unlikely to lead to problems , especially if the number of sequences required is small ...
Happy coding.
I could jump to a know distance in the sequence, on each node. But is there such an algorithm for Mersenne-Twister or for any other good PRNG, which can be done with a reasonable amount of time and memory?
Yes, see http://theo.phys.sci.hiroshima-u.ac.jp/~ishikawa/PRNG/mt_stream_en.html. This is a excellent solution to obtaining independent random number streams. By making jumps that are larger than the number of random numbers needed from each stream to create the starts of each stream, the streams won't overlap.
链接地址: http://www.djcxy.com/p/37300.html上一篇: 依赖PRNG
下一篇: 用于集群环境的随机数生成器