Random walks in directed graphs/networks

I have a weighted graph with (in practice) up to 50,000 vertices. Given a vertex, I want to randomly choose an adjacent vertex based on the relative weights of all adjacent edges.

How should I store this graph in memory so that making the selection is efficient? What is the best algorithm? It could be as simple as a key-value store for each vertex, but that might not lend itself to the most efficient algorithm. I'll also need to be able to update the network.

Note that I'd like to take only one "step" at a time.


More formally: given a weighted, directed, and potentially complete graph, let W(a,b) be the weight of edge a->b and let Wa be the sum of the weights of all edges leaving a. Given an input vertex v, I want to choose a vertex randomly such that the probability of choosing vertex x is W(v,x) / Wv.

Example:

Say W(v,a) = 2, W(v,b) = 1, W(v,c) = 1.

Given input v, the function should return a with probability 0.5 and b or c with probability 0.25.
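For reference, Python's standard library already provides exactly this kind of weighted selection via `random.choices`; a minimal sketch using the example's weights (the vertex names are just the ones from the example):

```python
import random
from collections import Counter

# Adjacency of vertex v from the example: W(v,a)=2, W(v,b)=1, W(v,c)=1.
edges = {"a": 2, "b": 1, "c": 1}

# random.choices does weighted selection in one call (linear in the number of edges).
random.seed(42)
counts = Counter(
    random.choices(list(edges), weights=list(edges.values()), k=100_000)
)

# Empirical frequencies should be close to 0.5 / 0.25 / 0.25.
print({t: counts[t] / 100_000 for t in edges})
```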


If you are concerned about the performance of generating the random walk, you may use the alias method to build a data structure that fits your requirement of choosing a random outgoing edge quite well. The only overhead is that you have to assign each directed edge a probability weight and a so-called alias edge.

So for each node you have a vector of outgoing edges together with the weight and the alias edge. Then you may choose random edges in constant time (only the construction of the data structure is linear in the total number of edges, or in the number of edges per node). In the listing below an edge is denoted by ->[NODE], and node v corresponds to the example given above:

Node v
    ->a (p=1,   alias= ...)
    ->b (p=3/4, alias= ->a)
    ->c (p=3/4, alias= ->a)

Node a
    ->c (p=1/2, alias= ->b)
    ->b (p=1,   alias= ...)

...
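A table like the one above can be generated with the standard alias-method construction (Vose's algorithm); here is a minimal sketch in Python, using the example weights for node v (the function name and dict layout are just illustrative choices):

```python
def build_alias_table(weights):
    """weights: dict mapping outgoing edge (target node) -> weight.
    Returns dict mapping edge -> (probability, alias edge or None)."""
    n = len(weights)
    total = sum(weights.values())
    # Scale weights so the average entry is exactly 1.
    scaled = {e: w * n / total for e, w in weights.items()}
    small = [e for e, p in scaled.items() if p < 1.0]
    large = [e for e, p in scaled.items() if p >= 1.0]
    table = {}
    while small and large:
        s, l = small.pop(), large.pop()
        table[s] = (scaled[s], l)      # s keeps probability scaled[s]; overflow goes to l
        scaled[l] -= 1.0 - scaled[s]   # l donated the remainder of s's slot
        (small if scaled[l] < 1.0 else large).append(l)
    for e in small + large:            # leftovers fill their slot entirely, no alias needed
        table[e] = (1.0, None)
    return table

# Node v from the example: W(v,a)=2, W(v,b)=1, W(v,c)=1.
print(build_alias_table({"a": 2, "b": 1, "c": 1}))
# b and c get p=3/4 with alias a; a gets p=1, matching the table above
```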

If you want to choose an outgoing edge (i.e. the next node), you just have to generate a single random number r drawn uniformly from the interval [0,1).

You then compute no = floor(N[v] * r) and pv = frac(N[v] * r), where N[v] is the number of outgoing edges. I.e. you pick each edge slot with exactly the same probability (namely 1/3 in the example of node v).

Then you compare the assigned probability p of this edge with the generated value pv. If pv is less, you keep the edge selected before; otherwise you choose its alias edge.

If, for example, we get r=0.6 from our random number generator, we have

no = floor(0.6*3) = 1 
pv = frac(0.6*3) = 0.8

Therefore we choose the second outgoing edge (note that the index starts at zero), which is

->b (p=3/4, alias= ->a)

and switch to the alias edge ->a, since p = 3/4 < pv = 0.8.

For the example of node v we therefore:

  • choose edge b with probability 1/3 * 3/4 (i.e. whenever no=1 and pv < 3/4)
  • choose edge c with probability 1/3 * 3/4 (i.e. whenever no=2 and pv < 3/4)
  • choose edge a with probability 1/3 + 1/3 * 1/4 + 1/3 * 1/4 (i.e. whenever no=0, or pv >= 3/4)
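The two-step selection described above can be sketched as follows; the alias table for node v is hardcoded from the listing earlier in this answer:

```python
import random

# Alias table for node v: list of (target, probability, alias target).
alias_table = [("a", 1.0, None), ("b", 0.75, "a"), ("c", 0.75, "a")]

def step(table, r):
    """Pick an outgoing edge from a single uniform r in [0, 1)."""
    n = len(table)
    no = int(n * r)        # uniformly chosen slot: floor(N * r)
    pv = n * r - no        # leftover fraction:     frac(N * r)
    target, p, alias = table[no]
    # Keep the slot's own edge if pv < p, otherwise take its alias edge.
    return target if pv < p else alias

# Worked example from the text: r = 0.6 -> slot 1 (edge ->b), pv = 0.8 >= 3/4,
# so we switch to the alias edge ->a.
print(step(alias_table, 0.6))  # "a"

# In general r comes from the random number generator:
next_node = step(alias_table, random.random())
```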

In theory, the absolutely most efficient thing to do is to store, for each node, the moral equivalent of a balanced binary tree (red-black tree, B-tree, or skip list all fit) of the connected nodes and their weights, together with the total weight on each side. Then you can pick a random number from 0 to 1, multiply by the total weight of the connected nodes, and do a binary search to find it.
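Python has no balanced search tree in the standard library, but the same O(log n) lookup can be sketched with a prefix-sum array and `bisect`. Note this flat variant is only a stand-in for the tree: it demonstrates the binary search, not the O(log n) weight updates a real balanced tree would also give you.

```python
import bisect
import itertools

# Outgoing edges of v from the question: W(v,a)=2, W(v,b)=1, W(v,c)=1.
targets = ["a", "b", "c"]
weights = [2, 1, 1]

# Cumulative weights [2, 3, 4]; the total weight is the last entry.
prefix = list(itertools.accumulate(weights))

def pick(r):
    """r uniform in [0, 1): binary-search the scaled value into the prefix sums."""
    return targets[bisect.bisect_right(prefix, r * prefix[-1])]

print(pick(0.4))  # 0.4 * 4 = 1.6 falls in a's [0, 2) slice -> "a"
```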

However, traversing a binary tree like that involves a lot of branches, which have a tendency to create pipeline stalls, and those are very expensive. So in practice, if you're programming in an efficient language (e.g. C++) and you've got fewer than a couple of hundred connected edges per node, a linear list of edges (with a pre-computed sum) that you walk in a loop may prove to be faster.
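The linear variant is even simpler; a sketch assuming the same example edges, with the sum precomputed as suggested:

```python
# Outgoing edges of v as a flat list of (target, weight), with the total precomputed.
edges = [("a", 2.0), ("b", 1.0), ("c", 1.0)]
total = sum(w for _, w in edges)  # must be kept up to date when the graph changes

def pick_linear(r):
    """r uniform in [0, 1): walk the list subtracting weights until the value goes negative."""
    x = r * total
    for target, w in edges:
        x -= w
        if x < 0:
            return target
    return edges[-1][0]  # guard against floating-point round-off at the top end

print(pick_linear(0.6))  # 0.6 * 4 = 2.4 falls in b's [2, 3) slice -> "b"
```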
