Algorithm challenge to merge sets
Given n sets of numbers. Each set contains some numbers from 1 to 100. How to select sets to merge into the longest set under a special rule, only two non-overlapping sets can merge. [1,2,3] can merge with [4,5] but not [3,4]. What will be an efficient algorithm to merge into the longest set.
My first attempt is to form an n by n matrix. Each row/column represents a set. Entry(i,j) equals to 1 if two sets overlap, entry(i,i) stores the length of set i. Then the questions becomes can we perform row and column operations at the same time to create a diagonal sub-matrix on top left corner whose trace is as large as possible.
However, I got stuck in how to efficiently perform row and column operations to form such a diagonal sub-matrix on top left corner.
As already pointed out in the comments (maximum coverage problem) you have a NP-hart problem. Luckily, matlab offers solvers for integer linear programming.
So we try to reduce the problem to the form:
min f*x subject to Ax<=b , 0<=x
There are n sets, we can encode a set as a vector of 0s and 1s. For example (1,1,1,0,0,...)
would represent {1,2,3}
and (0,0,1,1,0,0...)
- {3,4}
.
Every column of A
represents a set. A(i,j)=1
means that the i
-th element is in the j
-th set, A(i,j)=0
means that the i
-th element is not in the j
-th set.
Now, x
represents the sets we select: if x_j=1
than the set j
is selected, if x_j=0
- than not selected!
As every element must be at most in one set, we choose b=(1, 1, 1, ..., 1)
: If we take two sets which contain the i
-th element, than the i
-th element of (Ax)
would be at least 2.
The only question left is what is f
? We try to maximize the number of elements in the union, so we choose f_j=-|set_j|
(minus owing to min<->max conversion), with |set_j|
- number of elements in the j
-th set.
Putting it all in matlab we get:
f=-sum(A)
xopt=intlinprog(f.',1:n,A,ones(m,1),[],[],zeros(n,1),ones(n,1))
f.'
- cost function as column 1:n
- all n
elements of x
are integers A
- encodes n
sets ones(m,1)
- b=(1,1,1...)
, there are m=100
elements [],[]
- there are no constrains of the form Aeq*x=beq
zeros(n,1), 0<=x
must hold ones(n,1), x<=1
follows already from others constrains, but maybe it will help the solver a little bit You can represent sets as bit fields. A bitwise and operation yielding zero would indicate non-overlapping sets. Depending on the width of the underlying data type, you may need to perform multiple and operations. For example, with a 64 bit machine word size, I'd need two words to cover 1 to 100 as a bit field.
链接地址: http://www.djcxy.com/p/40224.html上一篇: 如何开始开发Internet Explorer扩展?
下一篇: 算法挑战合并集合