Algorithm challenge to merge sets

2018-06-14 03:04:08

Given n sets of numbers. Each set contains some numbers from 1 to 100. How to select sets to merge into the longest set under a special rule, only two non-overlapping sets can merge. [1,2,3] can merge with [4,5] but not [3,4]. What will be an efficient algorithm to merge into the longest set.

My first attempt is to form an n by n matrix. Each row/column represents a set. Entry(i,j) equals to 1 if two sets overlap, entry(i,i) stores the length of set i. Then the questions becomes can we perform row and column operations at the same time to create a diagonal sub-matrix on top left corner whose trace is as large as possible.

However, I got stuck in how to efficiently perform row and column operations to form such a diagonal sub-matrix on top left corner.

As already pointed out in the comments (maximum coverage problem) you have a NP-hart problem. Luckily, matlab offers solvers for integer linear programming.

So we try to reduce the problem to the form:

min f*x subject to Ax<=b , 0<=x

There are n sets, we can encode a set as a vector of 0s and 1s. For example (1,1,1,0,0,...) would represent {1,2,3} and (0,0,1,1,0,0...) - {3,4} .

Every column of A represents a set. A(i,j)=1 means that the i -th element is in the j -th set, A(i,j)=0 means that the i -th element is not in the j -th set.

Now, x represents the sets we select: if x_j=1 than the set j is selected, if x_j=0 - than not selected!

As every element must be at most in one set, we choose b=(1, 1, 1, ..., 1) : If we take two sets which contain the i -th element, than the i -th element of (Ax) would be at least 2.

The only question left is what is f ? We try to maximize the number of elements in the union, so we choose f_j=-|set_j| (minus owing to min<->max conversion), with |set_j| - number of elements in the j -th set.

Putting it all in matlab we get:

f=-sum(A)
xopt=intlinprog(f.',1:n,A,ones(m,1),[],[],zeros(n,1),ones(n,1))

f.' - cost function as column

1:n - all n elements of x are integers

A - encodes n sets

ones(m,1) - b=(1,1,1...) , there are m=100 elements

[],[] - there are no constrains of the form Aeq*x=beq

zeros(n,1), 0<=x must hold

ones(n,1), x<=1 follows already from others constrains, but maybe it will help the solver a little bit

You can represent sets as bit fields. A bitwise and operation yielding zero would indicate non-overlapping sets. Depending on the width of the underlying data type, you may need to perform multiple and operations. For example, with a 64 bit machine word size, I'd need two words to cover 1 to 100 as a bit field.

链接地址: http://www.djcxy.com/p/40224.html

上一篇: 如何开始开发Internet Explorer扩展？

下一篇: 算法挑战合并集合