Naive way to find largest block in a rectangle of 1's and 0's

I'm trying to come up with a brute-force (naive) solution to find the largest block of 1s or 0s in a rectangle of 1s and 0s. I know optimal approaches that can do it in O(n) time, where n is the total size of the rectangle.

1 1 0 1 0 1
1 0 0 0 1 1
1 0 0 0 1 1
1 1 0 1 1 0

In the above rectangle, the largest block is the 2 x 3 block of 0s starting at (Row 2, Col 2), which has size 6. I was thinking of this:

Go through each element, and from each one find the size of the block it forms by iterating in all directions from it.

Is this brute force? What will the complexity be? I'm going through all elements, which is n, but then I'm iterating in all directions from each one; how much does that add?

I'm aware that this question has been asked 100 times, but those questions talk about optimal solutions. What I'm looking for is a brute-force solution and its complexity.


The algorithm you described looks something like this C code:

//for each entry (x is the column index, y the row index; entries is entries[width][height])
for(int x = 0; x < width; ++x)
    for(int y = 0; y < height; ++y)
    {
        char lookFor = entries[x][y];
        int area = 0;
        int maxCol = width; //rightmost column (exclusive) a rectangle anchored at (x, y) may still reach
        for(int row = y; row < height; ++row)
        {
            if(entries[x][row] != lookFor)
                break; //the left edge is broken, no taller rectangle is possible
            for(int col = x; col < maxCol; ++col)
            {
                if(entries[col][row] != lookFor)
                {
                    maxCol = col; //later rows must not extend past this column
                    break;
                }
                int currentArea = (col - x + 1) * (row - y + 1);
                if(currentArea > area)
                {
                    area = currentArea;
                    //save the current rect
                }
            }
        }
    }

There are four nested loops. The outer loops will iterate exactly n times (with n being the number of entries). The inner loops will iterate width * f1 and height * f2 times on average (with f1 and f2 being some constant fractions). The rest of the algorithm takes constant time and does not depend on the problem size.

Therefore, the complexity is O(n * f1 * width * f2 * height) = O(n^2), which essentially means "go to each entry and from there, visit each entry again". This holds regardless of whether every entry really needs to be checked again or only a constant fraction of them, since constant factors do not change the asymptotic bound.
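
Spelled out, the key step is simply that n = width * height, so:

\[
O(n \cdot f_1\,\mathrm{width} \cdot f_2\,\mathrm{height}) = O(f_1 f_2 \cdot n \cdot n) = O(n^2)
\]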

Edit

The above explanations assume that the entries are not distributed randomly and that for larger fields it is more likely to find larger homogeneous subregions. If this is not the case and the average iteration count of the inner loops does not depend on the field size at all (e.g. for randomly distributed entries), then the resulting time complexity is O(n).


Look into "connected components" algorithms for additional ideas. What you've presented as a two-dimensional array of binary values looks just like a binary black & white image. An important difference is that in image processing we typically allow a connected component (a blob of 0s or 1s) to have a non-rectangular shape. Some tweaks to the existing multi-pass and single-pass algorithms should be easy to implement.

http://en.wikipedia.org/wiki/Connected-component_labeling

Although it's a more general solution than you need, you could also run a connected components algorithm to find all connected regions (0s or 1s, background or foreground) and then filter the resulting components (aka blobs). I'll also mention that for foreground components it's preferable to select for "4-connectivity" rather than "8-connectivity," where the former means connectivity is allowed only at pixels above, below, left, and right of a center pixel, and the latter means connectivity is allowed for any of the eight pixels surrounding a center pixel.
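
As a rough sketch of what 4-connected labeling can look like, here is a plain recursive flood fill (the names floodFill4 and labelComponents are hypothetical; a real implementation for large arrays would use an explicit stack or a multi-pass labeler rather than deep recursion):

/* Hypothetical sketch: label 4-connected components of equal values.
   labels[][] must start out as all zeros; components are numbered from 1. */
static void floodFill4(int height, int width, char entries[height][width],
                       int labels[height][width], int r, int c, char value, int label)
{
    if(r < 0 || r >= height || c < 0 || c >= width) return;
    if(labels[r][c] != 0 || entries[r][c] != value) return;
    labels[r][c] = label;
    /* 4-connectivity: only the neighbours above, below, left and right */
    floodFill4(height, width, entries, labels, r - 1, c, value, label);
    floodFill4(height, width, entries, labels, r + 1, c, value, label);
    floodFill4(height, width, entries, labels, r, c - 1, value, label);
    floodFill4(height, width, entries, labels, r, c + 1, value, label);
}

static int labelComponents(int height, int width, char entries[height][width],
                           int labels[height][width])
{
    int nextLabel = 0;
    for(int r = 0; r < height; ++r)
        for(int c = 0; c < width; ++c)
            if(labels[r][c] == 0)
                floodFill4(height, width, entries, labels, r, c, entries[r][c], ++nextLabel);
    return nextLabel; /* number of components found */
}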

A bit farther afield, for very large 2D arrays it may (just may) help to first reduce the search space by creating what we'd call an "image pyramid," meaning a stack of arrays of progressively smaller size: 1/2 each dimension (filled, as needed), 1/4, 1/8, and so on. A rectangle detectable in a reduced resolution image is a good candidate for being the largest rectangle in a very large image (or 2D array of bits). Although that may not be the best solution for whatever cases you're considering, it's scalable. Some effort would be required, naturally, to determine the cost of scaling the array/image versus the cost of maintaining relatively larger lists of candidate rectangles in the original, large image.
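
One reduced pyramid level could be built with something like the sketch below, assuming the conservative rule that an output cell is 1 only if its entire 2 x 2 input block is 1 (the name halveAllOnes is made up for illustration; other reduction rules are possible, and the scaling trade-offs mentioned above still apply):

/* Hypothetical sketch: build a half-resolution level in which a cell is 1
   only if the corresponding 2x2 block of input cells is all 1s. */
static void halveAllOnes(int height, int width, char in[height][width],
                         char out[height / 2][width / 2])
{
    for(int r = 0; r + 1 < height; r += 2)
        for(int c = 0; c + 1 < width; c += 2)
            out[r / 2][c / 2] = in[r][c] & in[r][c + 1]
                              & in[r + 1][c] & in[r + 1][c + 1];
}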

Run-length encoding may help you, too, especially since you're dealing with rectangles instead of connected components of arbitrary shape. Run-length encoding would express each row as stretches or "run lengths" of 0s or 1s. This technique was used to speed up connected component algorithms a decade or two ago, and it's still a reasonable approach.
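
A sketch of run-length encoding a single row (the Run struct and encodeRow are hypothetical names, just to illustrate the representation):

typedef struct {
    char value;  /* 0 or 1 */
    int  start;  /* first column of the run */
    int  length; /* number of consecutive equal entries */
} Run;

/* Encode one row as maximal runs of equal values; returns the number of runs
   written to out (out must have room for up to width runs). */
static int encodeRow(int width, const char row[], Run out[])
{
    int count = 0;
    for(int c = 0; c < width; )
    {
        int start = c;
        while(c < width && row[c] == row[start])
            ++c;
        out[count++] = (Run){ row[start], start, c - start };
    }
    return count;
}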

Anyway, that's not the most direct answer to your question, but I hope it helps somewhat.


Brute force is generally split into two (sometimes sequential) parts. The first part is generating all possible candidates for solutions to the problem. The second part is testing them to see if they actually are solutions.

Brute force: Assume your rectangle is m x n. Generate all sub-rectangles of size a x b, where a is in {1..m} and b is in {1..n}. Set a maximum variable to 0. Test each sub-rectangle to see whether it is all 0s or all 1s. If it is, let maximum = max(maximum, size(sub-rectangle)). Alternatively, simply start by testing the larger sub-rectangles and move towards testing smaller sub-rectangles; as soon as you find a sub-rectangle that is all 0s or all 1s, stop. The time complexity will be the same, because in the worst case for both methods you will still have to iterate through all sub-rectangles.
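
A minimal C sketch of this generate-and-test approach (the row-major layout entries[row][col] and the names isUniform and largestUniformArea are my own, purely for illustration):

#include <stdbool.h>

/* Returns true if every entry of the a x b sub-rectangle whose top-left
   corner is (top, left) has the same value. */
static bool isUniform(int m, int n, char entries[m][n],
                      int top, int left, int a, int b)
{
    char first = entries[top][left];
    for(int r = top; r < top + a; ++r)
        for(int c = left; c < left + b; ++c)
            if(entries[r][c] != first)
                return false;
    return true;
}

/* Brute force: generate every sub-rectangle, test it, keep the largest area. */
static int largestUniformArea(int m, int n, char entries[m][n])
{
    int maximum = 0;
    for(int a = 1; a <= m; ++a)         /* sub-rectangle height */
        for(int b = 1; b <= n; ++b)     /* sub-rectangle width  */
            for(int top = 0; top + a <= m; ++top)
                for(int left = 0; left + b <= n; ++left)
                    if(isUniform(m, n, entries, top, left, a, b) && a * b > maximum)
                        maximum = a * b;
    return maximum;
}

Iterating the sizes from large to small instead and returning on the first uniform hit gives the early-exit variant mentioned above; the worst case is unchanged.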

Time complexity:

Let's count the number of sub-rectangles generated at each step.

There are m*n subrectangles of size 1 x 1.

There are (m-1)*n subrectangles of size 2 x 1.

There are m*(n-1) subrectangles of size 1 x 2.

There are (m-1)*(n-1) subrectangles of size 2 x 2.

... < and so forth >

There are (m-(m-1))*(n-(n-1)) subrectangles of size m x n.

Thus the formula for counting the number of subrectangles of size a x b from a larger rectangle of size m x n is simply:

number_of_subrectangles_of_size_a_b = (m - a + 1) * (n - b + 1)

If we imagine these numbers laid out in an arithmetic series we get

1*1 + 1*2 + ... + 1*n + 2*1 + ... + m*n

This can be factored to:

(1 + 2 + ... + m) * (1 + 2 + ... + n).
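
Written out, this is just the closed form of the two sums:

\[
\left(\sum_{i=1}^{m} i\right)\left(\sum_{j=1}^{n} j\right)
  = \frac{m(m+1)}{2} \cdot \frac{n(n+1)}{2}
  = O(m^2 n^2)
\]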

These two arithmetic sums are of order O(m²) and O(n²) respectively. Thus generating all sub-rectangles of an m x n rectangle is O(m²n²). Now we look at the testing phase.

After generating all sub-rectangles, testing whether a sub-rectangle of size a x b is all 0s or all 1s is O(a * b). Unlike the previous step, where the number of generated sub-rectangles of size a x b grows as a x b decreases, the cost of this step grows proportionally with the size a x b.

E.g.: there is 1 sub-rectangle of size m x n, but testing whether that rectangle is all 0s or all 1s takes O(m * n) time. Conversely, there are m * n sub-rectangles of size 1 x 1, but testing whether those rectangles are all 0s or all 1s takes only O(1) time per rectangle.

What you finally end up with for the time complexity is a series in which each term is (number of sub-rectangles of size a x b) * (cost of testing one such sub-rectangle):

O( 1*1*(m*n) + 1*2*(m*(n-1)) + ... + (m*n)*(1*1) )

Note two things here.

  • The largest term in the series is going to be somewhere close to (m/2)(n/2) * (m/2)(n/2), which is O(m²n²).

  • There are m * n terms in total in the series.

Thus the testing phase of the brute-force solution will be O(m*n * m²n²) = O(m³n³).

The total time complexity is:

O(generating) + O(testing) = O(m²n² + m³n³) = O(m³n³)

If the area of the given rectangle is, say, N, we end up with O(N³) time complexity.
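
As a sanity check, the testing series can also be summed exactly and gives the same bound:

\[
\sum_{a=1}^{m}\sum_{b=1}^{n} (m-a+1)(n-b+1)\,ab
  = \left(\sum_{a=1}^{m} a(m-a+1)\right)\left(\sum_{b=1}^{n} b(n-b+1)\right)
  = \frac{m(m+1)(m+2)}{6} \cdot \frac{n(n+1)(n+2)}{6}
  = O(m^3 n^3)
\]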
