Algorithm to detect corners of paper sheet in photo

What is the best way to detect the corners of an invoice/receipt/sheet-of-paper in a photo? This is to be used for subsequent perspective correction, before OCR.

My current approach has been:

RGB > Gray > Canny edge detection with thresholding > Dilate(1) > Remove small objects(6) > Clear border objects > Pick largest blob based on convex area > [corner detection - not implemented]

I can't help but think there must be a more robust 'intelligent'/statistical approach to handle this type of segmentation. I don't have a lot of training examples, but I could probably get 100 images together.

Broader context:

I'm using MATLAB to prototype, and planning to implement the system in OpenCV and Tesseract-OCR. This is the first of a number of image processing problems I need to solve for this specific application, so I'm looking to roll my own solution and re-familiarize myself with image processing algorithms.

Here are some sample images that I'd like the algorithm to handle. If you'd like to take up the challenge, the large images are at http://madteckhead.com/tmp

case 1 http://madteckhead.com/tmp/IMG_0773_sml.jpg
case 2 http://madteckhead.com/tmp/IMG_0774_sml.jpg
case 3 http://madteckhead.com/tmp/IMG_0775_sml.jpg
case 4 http://madteckhead.com/tmp/IMG_0776_sml.jpg

In the best case this gives:

case 1 - canny http://madteckhead.com/tmp/IMG_0773_canny.jpg
case 1 - post canny http://madteckhead.com/tmp/IMG_0773_postcanny.jpg
case 1 - largest blob http://madteckhead.com/tmp/IMG_0773_blob.jpg

However it fails easily on other cases:

case 2 - canny http://madteckhead.com/tmp/IMG_0774_canny.jpg
case 2 - post canny http://madteckhead.com/tmp/IMG_0774_postcanny.jpg
case 2 - largest blob http://madteckhead.com/tmp/IMG_0774_blob.jpg

Thanks in advance for all the great ideas! I love SO!

EDIT: Hough Transform Progress

Q: What algorithm would cluster the Hough lines to find corners? Following advice from the answers, I was able to use the Hough Transform, pick lines, and filter them. My current approach is rather crude. I've made the assumption that the invoice will always be less than 15 degrees out of alignment with the image. If this is the case, I end up with reasonable results for the lines (see below), but I'm not entirely sure of a suitable algorithm to cluster the lines (or vote) to extrapolate the corners. The Hough lines are not continuous, and in the noisy images there can be parallel lines, so some form of distance-from-origin metric is required. Any ideas?

case 1 http://madteckhead.com/tmp/IMG_0773_hough.jpg
case 2 http://madteckhead.com/tmp/IMG_0774_hough.jpg
case 3 http://madteckhead.com/tmp/IMG_0775_hough.jpg
case 4 http://madteckhead.com/tmp/IMG_0776_hough.jpg


I'm Martin's friend who was working on this earlier this year. This was my first ever coding project, and kinda ended in a bit of a rush, so the code needs some errr...decoding... I'll give a few tips from what I've seen you doing already, and then sort my code on my day off tomorrow.

First tip: OpenCV and Python are awesome, move to them as soon as possible. :D

Instead of removing small objects and/or noise, lower the Canny thresholds so it accepts more edges, and then find the largest closed contour (in OpenCV use FindContours() with some simple parameters; I think I used CV_RETR_LIST). It might still struggle when it's on a white piece of paper, but it was definitely providing the best results.

For the HoughLines2() transform, try CV_HOUGH_STANDARD as opposed to CV_HOUGH_PROBABILISTIC. It gives rho and theta values, defining each line in polar coordinates, and then you can group together the lines whose values are within a certain tolerance of each other.

My grouping worked as a lookup table. Each line output by the Hough transform gives a rho and theta pair. If these values were within, say, 5% of a pair of values already in the table, the line was discarded; if they were outside that 5%, a new entry was added to the table.

You can then do analysis of parallel lines or distance between lines much more easily.
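
The lookup-table grouping described above could be sketched roughly like this. The function name `group_lines` is hypothetical, and interpreting "5%" as 5% of the largest possible rho and 5% of the theta range is an assumption (a tolerance relative to each value would break down for values near zero).

```python
import math


def group_lines(lines, max_rho, tol=0.05):
    """Keep one representative (rho, theta) per cluster of near-duplicate lines."""
    rho_tol = tol * max_rho      # assumption: 5% of the largest possible rho
    theta_tol = tol * math.pi    # assumption: 5% of the theta range [0, pi)
    table = []
    for rho, theta in lines:
        # A line is a duplicate if both values are close to an existing entry.
        duplicate = any(abs(rho - r) <= rho_tol and abs(theta - t) <= theta_tol
                        for r, t in table)
        if not duplicate:
            table.append((rho, theta))
    return table


if __name__ == "__main__":
    # Hypothetical Hough output for a 640x480 image, with near-duplicate lines.
    found = [(100.0, 0.02), (104.0, 0.03), (400.0, 1.57), (395.0, 1.58), (520.0, 0.01)]
    print(group_lines(found, max_rho=math.hypot(640, 480)))
```

This sketch keeps the first line seen in each group and does not handle the wrap-around case where a line with theta near 0 and one with theta near pi (and rho of opposite sign) describe almost the same line.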

Hope this helps.


A student group at my university recently demonstrated an iPhone app (and a Python OpenCV app) that they'd written to do exactly this. As I remember, the steps were something like this:

  • Median filter to completely remove the text on the paper (this was handwritten text on white paper with fairly good lighting, where it worked very well; it may not work with printed text). The reason was that it makes the corner detection much easier.
  • Hough Transform for lines
  • Find the peaks in the Hough Transform accumulator space and draw each line across the entire image.
  • Analyse the lines and remove any that are very close to each other and are at a similar angle (cluster the lines into one). This is necessary because the Hough Transform isn't perfect as it's working in a discrete sample space.
  • Find pairs of lines that are roughly parallel and that intersect other pairs to see which lines form quads.

This seemed to work fairly well and they were able to take a photo of a piece of paper or book, perform the corner detection and then map the document in the image onto a flat plane in almost real time (there was a single OpenCV function to perform the mapping). There was no OCR when I saw it working.
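
As a rough illustration of the last step, here is one way the corners of a quad could be computed from two pairs of roughly parallel Hough lines, each line given as (rho, theta) with x*cos(theta) + y*sin(theta) = rho. The helper names `intersect` and `quad_corners` and the 10-degree parallelism cutoff are assumptions; I don't know how the students' app actually did it.

```python
import math


def intersect(line_a, line_b, min_angle=math.radians(10)):
    """Intersection of two lines given as (rho, theta), or None if near-parallel."""
    (r1, t1), (r2, t2) = line_a, line_b
    # Determinant of the 2x2 system x*cos(t) + y*sin(t) = rho for both lines.
    det = math.sin(t2 - t1)
    if abs(det) < math.sin(min_angle):
        return None
    x = (r1 * math.sin(t2) - r2 * math.sin(t1)) / det
    y = (r2 * math.cos(t1) - r1 * math.cos(t2)) / det
    return (x, y)


def quad_corners(pair_a, pair_b):
    """Corners of the quad bounded by two line pairs, in order around the perimeter."""
    (a0, a1), (b0, b1) = pair_a, pair_b
    return [intersect(a0, b0), intersect(a0, b1),
            intersect(a1, b1), intersect(a1, b0)]


if __name__ == "__main__":
    verticals = [(60.0, 0.0), (240.0, 0.0)]                     # x = 60 and x = 240
    horizontals = [(50.0, math.pi / 2), (150.0, math.pi / 2)]   # y = 50 and y = 150
    print(quad_corners(verticals, horizontals))
```

The four corners could then be handed to OpenCV's perspective functions (for example `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`) to flatten the page, though which function the app used is not something I can confirm.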


Here's what I came up with after a bit of experimentation:

    import sys

    import cv2
    import numpy as np


    def get_new(old):
        # near-white canvas with the same shape as `old`
        new = np.ones(old.shape, np.uint8)
        cv2.bitwise_not(new, new)
        return new


    if __name__ == '__main__':
        orig = cv2.imread(sys.argv[1])
        if orig is None:
            sys.exit('could not read image: ' + sys.argv[1])

        # these constants are carefully picked
        MORPH = 9
        CANNY = 84
        HOUGH = 25

        img = cv2.cvtColor(orig, cv2.COLOR_BGR2GRAY)
        cv2.GaussianBlur(img, (3, 3), 0, img)

        # this is to recognize white on white
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (MORPH, MORPH))
        dilated = cv2.dilate(img, kernel)

        edges = cv2.Canny(dilated, 0, CANNY, apertureSize=3)

        # draw the detected segments back onto the edge map to close gaps;
        # reshape copes with both the (1, N, 4) and (N, 1, 4) result layouts
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, HOUGH)
        if lines is not None:
            for x1, y1, x2, y2 in lines.reshape(-1, 4):
                cv2.line(edges, (int(x1), int(y1)), (int(x2), int(y2)),
                         (255, 0, 0), 2, 8)

        # finding contours ([-2] is the contour list in OpenCV 2.x, 3.x and 4.x)
        contours = cv2.findContours(edges.copy(), cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_TC89_KCOS)[-2]
        contours = [c for c in contours if cv2.arcLength(c, False) > 100]
        contours = [c for c in contours if cv2.contourArea(c) > 10000]

        # simplify contours down to polygons
        rects = []
        for cont in contours:
            rect = cv2.approxPolyDP(cont, 40, True).copy().reshape(-1, 2)
            rects.append(rect)

        # that's basically it
        cv2.drawContours(orig, rects, -1, (0, 255, 0), 1)

        # show only contours
        new = get_new(img)
        cv2.drawContours(new, rects, -1, (0, 255, 0), 1)
        cv2.GaussianBlur(new, (9, 9), 0, new)
        new = cv2.Canny(new, 0, CANNY, apertureSize=3)

        # step through the intermediate results, one key press per image
        cv2.namedWindow('result', cv2.WINDOW_NORMAL)
        cv2.imshow('result', orig)
        cv2.waitKey(0)
        cv2.imshow('result', dilated)
        cv2.waitKey(0)
        cv2.imshow('result', edges)
        cv2.waitKey(0)
        cv2.imshow('result', new)
        cv2.waitKey(0)

        cv2.destroyAllWindows()
    

Not perfect, but it at least works for all of the samples.
