Detecting comic strip dialogue bubble regions in images

I have a grayscale image of a comic strip page that features several dialogue bubbles (speech balloons, etc.): enclosed areas with a white background and solid black borders that contain text, something like this:

Sample comic image

I want to detect these regions and create a mask (binary is OK) that covers the interior of every dialogue bubble, i.e. something like:

The same image, mask overlaid, to be totally clear:

Sample image with a transparent mask overlay

So, my basic idea of the algorithm was something like:

  • Detect where the text is, planting at least one pixel in every bubble. Dilate these regions somewhat and apply a threshold to get a better starting ground. I've already done this part.

  • Use a flood fill or some sort of graph traversal, starting from every white pixel detected as a pixel-inside-bubble in step 1, but working on the initial image: flood white pixels (which are supposed to be inside the bubble) and stop on dark pixels (which are supposed to be borders or text).

  • Use some sort of binary_closing operation to remove the dark areas (i.e. regions that correspond to text) inside the bubbles. This part works OK.

    So far, steps 1 and 3 work, but I'm struggling with step 2. I'm currently working with scikit-image, and I don't see any ready-made algorithm like flood fill implemented there. Obviously, I can use something trivial like breadth-first traversal, basically as suggested here, but it's really slow when done in Python. I suspect that intricate morphology stuff like binary_erosion or generate_binary_structure in ndimage or scikit-image could do the job, but I struggle to understand all that morphology terminology and, basically, how to implement such a custom flood fill with it (i.e. starting from the step 1 image, working on the original image and writing the result to a separate output image).

    I'm open to any suggestions, including ones in OpenCV, etc.
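
    A minimal sketch of steps 2 and 3 without an explicit traversal, assuming the step 1 result is available as a boolean array: label the connected white regions of the page with scipy.ndimage.label and keep only the regions that contain a seed pixel, which selects the same pixels as flood filling from every seed. The names flood_from_seeds and bubble_mask and the white_thresh value are illustrative assumptions, and binary_fill_holes stands in here for the binary_closing of step 3.

```python
import numpy as np
from scipy import ndimage


def flood_from_seeds(gray, seeds, white_thresh=200):
    """Return a boolean mask of the white regions of `gray` touched by `seeds`.

    gray         -- 2-D uint8 grayscale page
    seeds        -- 2-D boolean array, True where step 1 detected text
    white_thresh -- assumed cut-off separating "white" from "dark" pixels
    """
    white = gray >= white_thresh
    # Label 4-connected white regions (the default structuring element has
    # no diagonals), so dark borders separate the regions from each other.
    labels, _ = ndimage.label(white)
    # Labels of the regions hit by at least one seed lying on a white pixel.
    hit = np.unique(labels[seeds & white])
    return np.isin(labels, hit)


def bubble_mask(gray, seeds, white_thresh=200):
    """Steps 2 and 3: select the seeded regions, then fill the text holes."""
    flooded = flood_from_seeds(gray, seeds, white_thresh)
    return ndimage.binary_fill_holes(flooded)
```

    If the dilated seeds cross a bubble border, the white area outside the bubble is selected as well, so the seed regions should stay strictly inside the bubbles.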


    Even though your actual question concerns step 2 of your processing pipeline, I would like to suggest another approach that might, imho, be simpler, since you stated that you are open to suggestions.

  • Using the image from your original step 1, you could create an image without the text in the bubbles.

    Implemented

  • Detect edges on the original image with the text removed. This should work well for the speech bubbles, as the bubble edges are pretty distinct.

    Edge detection

  • Finally, use the edge image and the initially detected "text locations" to find those areas within the edge image that contain text.

    Watershed-Segmentation

    I am sorry for this very general answer; it is too late here for me to write actual code. If the question is still open and you need or want more hints about my suggestion, I will elaborate in more detail. In the meantime, you could have a look at the Region based segmentation example in the scikit-image docs.
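
    One possible reading of the three steps above, sketched with scikit-image under stated assumptions: the text mask from the question's step 1 is a boolean array, dark pixels (borders and artwork) serve as the background markers, and the text pixels serve as the bubble markers. The function name bubbles_by_watershed and the dark_thresh value are hypothetical.

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed


def bubbles_by_watershed(gray, text_mask, dark_thresh=100):
    """Segment bubble interiors with a marker-based watershed.

    gray        -- 2-D uint8 grayscale page
    text_mask   -- 2-D boolean array, True on detected text (step 1)
    dark_thresh -- assumed cut-off below which a pixel counts as border/artwork
    """
    # 1. Remove the text by painting the detected text pixels white.
    clean = gray.copy()
    clean[text_mask] = 255

    # 2. Edge strength of the text-free image is the "elevation" to flood.
    elevation = sobel(clean.astype(float))

    # 3. Markers: 1 = border/artwork (dark pixels), 2 = bubble (text pixels).
    markers = np.zeros(gray.shape, dtype=np.int32)
    markers[clean < dark_thresh] = 1
    markers[text_mask] = 2

    # Flood the elevation map from the markers and keep the bubble label.
    labels = watershed(elevation, markers)
    return labels == 2
```

    The text mask must cover the letters without touching the bubble border; otherwise painting it white would open the border and let the bubble label leak out.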


    While your overall task aims further, your actual question is about step 2: how to implement a flood fill on data in which the text inside the bubbles has already been detected.

    Since you did not provide source code, I had to create something from scratch that hopefully interfaces well with your output from step 1. For this I just took two fixed coordinates; you would instead take white points close to the centers of the blobs created from the text you extracted in step 1. Once you provide your code, that interface can be adjusted.

    I took the liberty of filling all internal holes created by the letters you found. If you do not want this, you can skip the code from line 36 on (the comment "Create binary image from segments with holes").

    For the solution I took ideas from two pieces of code, which are cited in the snippet below. You may find more helpful information there.

    Keep us posted on your progress!

    import cv2
    import numpy as np
    
    # with ideas from:
    # http://www.learnopencv.com/filling-holes-in-an-image-using-opencv-python-c/
    # http://stackoverflow.com/questions/10316057/filling-holes-inside-a-binary-object
    print(cv2.__file__)
    
    # Read image
    im_in = cv2.imread("gIEXY.png", cv2.IMREAD_GRAYSCALE)
    
    # Threshold (inverted binary).
    # Set values above 200 to 0.
    # Set values equal to or below 200 to 255.
    
    th, im_th = cv2.threshold(im_in, 200, 255, cv2.THRESH_BINARY_INV)
    
    # Copy the thresholded image.
    im_floodfill = im_th.copy()
    
    # Mask used for flood filling.
    # Notice the size needs to be 2 pixels larger than the image.
    h, w = im_th.shape[:2]
    mask = np.zeros((h+2, w+2), np.uint8)
    
    # Flood fill from points inside balloons (seed points are (x, y))
    cv2.floodFill(im_floodfill, mask, (80,400), 128)
    cv2.floodFill(im_floodfill, mask, (610,90), 128)
    
    # Invert floodfilled image
    im_floodfill_inv = cv2.bitwise_not(im_floodfill)
    
    # Combine the two images to get the foreground
    im_out = im_th | im_floodfill_inv
    
    # Create binary image from segments with holes
    th, im_th2 = cv2.threshold(im_out, 130, 255, cv2.THRESH_BINARY)
    
    # Create contours to fill holes
    # ([-2:] keeps this working on OpenCV 3.x, where findContours returns three values)
    im_th3 = cv2.bitwise_not(im_th2)
    contours, hier = cv2.findContours(im_th3, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)[-2:]
    
    for cnt in contours:
        cv2.drawContours(im_th3, [cnt], 0, 255, -1)
    
    segm = cv2.bitwise_not(im_th3)
    
    
    # Display image
    cv2.imshow("Original", im_in)
    cv2.imshow("Segmented", segm)
    cv2.waitKey(0)
    