Please advise me about this Computer Vision task

I have some magnets with images on them. There is only a small set of different images (let's say 20), and they will be arranged on a fixed chess board (not shown in the picture yet).


I already have an algorithm to extract each single magnet card from the board, applying a perspective transformation to the original image to correct perspective distortion.

I would like your advice on how to detect each single image on the board, i.e. whether a magnet is a panda, a rabbit, a dog, a carrot..., since my main objective is to analyze the image and extract a matrix containing all the board elements.

My first attempt was very basic: guess the image according to its average color. It was not very robust, since several images have similar average colors (especially the frozen cards), and lighting tint can change the colors a lot.

Would you be so kind as to point me in the right direction for extracting a matrix containing all the images on the board? I don't need a specific implementation, just the concept of the steps I should follow, or the techniques to apply to the main image, in order to obtain a robust (and not too complex) algorithm.

I'm going to implement it using OpenCV, but I guess it would be much the same with any other computer vision library.

Thanks a lot for your time!


While SIFT or other feature detectors work well and are widely applicable, I would always start with the simplest thing that could work: in your case, that's probably template matching. After all, you've already done the hard work of removing perspective distortion, rotation, and scale differences.

The basic idea is to take each of your 20 template images, and compare it to every possible position in the search image. Thanks to the convolution theorem, this is a relatively inexpensive operation.


Here is some example code:

#include <opencv2/opencv.hpp>
#include <opencv2/nonfree/nonfree.hpp>  // SIFT lives in the non-free module (OpenCV 2.4)

void KeypointMatcher::computeMatches(cv::Mat image1, cv::Mat image2) {
    cv::initModule_nonfree();

    // Detect SIFT keypoints in both images
    std::vector<cv::KeyPoint> keypoints1, keypoints2;
    cv::SiftFeatureDetector detector;
    detector.detect(image1, keypoints1);
    detector.detect(image2, keypoints2);

    // Compute SIFT descriptors for the keypoints
    cv::Mat descriptors1, descriptors2;
    cv::SiftDescriptorExtractor extractor;
    extractor.compute(image1, keypoints1, descriptors1);
    extractor.compute(image2, keypoints2, descriptors2);

    // Brute-force matching with L2 norm and cross-checking
    cv::BFMatcher matcher(cv::NORM_L2, true);
    std::vector<cv::DMatch> matches;
    matcher.match(descriptors1, descriptors2, matches);

    // Collect the matched point coordinates for the homography
    std::vector<cv::Point2f> matchingPoints1, matchingPoints2;
    for (const cv::DMatch& match : matches) {
        matchingPoints1.push_back(keypoints1[match.queryIdx].pt);
        matchingPoints2.push_back(keypoints2[match.trainIdx].pt);
    }

    // RANSAC marks each match as inlier (1) or outlier (0) in `mask`
    cv::Mat mask;
    cv::findHomography(matchingPoints1, matchingPoints2, CV_RANSAC, 3, mask);

    // Visualisation: draw only the inlier matches
    std::vector<char> vectorMask;
    mask.col(0).copyTo(vectorMask);
    cv::Mat result;
    cv::drawMatches(image1, keypoints1, image2, keypoints2, matches, result,
                    cv::Scalar::all(-1), cv::Scalar::all(-1), vectorMask);
    cv::imshow("Matches", result);
    cv::waitKey();
}

It does the following: it extracts feature points from both images using SIFT, then finds matches between the features of the two images. This will still contain a lot of rubbish matches. But we know the transformation between the two images we're comparing is a perspective one, so it must be possible to compute a homography between them. To find the "good" matches, it computes a homography with the RANSAC outlier-filtering method. Which matches RANSAC considers outliers and which inliers is saved in the matrix mask (0 for an outlier, 1 for an inlier). You could now, for example, count the ones in this matrix to obtain the number of good matches. Do this for the images of all the cards and compare the match counts (e.g. pick the card with the most matches).

The last part of the code visualizes the good matches that were found.

Hope this helps.
