How do I find Waldo with Mathematica?

This was bugging me over the weekend: What is a good way to solve those Where's Waldo? ['Wally' outside of North America] puzzles, using Mathematica (image-processing and other functionality)?

Here is what I have so far, a function which reduces the visual complexity a little bit by dimming some of the non-red colors:

whereIsWaldo[url_] := Module[{waldo, waldo2, waldoMask},
    waldo = Import[url];
    waldo2 = Image[ImageData[
        waldo] /. {{r_, g_, b_} /;
          Not[r > .7 && g < .3 && b < .3] :> {0, 0,
          0}, {r_, g_, b_} /; (r > .7 && g < .3 && b < .3) :> {1, 1,
          1}}];
    waldoMask = Closing[waldo2, 4];
    ImageCompose[waldo, {waldoMask, .5}]
]

And an example of a URL where this 'works':

whereIsWaldo["http://www.findwaldo.com/fankit/graphics/IntlManOfLiterature/Scenes/DepartmentStore.jpg"]

(Waldo is by the cash register):

Mathematica图形


I've found Waldo!

沃尔多已被发现

How I've done it

First, I'm filtering out all colours that aren't red

waldo = Import["http://www.findwaldo.com/fankit/graphics/IntlManOfLiterature/Scenes/DepartmentStore.jpg"];
red = Fold[ImageSubtract, #[[1]], Rest[#]] &@ColorSeparate[waldo];

Next, I'm calculating the correlation of this image with a simple black and white pattern to find the red and white transitions in the shirt.

corr = ImageCorrelate[red, 
   Image@Join[ConstantArray[1, {2, 4}], ConstantArray[0, {2, 4}]], 
   NormalizedSquaredEuclideanDistance];

I use Binarize to pick out the pixels in the image with a sufficiently high correlation and draw white circle around them to emphasize them using Dilation

pos = Dilation[ColorNegate[Binarize[corr, .12]], DiskMatrix[30]];

I had to play around a little with the level. If the level is too high, too many false positives are picked out.

Finally I'm combining this result with the original image to get the result above

found = ImageMultiply[waldo, ImageAdd[ColorConvert[pos, "GrayLevel"], .5]]

My guess at a "bulletproof way to do this" (think CIA finding Waldo in any satellite image any time, not just a single image without competing elements, like striped shirts)... I would train a Boltzmann machine on many images of Waldo - all variations of him sitting, standing, occluded, etc.; shirt, hat, camera, and all the works. You don't need a large corpus of Waldos (maybe 3-5 will be enough), but the more the better.

This will assign clouds of probabilities to various elements occurring in whatever the correct arrangement, and then establish (via segmentation) what an average object size is, fragment the source image into cells of objects which most resemble individual people (considering possible occlusions and pose changes), but since Waldo pictures usually include a LOT of people at about the same scale, this should be a very easy task, then feed these segments of the pre-trained Boltzmann machine. It will give you probability of each one being Waldo. Take one with the highest probability.

This is how OCR, ZIP code readers, and strokeless handwriting recognition work today. Basically you know the answer is there, you know more or less what it should look like, and everything else may have common elements, but is definitely "not it", so you don't bother with the "not it"s, you just look of the likelihood of "it" among all possible "it"s you've seen before" (in ZIP codes for example, you'd train BM for just 1s, just 2s, just 3s, etc., then feed each digit to each machine, and pick one that has most confidence). This works a lot better than a single neural network learning features of all numbers.


I agree with @GregoryKlopper that the right way to solve the general problem of finding Waldo (or any object of interest) in an arbitrary image would be to train a supervised machine learning classifier. Using many positive and negative labeled examples, an algorithm such as Support Vector Machine, Boosted Decision Stump or Boltzmann Machine could likely be trained to achieve high accuracy on this problem. Mathematica even includes these algorithms in its Machine Learning Framework.

The two challenges with training a Waldo classifier would be:

  • Determining the right image feature transform. This is where @Heike's answer would be useful: a red filter and a stripped pattern detector (eg, wavelet or DCT decomposition) would be a good way to turn raw pixels into a format that the classification algorithm could learn from. A block-based decomposition that assesses all subsections of the image would also be required ... but this is made easier by the fact that Waldo is a) always roughly the same size and b) always present exactly once in each image.
  • Obtaining enough training examples. SVMs work best with at least 100 examples of each class. Commercial applications of boosting (eg, the face-focusing in digital cameras) are trained on millions of positive and negative examples.
  • A quick Google image search turns up some good data -- I'm going to have a go at collecting some training examples and coding this up right now!

    However, even a machine learning approach (or the rule-based approach suggested by @iND) will struggle for an image like the Land of Waldos!

    链接地址: http://www.djcxy.com/p/6054.html

    上一篇: 在Mathematica中最小化NExpectation以获得自定义分布

    下一篇: 如何用Mathematica找到Waldo?