Is it possible to create software that can recognize the context of an image?

I raised this question out of curiosity while using Google Goggles and Google's "Search by Image".

If you give Google an image to search for, it will show you some results. Identical images work best (of course), but photos taken of various objects can be difficult.

I guess Google Goggles works around this a bit by combining text recognition with image matching. If text recognition finds text, for instance "SONY", then things get simpler. If a brand's logo is detected, things should be simpler as well. The same goes for other famous brands and famous landmarks, such as the Eiffel Tower. Having text and a brand's logo helps recognize things easily.

But suppose we search for something more obscure (I need a better word here). For instance, take this ramen image.

[Image: ramen]

If you put this image into Google, you will get various other images that have similar colors and sometimes similar shapes. Heck, there are other ramen images in the results, but I think it would be better if these ramen images were at the top, since we input a ramen image and our context here is ramen.

So here is my question: is it possible to create software that can understand the context of an image? How can we express that context in software?


Man, you just pointed out the very reason why so many people work on computer vision.

It is quite easy to describe objects mathematically. Color, shape, density... all of those can be calculated easily.
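To give an idea of how cheap such descriptors are to compute, here is a minimal sketch in plain Python. The pixel list, the function names and the choice of 4 bins per channel are all assumptions made up for illustration; a real system would read an actual image and likely use a vision library.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram for a list of (r, g, b) pixels (0..255)."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {key: n / total for key, n in counts.items()}

def mean_color(pixels):
    """Average value of each channel over all pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

# Hypothetical 4-pixel "image": three broth-colored pixels and one dark one.
toy_image = [(200, 150, 60), (210, 160, 70), (30, 30, 30), (220, 140, 50)]

hist = color_histogram(toy_image)
avg = mean_color(toy_image)
print(hist)
print(avg)
```

Both descriptors are a few lines of arithmetic. The hard part is not computing them, it is knowing whether they still describe the object once the picture changes.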

But computer vision becomes very complex when talking about "real life objects".

Viewing angle, lighting, and plain inconsistency between instances of the same object make it almost impossible to detect an object accurately.

When working on computer vision, you should always ask yourself: what makes the object I want to recognize unique?

What descriptor can I use that no other object possesses?

Ask yourself that question for this ramen. Let's say I simply want to detect ramen. What if the color of the soup changes? What if the meat is bigger?
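The soup-color question can be made concrete with a small sketch. It assumes a coarse color histogram as the descriptor and histogram intersection as the similarity measure. Both are just one possible choice, and the pixel values, function names and the +70 shift are invented for illustration.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram for a list of (r, g, b) pixels (0..255)."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {key: n / len(pixels) for key, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical histograms, 0 when no bin is shared."""
    return sum(min(v, h2.get(key, 0.0)) for key, v in h1.items())

def shift_color(pixels, delta):
    """Add delta to every channel, clamped to 0..255 (a crude stand-in for a lighting change)."""
    return [tuple(max(0, min(255, c + delta)) for c in p) for p in pixels]

# Hypothetical pixels sampled from a bowl of ramen.
bowl = [(200, 150, 60), (210, 160, 70), (30, 30, 30), (220, 140, 50)]

same = histogram_intersection(color_histogram(bowl), color_histogram(bowl))
brighter = histogram_intersection(
    color_histogram(bowl), color_histogram(shift_color(bowl, 70))
)
print(same, brighter)
```

In this toy case the brightened bowl shares no histogram bin with the original, so the similarity falls from 1 to 0 even though it is the same bowl. A descriptor based only on color cannot answer "is this ramen?" once the soup or the lighting changes.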

If you want to know more, you should read about pattern recognition and pattern matching.

And if you can find a generic solution to this kind of problem, I think you can put your name down for a Nobel Prize :)

Some problems are quite well understood nowadays, like face recognition or OCR, but the solutions are often quite specialized and apply to only one domain. Think about it: even Google's image search algorithm does poorly when you feed it ramen. It is pretty efficient with sudoku, though, because it knows exactly what it is searching for. The whole difference is made in training, where you give the algorithm a list of assumptions to help it.

So basically you've got it: either you create a really good computer vision system that detects one thing based on a lot of assumptions, or an "OK" but quite generic one :). The choice mostly depends on your application.
