Python, OpenCV: classify gender using ORB features and KNN

Task: Classify images of human faces as female or male. Labeled training images are available; the test image is obtained from a webcam.

Using: Python 2.7, OpenCV 2.4.4

I am using ORB to extract features from grayscale images, and I hope to use these features to train a K-Nearest Neighbor classifier. Each training image is of a different person, so the number of keypoints and descriptors is obviously different for each image. My problem is that I'm not able to understand the OpenCV docs for KNN and ORB. I've seen other SO questions about ORB, KNN and FLANN, but they didn't help much.

What exactly is the nature of the descriptor given by ORB? How is it different from descriptors obtained by BRIEF, SURF, SIFT, etc.?

It seems that the feature descriptors should be of the same size for each training sample in KNN. How do I make sure that the descriptors are of the same size for each image? More generally, in what format should features be presented to KNN for training with given data and labels? Should the data be an int or float? Can it be char?

The training data can be found here.

I am also using haarcascade_frontalface_alt.xml from the OpenCV samples.

Right now the KNN model is given just 10 images for training, to see whether my program runs without errors, which it does not.

Here is my code:

import cv2
import numpy as np

def chooseCascade():
    # TODO: Option for different cascades
    # HAAR Classifier for frontal face
    _cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
    return _cascade

def cropToObj(cascade,imageFile):
    # Load as 1-channel grayscale image
    image = cv2.imread(imageFile,0)

    # Crop to the object of interest in the image
    objRegion = cascade.detectMultiScale(image) # TODO: What if multiple objects in image?

    x1 = objRegion[0,0]
    y1 = objRegion[0,1]
    x1PlusWidth = objRegion[0,0]+objRegion[0,2]
    y1PlusHeight = objRegion[0,1]+objRegion[0,3]

    _objImage = image[y1:y1PlusHeight,x1:x1PlusWidth]

    return _objImage

def recognizer(fileNames):
    # ORB constructor
    orb = cv2.ORB(nfeatures=100)

    keyPoints = []
    descriptors = [] 

    # A cascade for face detection
    haarFaceCascade = chooseCascade()

    # Start processing images
    for imageFile in fileNames:
        # Find faces using the HAAR cascade
        faceImage = cropToObj(haarFaceCascade,imageFile)

        # Extract keypoints and description 
        faceKeyPoints, faceDescriptors = orb.detectAndCompute(faceImage, mask = None)

        #print faceDescriptors.shape
        descRow = faceDescriptors.shape[0]
        descCol = faceDescriptors.shape[1]

        flatFaceDescriptors = faceDescriptors.reshape(descRow*descCol).astype(np.float32)

        keyPoints.append(faceKeyPoints)
        descriptors.append(flatFaceDescriptors)

    print descriptors

    # KNN model and training on descriptors
    responses = []
    for name in fileNames:
        if name.startswith('BF'):
            responses.append(0) # Female
        else:
            responses.append(1) # Male

    knn = cv2.KNearest()
    knnTrainSuccess = knn.train(descriptors,
                                responses,
                                isRegression = False) # isRegression = false, implies classification

    # Obtain test face image from cam
    capture = cv2.VideoCapture(0)
    closeCamera = -1
    while(closeCamera < 0):
        _retval, _camImage = capture.read()

        # Find face in camera image
        testFaceImage = haarFaceCascade.detectMultiScale(_camImage) # TODO: What if multiple faces?

        # Keypoints and descriptors of test face image
        testFaceKP, testFaceDesc = orb.detectAndCompute(testFaceImage, mask = None)
        testDescRow = testFaceDesc.shape[0]
        testDescCol = testFaceDesc.shape[1]
        flatTestFaceDesc = testFaceDesc.reshape(1,testDescRow*testDescCol).astype(np.float32)

        # Args in knn.find_nearest: testData, neighborhood
        returnedValue, result, neighborResponse, distance = knn.find_nearest(flatTestFaceDesc,3) 

        print returnedValue, result, neighborResponse, distance


        # Display results
        # TODO: Overlay classification text
        cv2.imshow("testImage", _camImage)

        closeCamera = cv2.waitKey(1)
    cv2.destroyAllWindows()


if __name__ == '__main__':
    fileNames = ['BF09NES_gray.jpg', 
                 'BF11NES_gray.jpg', 
                 'BF13NES_gray.jpg', 
                 'BF14NES_gray.jpg', 
                 'BF18NES_gray.jpg', 
                 'BM25NES_gray.jpg', 
                 'BM26NES_gray.jpg', 
                 'BM29NES_gray.jpg', 
                 'BM31NES_gray.jpg', 
                 'BM34NES_gray.jpg']

    recognizer(fileNames)

Currently I am getting an error at the line with knn.train() where descriptors is not detected as a numpy array.

Also, is this approach completely wrong? Am I supposed to use some other way for gender classification? I wasn't satisfied with the fisherface and eigenface example in the opencv facerec demo so please don't direct me to those.

Any other help is much appreciated. Thanks.

--- EDIT ---

I've tried a few things and come up with an answer.

I am still hoping that someone in the SO community can suggest an idea so that I don't have to hardcode things into my solution. I also suspect that knn.find_nearest() isn't doing what I need it to do.

And as expected, the recognizer is not at all accurate and is very prone to misclassification caused by rotation, lighting, etc. Any suggestions on improving this approach would be really appreciated.

The database I am using for training is: Karolinska Directed Emotional Faces


I have some doubts about the effectiveness and workability of the described approach. Here's another approach that you might want to consider. The contents of the gen folder are at http://www1.datafilehost.com/d/0f263abc. As you will note, when the data size gets bigger (~10k training samples), the size of the model may become unacceptable (~100-200 MB). Then you will need to look into PCA/LDA, etc.

import cv2
import numpy as np
import os

def feaCnt():
    mat = np.zeros((400,400,3),dtype=np.uint8)
    ret = extr(mat)
    return len(ret)

def extr(img):
    return sobel(img)

def sobel(img):
    gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    klr = [[-1,0,1],[-2,0,2],[-1,0,1]]
    kbt = [[1,2,1],[0,0,0],[-1,-2,-1]]
    ktb = [[-1,-2,-1],[0,0,0],[1,2,1]]
    krl = [[1,0,-1],[2,0,-2],[1,0,-1]]
    kd1 = [[0,1,2],[-1,0,1],[-2,-1,0]]
    kd2 = [[-2,-1,0],[-1,0,1],[0,1,2]]    
    kd3 = [[0,-1,-2],[1,0,-1],[2,1,0]]
    kd4 = [[2,1,0],[1,0,-1],[0,-1,-2]]
    karr = np.asanyarray([
        klr,
        kbt,
        ktb,
        krl,
        kd1,
        kd2,
        kd3,
        kd4
        ])
    gray=cv2.resize(gray,(40,40))
    res =  np.float32([cv2.resize(cv2.filter2D(gray, -1,k),(15,15)) for k in karr])
    return res.flatten()


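root = 'C:/data/gen'                  # one sub-folder per class, named by its integer label
model = 'c:/data/models/svm/gen.xml'  # where the trained SVM is saved

# Collect (file path, label) pairs; the label is the name of the first
# sub-folder below root
imgs = []
for path, subdirs, files in os.walk(root):
  for name in files:
    lbl = os.path.relpath(path, root).split(os.sep)[0]
    fpath = os.path.join(path, name)
    imgs.append((fpath, int(lbl)))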

samples = np.zeros((len(imgs),feaCnt()),dtype = np.float32)
labels = np.zeros(len(imgs),dtype = np.float32)

# Integer row index (a float index such as 0. is rejected by NumPy)
for i, (f, l) in enumerate(imgs):
  print i
  img = cv2.imread(f)
  samples[i] = extr(img)
  labels[i] = l

svm = cv2.SVM()
svmparams = dict( kernel_type = cv2.SVM_POLY, 
                       svm_type = cv2.SVM_C_SVC,
                       degree=3.43,
                       gamma=1.5e-4,
                       coef0=1e-1,
                       )
print 'svm train'
svm.train(samples,labels,params=svmparams)
svm.save(model)
print 'done'

# Accuracy on the training samples themselves (an optimistic estimate)
result = np.float32([svm.predict(s) for s in samples])
correct = 0.
total = 0.

for i, j in zip(result, labels):
    total += 1
    if i == j:
        correct += 1
print '%f' % (correct / total)
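One property of this feature extractor is worth spelling out. With ddepth=-1 on an 8-bit image, cv2.filter2D() returns an 8-bit image, so negative responses saturate to 0 and large ones to 255. Each directional kernel therefore reports edges of only one polarity, and mirrored pairs such as klr and krl carry complementary information. The plain-Python sketch below imitates that saturation for interior pixels only (filter2D computes a correlation; border handling is left out), using a hypothetical helper correlate3x3_u8; it is an illustration, not a substitute for filter2D. It also spells out the fixed feature length that feaCnt() measures: eight responses resized to 15×15 give 8 × 15 × 15 = 1800 values per image, whatever the input size.

```python
def correlate3x3_u8(img, k):
    """Correlate a 2-D list of 0..255 values with a 3x3 kernel (interior pixels only).

    Each result is clipped to 0..255, imitating an 8-bit output image.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            # Correlation: the kernel is not flipped
            s = sum(k[i][j] * img[y + i - 1][x + j - 1]
                    for i in range(3) for j in range(3))
            row.append(min(255, max(0, int(round(s)))))
        out.append(row)
    return out


# Two of the kernels from sobel() above: left-to-right and right-to-left
klr = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
krl = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

# Tiny test images with a vertical step edge
dark_to_bright = [[0, 0, 10, 10] for _ in range(3)]
bright_to_dark = [[10, 10, 0, 0] for _ in range(3)]

# 8 kernels, each response resized to 15x15, then flattened
FEATURE_LEN = 8 * 15 * 15
```

On the dark-to-bright step, klr responds with 40 at both interior pixels while krl is clipped to 0; flipping the edge swaps the two.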

Previously, I was struggling to find the technical difference between ORB, SIFT, SURF etc. and I found these SO posts helpful:

  • https://stackoverflow.com/a/10169025/1463143
  • Are there any fast alternatives to SURF and SIFT for scale-invariant feature extraction?
  • How Does OpenCV ORB Feature Detector Work?
  • The most important thing to note is that these feature detection algorithms in opencv require a single channel (typically 8 bit) grayscale image.
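To make the descriptor difference concrete: ORB computes, for each keypoint, a BRIEF-style binary string of 32 bytes (256 bit comparisons), and two such descriptors are meant to be compared with the Hamming distance, the number of differing bits (cv2.NORM_HAMMING when matching). SIFT and SURF descriptors are instead real-valued vectors (128 and 64 floats by default) compared with Euclidean distance. The plain-Python sketch below uses made-up 32-byte descriptors, not real ORB output, to show that reading the bytes as numbers, which is what casting them to float32 for a Euclidean k-NN amounts to, can rank neighbours differently from the Hamming distance.

```python
def hamming(d1, d2):
    """Count the bits that differ between two equal-length byte strings."""
    if len(d1) != len(d2):
        raise ValueError('descriptors must have the same length')
    return sum(bin(a ^ b).count('1') for a, b in zip(d1, d2))


def euclidean_on_bytes(d1, d2):
    """Euclidean distance with each byte read as a number (as after a float32 cast)."""
    if len(d1) != len(d2):
        raise ValueError('descriptors must have the same length')
    return sum((a - b) ** 2 for a, b in zip(d1, d2)) ** 0.5


# Hypothetical 32-byte descriptors (made up, not real ORB output)
query = bytearray(32)            # all 256 bits are 0
one_bit_off = bytearray(32)
one_bit_off[0] = 0x80            # 1 differing bit, but byte value 128
four_bits_off = bytearray(32)
four_bits_off[0] = 0x0F          # 4 differing bits, but byte value 15
```

hamming() ranks one_bit_off as nearer to query (1 bit versus 4), whereas euclidean_on_bytes() ranks four_bits_off as nearer (15.0 versus 128.0).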

    It turns out that knn.train() only accepts an array whose data type is 32-bit floating point (np.float32). I believe SVM training in OpenCV has the same requirement. In Python, a NumPy array must hold the same type of data in every row, and all the rows must have the same shape, unlike Python lists, which can hold data of any type and size.
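The shape and type requirement can be illustrated without OpenCV or NumPy. The sketch below is a hypothetical helper, to_float32_rows, that stands in for np.asarray(rows, dtype=np.float32) using the standard array module (typecode 'f' is a C float, 32 bits on common platforms). It accepts only rows of equal length, which is what a num_samples × var_count training matrix needs.

```python
from array import array


def to_float32_rows(rows):
    """Pack equal-length rows into 32-bit float arrays; reject ragged input.

    Hypothetical helper mirroring the num_samples x var_count float32
    requirement; it is not OpenCV or NumPy code.
    """
    if not rows:
        raise ValueError('no rows given')
    width = len(rows[0])
    for i, row in enumerate(rows):
        if len(row) != width:
            raise ValueError('row %d has %d values, expected %d' % (i, len(row), width))
    return [array('f', row) for row in rows]


matrix = to_float32_rows([[1, 2, 3], [4, 5, 6]])   # two rows of three floats
```

A ragged call such as to_float32_rows([[1, 2, 3], [4, 5]]) raises ValueError, which corresponds to training images that produced different numbers of keypoints.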

    So after growing a list of the descriptors I converted the list to an array.

    But! Before that, I hard-coded the ORB nfeatures parameter to 25. All my training images are of roughly the same resolution, and I was able to manually verify that each image could produce at least 25 keypoints using ORB. Each keypoint's ORB descriptor is 32 bytes long, so 25*32 gives 800 descriptor values for each face image. ORB returns an array of unsigned 8-bit integers (uint8), with one row per keypoint. I reshaped this into a single row to produce a 'vector' of size 800.
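The arithmetic above can be written as a small helper. This is a hypothetical sketch, not code from the post: flatten_descriptors takes the per-keypoint descriptor rows (32 byte values each, like ORB's uint8 output), keeps the first n_keypoints of them, and returns a flat list of n_keypoints × 32 floats, or None when the image yielded too few keypoints. Keeping the first rows is an assumption made for illustration; the post instead relies on nfeatures=25 and on checking that each image produced enough keypoints.

```python
def flatten_descriptors(desc_rows, n_keypoints=25, desc_len=32):
    """Flatten the first n_keypoints descriptor rows into one fixed-length vector.

    desc_rows: one row per keypoint, each with desc_len byte values (0..255).
    Returns n_keypoints * desc_len floats, or None if there are too few rows.
    """
    if len(desc_rows) < n_keypoints:
        return None
    flat = []
    for row in desc_rows[:n_keypoints]:
        if len(row) != desc_len:
            raise ValueError('expected %d values per descriptor' % desc_len)
        flat.extend(float(v) for v in row)
    return flat


# Hypothetical descriptors: 30 keypoints, row k filled with the value k
fake_rows = [[k] * 32 for k in range(30)]
vector = flatten_descriptors(fake_rows)           # 25 * 32 = 800 values
too_few = flatten_descriptors(fake_rows[:20])     # None: only 20 keypoints
```

With 30 fake rows the result has 800 values; with 20 rows it is None, the case the webcam code has to skip.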

    The next challenge was using knn.find_nearest(). It requires a 'matrix' whose rows are identical in shape to the rows of the ndarray given to knn.train(). Not doing so can produce this error:

    OpenCV Error: Bad argument (Input samples must be floating-point matrix (<num_samples>x<var_count>)) in find_nearest
    

    Even if you have only a single vector to pass to knn.find_nearest(), it needs to be in the shape 1xm, where m is the number of elements in the vector.
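It may help to see why the query must be a 1×m matrix rather than a bare vector: find_nearest() classifies a batch of samples, one per row. The plain-Python sketch below is not OpenCV's implementation; knn_find_nearest is a hypothetical stand-in that uses squared Euclidean distance and a majority vote over the k nearest training rows, and it rejects query rows whose length differs from the training rows, much like the error quoted above.

```python
from collections import Counter


def knn_find_nearest(train_rows, train_labels, queries, k):
    """Classify each row of `queries` (a num_samples x var_count list of lists).

    Returns one label per query row: the majority label among the k
    training rows with the smallest squared Euclidean distance.
    """
    var_count = len(train_rows[0])
    results = []
    for q in queries:
        if len(q) != var_count:
            raise ValueError('query rows must have %d values' % var_count)
        # Training indices sorted by squared distance to this query row
        order = sorted(range(len(train_rows)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], q)))
        votes = Counter(train_labels[i] for i in order[:k])
        results.append(votes.most_common(1)[0][0])
    return results


# Toy data: two clusters labelled 0 and 1
train = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels = [0, 0, 1, 1, 1]
single = knn_find_nearest(train, labels, [[9.0, 9.0]], 3)               # 1 x m query
batch = knn_find_nearest(train, labels, [[9.0, 9.0], [0.5, 0.5]], 3)    # 2 x m query
```

A single sample is therefore passed as [[9.0, 9.0]]; passing the bare vector [9.0, 9.0] fails, because each number would be treated as a row.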

    So I had to hack up a crude way to check that the image taken by my webcam was usable within my hardcoded approach to the problem.

    The code looks like this now:

    import cv2
    import numpy as np
    
    def chooseCascade():
        # TODO: Option for different cascades
        # HAAR Classifier for frontal face
        _cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
        return _cascade
    
    def cropToObj(cascade,imageFile,flag):
        if flag == 0:
            # Load as 1-channel grayscale image
            image = cv2.imread(imageFile,0)
        elif flag == 1:
            # Load as 3-channel color image
            image = cv2.imread(imageFile,1)
        elif flag == -1: 
            # Load image as is 
            image = cv2.imread(imageFile,-1)
        elif flag == 2:
            # Image is from camera
            image = imageFile
        else:
            raise ValueError('improper arguments passed to cropToObj')
    
        # Crop to the object of interest in the image
        objRegion = cascade.detectMultiScale(image) # TODO: What if multiple objects in image?
    
        x1 = objRegion[0,0]
        y1 = objRegion[0,1]
        x1PlusWidth = objRegion[0,0]+objRegion[0,2]
        y1PlusHeight = objRegion[0,1]+objRegion[0,3]
    
        objImage = image[y1:y1PlusHeight,x1:x1PlusWidth]
    
        return objImage
    
    def recognizer(fileNames):
        # ORB constructor
        orb = cv2.ORB(nfeatures=25)
    
        keyPoints = []
        descriptors = [] 
    
        # A cascade for face detection
        haarFaceCascade = chooseCascade()
    
        # Start processing images
        for imageFile in fileNames:
            # Find faces using the HAAR cascade
            faceImage = cropToObj(haarFaceCascade,imageFile,0) # flag 0: load training image as grayscale
    
            # Extract keypoints and description 
            faceKeyPoints, faceDescriptors = orb.detectAndCompute(faceImage, mask = None)
    
            #print faceDescriptors.shape
            descRow = faceDescriptors.shape[0]
            descCol = faceDescriptors.shape[1]
    
            flatFaceDescriptors = faceDescriptors.reshape(descRow*descCol)
    
            keyPoints.append(faceKeyPoints)
            descriptors.append(flatFaceDescriptors)
    
        descriptors = np.asarray(descriptors, dtype=np.float32)
    
        # KNN model and training on descriptors
        responses = []
        for name in fileNames:
            if name.startswith('BF'):
                responses.append(0) # Female
            else:
                responses.append(1) # Male
    
        responses = np.asarray(responses, dtype=np.float32) # 32-bit float labels, like the samples
    
        knn = cv2.KNearest()
        knnTrainSuccess = knn.train(descriptors,
                                    responses,
                                    isRegression = False) # isRegression = false, implies classification
    
        # Obtain test face image from cam
        capture = cv2.VideoCapture(0)
        closeCamera = -1
        while(closeCamera < 0):
            retval, camImage = capture.read()      
    
            # Find face in camera image
            testFaceImage = None
            try:
                testFaceImage = cropToObj(haarFaceCascade, camImage, 2) # TODO: What if multiple faces?
                testFaceImage = cv2.cvtColor(testFaceImage, cv2.COLOR_BGR2GRAY)
            except TypeError:
                # detectMultiScale() found no face, so indexing objRegion failed
                testFaceImage = None
                print 'check if front face is visible to camera'
    
            if testFaceImage is not None:
                # Keypoints and descriptors of test face image
                testFaceKP, testFaceDesc = orb.detectAndCompute(testFaceImage, mask = None)
    
                # Classify only when ORB found all 25 keypoints (25 * 32 = 800 values),
                # so the query row has the same length as the training rows
                if testFaceDesc is not None and testFaceDesc.size == 800:
                    flatTestFaceDesc = testFaceDesc.reshape(1, 800).astype(np.float32)
    
                    # Args in knn.find_nearest: testData, neighborhood
                    returnedValue, result, neighborResponse, distance = knn.find_nearest(flatTestFaceDesc, 5)
                    if returnedValue == 0.0:
                        print 'Female'
                    else:
                        print 'Male'
                else:
                    print 'insufficient size of image'
    
            # Display results
            # TODO: Overlay classification text
            cv2.imshow("testImage", camImage)
    
            closeCamera = cv2.waitKey(1)
        cv2.destroyAllWindows()
    
    
    if __name__ == '__main__':
        fileNames = ['BF09NES_gray.jpg', 
                     'BF11NES_gray.jpg', 
                     'BF13NES_gray.jpg', 
                     'BF14NES_gray.jpg', 
                     'BF18NES_gray.jpg', 
                     'BM25NES_gray.jpg', 
                     'BM26NES_gray.jpg', 
                     'BM29NES_gray.jpg', 
                     'BM31NES_gray.jpg', 
                     'BM34NES_gray.jpg']
    
        recognizer(fileNames)
    
