Python, OpenCV: classify gender using ORB features and KNN
Task: Classify images of human faces as female or male. Training images with labels are available; the test image is obtained from a webcam.
Using: Python 2.7, OpenCV 2.4.4
I am using ORB to extract features from grayscale images, which I hope to use to train a K-Nearest Neighbor classifier. Each training image is of a different person, so the number of keypoints and descriptors per image obviously differs. My problem is that I'm not able to understand the OpenCV docs for KNN and ORB. I've seen other SO questions about ORB, KNN and FLANN, but they didn't help much.
What exactly is the nature of the descriptor given by ORB? How does it differ from the descriptors obtained with BRIEF, SURF, SIFT, etc.?
It seems that the feature descriptors should be of the same size for each training sample in KNN. How do I make sure that the descriptors are of the same size for each image? More generally, in what format should features be presented to KNN for training with given data and labels? Should the data be an int or float? Can it be char?
The training data can be found here.
I am also using haarcascade_frontalface_alt.xml from the OpenCV samples.
Right now the KNN model is given just 10 images for training, to see if my program runs without errors, which it does not.
Here is my code:
import cv2
import numpy as np

def chooseCascade():
    # TODO: Option for different cascades
    # HAAR Classifier for frontal face
    _cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
    return _cascade

def cropToObj(cascade,imageFile):
    # Load as 1-channel grayscale image
    image = cv2.imread(imageFile,0)

    # Crop to the object of interest in the image
    objRegion = cascade.detectMultiScale(image) # TODO: What if multiple objects in image?

    x1 = objRegion[0,0]
    y1 = objRegion[0,1]
    x1PlusWidth = objRegion[0,0]+objRegion[0,2]
    y1PlusHeight = objRegion[0,1]+objRegion[0,3]

    _objImage = image[y1:y1PlusHeight,x1:x1PlusWidth]

    return _objImage

def recognizer(fileNames):
    # ORB constructor
    orb = cv2.ORB(nfeatures=100)

    keyPoints = []
    descriptors = []

    # A cascade for face detection
    haarFaceCascade = chooseCascade()

    # Start processing images
    for imageFile in fileNames:
        # Find faces using the HAAR cascade
        faceImage = cropToObj(haarFaceCascade,imageFile)

        # Extract keypoints and descriptors
        faceKeyPoints, faceDescriptors = orb.detectAndCompute(faceImage, mask = None)

        #print faceDescriptors.shape
        descRow = faceDescriptors.shape[0]
        descCol = faceDescriptors.shape[1]

        flatFaceDescriptors = faceDescriptors.reshape(descRow*descCol).astype(np.float32)

        keyPoints.append(faceKeyPoints)
        descriptors.append(flatFaceDescriptors)

    print descriptors

    # KNN model and training on descriptors
    responses = []
    for name in fileNames:
        if name.startswith('BF'):
            responses.append(0) # Female
        else:
            responses.append(1) # Male

    knn = cv2.KNearest()
    knnTrainSuccess = knn.train(descriptors,
                                responses,
                                isRegression = False) # isRegression = False implies classification

    # Obtain test face image from cam
    capture = cv2.VideoCapture(0)
    closeCamera = -1
    while(closeCamera < 0):
        _retval, _camImage = capture.retrieve()

        # Find face in camera image
        testFaceImage = haarFaceCascade.detectMultiScale(_camImage) # TODO: What if multiple faces?

        # Keypoints and descriptors of test face image
        testFaceKP, testFaceDesc = orb.detectAndCompute(testFaceImage, mask = None)
        testDescRow = testFaceDesc.shape[0]
        flatTestFaceDesc = testFaceDesc.reshape(1,testDescRow*testDescCol).astype(np.float32)

        # Args in knn.find_nearest: testData, k
        returnedValue, result, neighborResponse, distance = knn.find_nearest(flatTestFaceDesc,3)
        print returnedValue, result, neighborResponse, distance

        # Display results
        # TODO: Overlay classification text
        cv2.imshow("testImage", _camImage)

        closeCamera = cv2.waitKey(1)
    cv2.destroyAllWindows()

if __name__ == '__main__':
    fileNames = ['BF09NES_gray.jpg',
                 'BF11NES_gray.jpg',
                 'BF13NES_gray.jpg',
                 'BF14NES_gray.jpg',
                 'BF18NES_gray.jpg',
                 'BM25NES_gray.jpg',
                 'BM26NES_gray.jpg',
                 'BM29NES_gray.jpg',
                 'BM31NES_gray.jpg',
                 'BM34NES_gray.jpg']
    recognizer(fileNames)
Currently I am getting an error at the line with knn.train(), where descriptors is not detected as a numpy array.
Also, is this approach completely wrong? Am I supposed to use some other method for gender classification? I wasn't satisfied with the fisherface and eigenface examples in the OpenCV facerec demo, so please don't direct me to those.
Any other help is much appreciated. Thanks.
--- EDIT ---
I've tried a few things and come up with an answer.
The database I am using for training is: Karolinska Directed Emotional Faces
I have some doubts about the effectiveness/workability of the described approach. Here's another approach that you might want to consider. The contents of the gen folder are at http://www1.datafilehost.com/d/0f263abc. As you will note, when the data size gets bigger (~10k training samples), the size of the model may become unacceptable (~100-200 MB). Then you will need to look into PCA/LDA etc.
import cv2
import numpy as np
import os

def feaCnt():
    mat = np.zeros((400,400,3), dtype=np.uint8)
    ret = extr(mat)
    return len(ret)

def extr(img):
    return sobel(img)

def sobel(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    klr = [[-1,0,1],[-2,0,2],[-1,0,1]]
    kbt = [[1,2,1],[0,0,0],[-1,-2,-1]]
    ktb = [[-1,-2,-1],[0,0,0],[1,2,1]]
    krl = [[1,0,-1],[2,0,-2],[1,0,-1]]
    kd1 = [[0,1,2],[-1,0,1],[-2,-1,0]]
    kd2 = [[-2,-1,0],[-1,0,1],[0,1,2]]
    kd3 = [[0,-1,-2],[1,0,-1],[2,1,0]]
    kd4 = [[2,1,0],[1,0,-1],[0,-1,-2]]
    karr = np.asanyarray([
        klr,
        kbt,
        ktb,
        krl,
        kd1,
        kd2,
        kd3,
        kd4])
    gray = cv2.resize(gray,(40,40))
    # apply each of the 8 directional Sobel-style kernels, downsample, concatenate
    res = np.float32([cv2.resize(cv2.filter2D(gray, -1, k),(15,15)) for k in karr])
    return res.flatten()

root = 'C:/data/gen'
model = 'c:/data/models/svm/gen.xml'

imgs = []
idx = 0
for path, subdirs, files in os.walk(root):
    for name in files:
        p = path[len(root):].split(os.sep)  # subfolder name is the label
        p.remove('')
        lbl = p[0]
        fpath = os.path.join(path, name)
        imgs.append((fpath, int(lbl)))
        idx += 1

samples = np.zeros((len(imgs), feaCnt()), dtype=np.float32)
labels = np.zeros(len(imgs), dtype=np.float32)

i = 0
for f, l in imgs:
    print i
    img = cv2.imread(f)
    samples[i] = extr(img)
    labels[i] = l
    i += 1

svm = cv2.SVM()
svmparams = dict(kernel_type = cv2.SVM_POLY,
                 svm_type = cv2.SVM_C_SVC,
                 degree = 3.43,
                 gamma = 1.5e-4,
                 coef0 = 1e-1)
print 'svm train'
svm.train(samples, labels, params=svmparams)
svm.save(model)
print 'done'

result = np.float32([svm.predict(s) for s in samples])
correct = 0.
total = 0.
for i, j in zip(result, labels):
    total += 1
    if i == j:
        correct += 1
print '%f' % (correct/total)
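(Regarding the PCA/LDA remark above: here is a minimal sketch of compressing the feature matrix before training, assuming the cv2.PCACompute/cv2.PCAProject bindings are available in your OpenCV 2.4 build; the component count of 100 is an arbitrary choice.)

# 'samples' is the len(imgs) x 1800 float32 matrix built above
mean, eigvecs = cv2.PCACompute(samples, maxComponents=100)
reduced = cv2.PCAProject(samples, mean, eigvecs)   # now len(imgs) x 100
# train the SVM on 'reduced' instead of 'samples', and keep mean/eigvecs
# so test vectors can be projected the same way before svm.predict()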
Previously, I was struggling to find the technical difference between ORB, SIFT, SURF, etc., and I found several related SO posts helpful.
The most important thing to note is that these feature detection algorithms in OpenCV require a single-channel (typically 8-bit) grayscale image.
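For example (a sketch; the file name here just stands in for any BGR image, such as a webcam frame):

import cv2

orb = cv2.ORB(nfeatures=25)
frame = cv2.imread('BF09NES_gray.jpg', 1)        # flag 1 loads as 3-channel BGR
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # convert to 1-channel 8-bit
keypoints, descriptors = orb.detectAndCompute(gray, None)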
It turns out that knn.train() can only accept arrays whose data type is 32-bit floating-point. I believe SVM training in OpenCV has the same requirement. In Python, a numpy array must hold the same data type in every row, and all rows must have the same shape, unlike Python lists, which can hold data of any type and size.
So after growing a list of the descriptors I converted the list to an array.
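A minimal sketch of that conversion (the random vectors here are just hypothetical stand-ins for real flattened descriptors):

import numpy as np

# ten stand-in flattened per-image descriptor vectors of length 800
descriptorList = [np.random.randint(0, 256, 800) for _ in range(10)]

# knn.train() wants a single 2D float32 array with one row per training sample
trainData = np.asarray(descriptorList, dtype=np.float32)
print trainData.shape   # (10, 800)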
But! Before that, I hardcoded the ORB nfeatures parameter to 25. All my training images are of roughly the same resolution, and I was able to manually verify that each image produces at least 25 keypoints with ORB. Each keypoint comes with a 32-byte descriptor, so 25*32 gives 800 descriptor values for each face image. ORB returns an integer-typed array with one row per keypoint; I reshaped this into a single row to produce a 'vector' of size 800.
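A small sketch of the kind of manual check I mean, assuming the training files are in the working directory:

import cv2

orb = cv2.ORB(nfeatures=25)
for name in ['BF09NES_gray.jpg', 'BM25NES_gray.jpg']:   # etc., as in the script below
    img = cv2.imread(name, 0)                           # single-channel grayscale
    kp, desc = orb.detectAndCompute(img, None)
    # desc is a len(kp) x 32 uint8 array (None if no keypoints were found);
    # all 25 keypoints are needed to get the 800-element vector
    print name, len(kp), desc.shape if desc is not None else None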
The next challenge was in using knn.find_nearest(). It requires a 'matrix' whose rows are identical in shape to the rows of the ndarray given to knn.train(). Not doing so produces the error:
OpenCV Error: Bad argument (Input samples must be floating-point matrix (<num_samples>x<var_count>)) in find_nearest
Even if you have only a single vector to pass to knn.find_nearest(), it needs to be in the shape 1xm, where m is the number of elements in the vector.
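A minimal sketch of getting a single sample into that 1xm shape ('testFaceDesc', an n x 32 uint8 array, and 'knn' come from the script below):

import numpy as np

flatTest = testFaceDesc.reshape(1, -1).astype(np.float32)   # shape (1, n*32)
ret, results, neighbours, dists = knn.find_nearest(flatTest, 5)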
So I had to hack up a crude way to check that the image taken by my webcam was usable within my hardcoded approach to the problem.
The code looks like this now:
import cv2
import numpy as np

def chooseCascade():
    # TODO: Option for different cascades
    # HAAR Classifier for frontal face
    _cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
    return _cascade

def cropToObj(cascade,imageFile,flag):
    if flag == 0:
        # Load as 1-channel grayscale image
        image = cv2.imread(imageFile,0)
    elif flag == 1:
        # Load as 3-channel color image
        image = cv2.imread(imageFile,1)
    elif flag == -1:
        # Load image as is
        image = cv2.imread(imageFile,-1)
    elif flag == 2:
        # Image is from camera
        image = imageFile
    else:
        print 'improper arguments passed to cropToObj'

    # Crop to the object of interest in the image
    objRegion = cascade.detectMultiScale(image) # TODO: What if multiple objects in image?

    x1 = objRegion[0,0]
    y1 = objRegion[0,1]
    x1PlusWidth = objRegion[0,0]+objRegion[0,2]
    y1PlusHeight = objRegion[0,1]+objRegion[0,3]

    objImage = image[y1:y1PlusHeight,x1:x1PlusWidth]

    return objImage

def recognizer(fileNames):
    # ORB constructor
    orb = cv2.ORB(nfeatures=25)

    keyPoints = []
    descriptors = []

    # A cascade for face detection
    haarFaceCascade = chooseCascade()

    # Start processing images
    for imageFile in fileNames:
        # Find faces using the HAAR cascade; flag 0 loads the file as grayscale
        faceImage = cropToObj(haarFaceCascade,imageFile,0)

        # Extract keypoints and descriptors
        faceKeyPoints, faceDescriptors = orb.detectAndCompute(faceImage, mask = None)

        #print faceDescriptors.shape
        descRow = faceDescriptors.shape[0]
        descCol = faceDescriptors.shape[1]

        flatFaceDescriptors = faceDescriptors.reshape(descRow*descCol)

        keyPoints.append(faceKeyPoints)
        descriptors.append(flatFaceDescriptors)

    descriptors = np.asarray(descriptors, dtype=np.float32)

    # KNN model and training on descriptors
    responses = []
    for name in fileNames:
        if name.startswith('BF'):
            responses.append(0) # Female
        else:
            responses.append(1) # Male
    responses = np.asarray(responses)

    knn = cv2.KNearest()
    knnTrainSuccess = knn.train(descriptors,
                                responses,
                                isRegression = False) # isRegression = False implies classification

    # Obtain test face image from cam
    capture = cv2.VideoCapture(0)
    closeCamera = -1
    while(closeCamera < 0):
        retval, camImage = capture.read()

        # Find face in camera image
        try:
            testFaceImage = cropToObj(haarFaceCascade, camImage, 2) # TODO: What if multiple faces?
            testFaceImage = cv2.cvtColor(testFaceImage, cv2.COLOR_BGR2GRAY)
        except TypeError:
            print 'check if front face is visible to camera'
            continue  # skip this frame; there is nothing usable to classify

        # Keypoints and descriptors of test face image
        testFaceKP, testFaceDesc = orb.detectAndCompute(testFaceImage, mask = None)
        testDescRow = testFaceDesc.shape[0]
        testDescCol = testFaceDesc.shape[1]
        flatTestFaceDesc = testFaceDesc.reshape(1,testDescRow*testDescCol)
        flatTestFaceDesc = np.asarray(flatTestFaceDesc,dtype=np.float32)

        if flatTestFaceDesc.size == 800:
            # Args in knn.find_nearest: testData, k
            returnedValue, result, neighborResponse, distance = knn.find_nearest(flatTestFaceDesc,5)
            if returnedValue == 0.0:
                print 'Female'
            else:
                print 'Male'
        else:
            print 'insufficient size of image'

        # Display results
        # TODO: Overlay classification text
        cv2.imshow("testImage", camImage)

        closeCamera = cv2.waitKey(1)
    cv2.destroyAllWindows()

if __name__ == '__main__':
    fileNames = ['BF09NES_gray.jpg',
                 'BF11NES_gray.jpg',
                 'BF13NES_gray.jpg',
                 'BF14NES_gray.jpg',
                 'BF18NES_gray.jpg',
                 'BM25NES_gray.jpg',
                 'BM26NES_gray.jpg',
                 'BM29NES_gray.jpg',
                 'BM31NES_gray.jpg',
                 'BM34NES_gray.jpg']
    recognizer(fileNames)
I am still hoping that someone in the SO community can help me by suggesting an idea so that I don't have to hardcode things into my solution. I also suspect that knn.find_nearest() isn't doing what I need it to do.
And as expected, the recognizer is not at all accurate and is very prone to misclassification due to rotation, lighting, etc. Any suggestions on improving this approach would be really appreciated.