Captcha recognition with a convnet, how to define the loss function
I have a small research project where I try to decode some captcha images. I use a convnet implemented in TensorFlow 0.9, based on the MNIST example (https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py).
My code is available on GitHub: https://github.com/ksopyla/decapcha/blob/master/decaptcha_convnet.py
I have tried to reproduce the idea in which a particular sequence of chars is encoded as one binary vector. In my case the captchas contain at most 20 Latin chars, and each char is encoded as a 63-dimensional one-hot binary vector, where the 1 bit is set at a position given by the char-to-index mapping.
So finally, when I concatenate all 20 chars, I get a 20*63-dimensional vector which my network should learn to predict.
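To make the encoding concrete, here is a minimal sketch of what I mean (the exact 63-char alphabet and the filler char are just an example; the real mapping is in my repo):

import numpy as np
import string

# Hypothetical 63-char alphabet (digits + upper/lowercase Latin letters + a
# filler char); the real mapping used in the project may differ.
CHARSET = string.digits + string.ascii_uppercase + string.ascii_lowercase + '_'
assert len(CHARSET) == 63

def encode_captcha(text, max_len=20):
    """Encode a captcha string as a flat [max_len*63] one-hot label vector."""
    text = text.ljust(max_len, '_')  # pad shorter captchas with the filler char
    label = np.zeros((max_len, len(CHARSET)), dtype=np.float32)
    for i, ch in enumerate(text):
        label[i, CHARSET.index(ch)] = 1.0
    return label.reshape(-1)  # concatenate the 20 one-hot rows into one vector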
Architecture of my network:
So my main issue is how to define the loss for the optimizer and how to evaluate the model. I have tried something like this:
# Construct model
pred = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer.
# Split the prediction for each char: each char takes 63 contiguous positions,
# and we have 20 chars (TF 0.x signature: tf.split(split_dim, num_split, value)).
split_pred = tf.split(1, 20, pred)
split_y = tf.split(1, 20, y)

# Compute the partial softmax cost for each char
costs = list()
for i in range(20):
    costs.append(tf.nn.softmax_cross_entropy_with_logits(split_pred[i], split_y[i]))

# Reduce the cost for each char over the batch
rcosts = list()
for i in range(20):
    rcosts.append(tf.reduce_mean(costs[i]))

# Global reduce: sum the per-char costs
loss = tf.reduce_sum(rcosts)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

# Evaluate model.
# pred has shape [batch_size, 20*63]; reshape it so that each character
# prediction sits in its own row, take the argmax of each row (across the
# 63 class columns), check whether it equals the argmax of the original
# label, then average the matches (accuracy).

# batch, rows, cols
p = tf.reshape(pred, [batch_size, 20, 63])
# max idx across the class columns
max_idx_p = tf.argmax(p, 2)

l = tf.reshape(y, [batch_size, 20, 63])
# max idx across the class columns
max_idx_l = tf.argmax(l, 2)

correct_pred = tf.equal(max_idx_p, max_idx_l)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
I try to split each char out of the output and apply softmax and cross-entropy to each char separately, then combine all the costs. But I have mixed TensorFlow functions with normal Python lists. Can I do this? Will the TensorFlow engine understand it? Which TensorFlow functions can I use instead of Python lists?
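For example, I wonder whether I could avoid the lists entirely by folding the char dimension into the batch dimension and calling the op once. A minimal sketch of what I have in mind (TF 0.9 argument order; as far as I can tell this should match the summed per-char losses up to a constant factor of 20):

pred_flat = tf.reshape(pred, [-1, 63])  # [batch_size*20, 63]
y_flat = tf.reshape(y, [-1, 63])        # [batch_size*20, 63]
# one softmax cross-entropy over all chars of all examples at once
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred_flat, y_flat))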
The accuracy is computed in a similar manner: the output is reshaped to 20x63, I take the argmax of each row, and then compare it with the true encoded char.
When I run this, the loss function decreases, but the accuracy rises and then falls. This picture shows how it looks: https://plon.io/files/57a0a7fb4bb1210001ca0476
I would be grateful for any comments, mistakes I have made, or ideas to implement.
The real problem was that my network got stuck: the network output was constant for any input.
When I changed the loss function to loss = tf.nn.sigmoid_cross_entropy_with_logits(pred, y) and normalized the input, the net started to learn the patterns.
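In context, the loss section then looks roughly like this (TF 0.x argument order; reducing the elementwise losses to a scalar with reduce_mean is my addition, the original passed the unreduced tensor straight to the optimizer):

# one sigmoid cross-entropy over the whole 20*63 output vector
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)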
Standardization (subtract the mean and divide by the std) helps a lot.
# Xdata is a matrix [N, D]
x_mean = Xdata.mean(axis=0)
x_std = Xdata.std(axis=0)
X = (Xdata - x_mean) / (x_std + 0.00001)  # small epsilon avoids division by zero
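The same training-set statistics should also be applied at test time (Xdata_test here is a hypothetical held-out matrix):

# normalize the test set with the training-set mean and std
X_test = (Xdata_test - x_mean) / (x_std + 0.00001)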
Data preprocessing is key; it is worth reading http://cs231n.github.io/neural-networks-2/#data-preprocessing