Captcha recognition with a convnet: how to define the loss function

2018-06-11 03:43:32

I have a small research project in which I try to decode captcha images. I use a convnet implemented in TensorFlow 0.9, based on the MNIST example (https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py).

My code is available on GitHub: https://github.com/ksopyla/decapcha/blob/master/decaptcha_convnet.py

I am trying to reproduce the idea described in:

"Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks", Goodfellow et al. (https://arxiv.org/pdf/1312.6082.pdf)

"CAPTCHA Recognition with Active Deep Learning", Stark et al. (https://vision.in.tum.de/_media/spezial/bib/stark-gcpr15.pdf)

where a particular sequence of chars is encoded as one binary vector. In my case the captchas contain at most 20 Latin chars, and each char is encoded as a 63-dimensional binary vector in which a single bit is set, at the position given by:

digits '0-9': 1 at positions 0-9

uppercase letters 'A-Z': 1 at positions 10-35

lowercase letters 'a-z': 1 at positions 36-61

position 62 is reserved for the blank char '' (words shorter than 20 chars are padded with '' up to 20)

So finally, when I concatenate all 20 chars, I get a 20*63-dimensional vector that my network should learn. My main issue is how to define a proper loss function for the optimizer.
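To make the encoding concrete, here is a minimal sketch in plain Python. The names `ALPHABET`, `encode_captcha` and `decode_captcha` are made up for illustration and are not taken from the linked repository; the sketch also assumes that padding slots simply get the blank class (index 62), so no explicit padding character is needed.

```python
import string

# Alphabet order follows the scheme above: digits, uppercase, lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols
MAX_LEN = 20          # maximum captcha length
NUM_CLASSES = 63      # 62 symbols + 1 blank class
BLANK_INDEX = 62      # position reserved for the blank (padding) class


def encode_captcha(text):
    """Return a flat list of MAX_LEN * NUM_CLASSES zeros and ones."""
    if len(text) > MAX_LEN:
        raise ValueError("captcha longer than %d characters" % MAX_LEN)
    vector = [0] * (MAX_LEN * NUM_CLASSES)
    for pos in range(MAX_LEN):
        # Slots past the end of the text are filled with the blank class.
        idx = ALPHABET.index(text[pos]) if pos < len(text) else BLANK_INDEX
        vector[pos * NUM_CLASSES + idx] = 1
    return vector


def decode_captcha(vector):
    """Take the arg-max of each 63-wide slot and skip blank slots."""
    chars = []
    for pos in range(MAX_LEN):
        slot = vector[pos * NUM_CLASSES:(pos + 1) * NUM_CLASSES]
        idx = max(range(NUM_CLASSES), key=lambda i: slot[i])
        if idx != BLANK_INDEX:
            chars.append(ALPHABET[idx])
    return "".join(chars)
```

Because `decode_captcha` only takes an arg-max per slot, it should work on raw network outputs as well as on the 0/1 label vectors.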

Architecture of my network:

conv 3x3x32 -> relu -> pooling(k=2) -> dropout

conv 3x3x64 -> relu -> pooling(k=2) -> dropout

FC 1024 -> relu -> dropout

Output 20*63

So my main issue is how to define the loss for the optimizer and how to evaluate the model. I have tried something like this:

# Construct model
pred = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer

# split the prediction per char: each char takes 63 contiguous positions, and we have 20 chars
split_pred = tf.split(1,20,pred)
split_y = tf.split(1,20,y)


# compute the partial softmax cost for each char
costs = list()
for i in range(20):
    costs.append(tf.nn.softmax_cross_entropy_with_logits(split_pred[i],split_y[i]))

# reduce the cost of each char over the batch
rcosts = list()
for i in range(20):
    rcosts.append(tf.reduce_mean(costs[i]))

# global reduce
loss = tf.reduce_sum(rcosts)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)


# Evaluate model

# pred has shape [batch_size, 20*63]; reshape it so that each character
# prediction is in its own row, take the argmax of each row (across columns),
# check whether it equals the argmax of the original label,
# then average the matches to get the accuracy

# batch, rows, cols
p = tf.reshape(pred,[batch_size,20,63])
# index of the max within each row (axis 2)
#max_idx_p=tf.argmax(p,2).eval()
max_idx_p=tf.argmax(p,2)

l = tf.reshape(y,[batch_size,20,63])
# index of the max within each row (axis 2)
#max_idx_l=tf.argmax(l,2).eval()
max_idx_l=tf.argmax(l,2)

correct_pred = tf.equal(max_idx_p,max_idx_l)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

I split each char from the output and compute softmax and cross-entropy for each char separately, then combine all the costs. But I have mixed TensorFlow functions with normal Python lists; can I do this? Will the TensorFlow engine understand this? Which TensorFlow functions can I use instead of Python lists?
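As far as I understand, a Python loop that only builds graph ops is fine, because it just adds 20 cross-entropy nodes to the graph. A more compact alternative is to fold the 20 characters into the row axis with a reshape to [batch*20, 63] and compute one softmax cross-entropy over all rows. The sketch below uses NumPy rather than TensorFlow so that it does not depend on a particular API version; the helper names are made up for illustration. It checks that the loop version and the reshape version give the same number.

```python
import numpy as np

NUM_CHARS, NUM_CLASSES = 20, 63


def softmax_xent(logits, labels):
    """Row-wise softmax cross-entropy; logits and labels are [rows, classes]."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


def loss_with_loop(pred, y):
    """Mirror of the list-based version: one mean cost per character, then a sum."""
    pred_parts = np.split(pred, NUM_CHARS, axis=1)
    y_parts = np.split(y, NUM_CHARS, axis=1)
    return sum(softmax_xent(p, t).mean() for p, t in zip(pred_parts, y_parts))


def loss_with_reshape(pred, y):
    """Same value without a Python loop: fold characters into the row axis."""
    logits = pred.reshape(-1, NUM_CLASSES)     # [batch * 20, 63]
    labels = y.reshape(-1, NUM_CLASSES)
    per_char = softmax_xent(logits, labels).reshape(-1, NUM_CHARS)  # [batch, 20]
    return per_char.mean(axis=0).sum()


# Random stand-in data: 4 captchas with random logits and random one-hot labels.
rng = np.random.RandomState(0)
batch = 4
pred = rng.randn(batch, NUM_CHARS * NUM_CLASSES)
targets = rng.randint(0, NUM_CLASSES, size=(batch, NUM_CHARS))
y = np.eye(NUM_CLASSES)[targets].reshape(batch, -1)   # one-hot, flattened

loop_loss = loss_with_loop(pred, y)
reshape_loss = loss_with_reshape(pred, y)
```

In TensorFlow the same idea would presumably be a `tf.reshape` of both `pred` and `y` to `[-1, 63]` followed by a single call to `tf.nn.softmax_cross_entropy_with_logits`.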

The accuracy is computed in a similar manner: the output is reshaped to 20x63 and I take the argmax of each row, then compare it with the true encoded char.

When I run this, the loss function decreases, but the accuracy rises and then falls. This picture shows how it looks: https://plon.io/files/57a0a7fb4bb1210001ca0476
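One thing that may help when reading the accuracy curve: the metric above counts correct characters, and that count includes the blank padding slots. If many captchas are much shorter than 20 chars, this number can look good even when few captchas are decoded completely. Below is a NumPy sketch (the function name is made up) that reports both the per-character accuracy and the whole-captcha accuracy. It reshapes with -1, so it does not depend on a fixed batch_size.

```python
import numpy as np

NUM_CHARS, NUM_CLASSES = 20, 63


def captcha_accuracies(pred, y):
    """Return (per-character accuracy, whole-captcha accuracy)."""
    # -1 lets the batch size vary, so a smaller final batch also works.
    pred_idx = pred.reshape(-1, NUM_CHARS, NUM_CLASSES).argmax(axis=2)
    true_idx = y.reshape(-1, NUM_CHARS, NUM_CLASSES).argmax(axis=2)
    correct = pred_idx == true_idx                 # [batch, 20] booleans
    char_acc = correct.mean()                      # fraction of correct characters
    word_acc = correct.all(axis=1).mean()          # fraction of fully correct captchas
    return char_acc, word_acc


# Toy batch of two captchas: the first is predicted perfectly,
# the second has exactly one wrong character.
true_idx = np.zeros((2, NUM_CHARS), dtype=int)
pred_idx = true_idx.copy()
pred_idx[1, 0] = 5
y = np.eye(NUM_CLASSES)[true_idx].reshape(2, -1)
pred = np.eye(NUM_CLASSES)[pred_idx].reshape(2, -1)

char_acc, word_acc = captcha_accuracies(pred, y)
```

In this toy batch 39 of 40 characters are right, but only one of the two captchas is fully correct.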

I would be grateful for any comments on mistakes I have made, or for ideas to implement.

The real problem was that my network got stuck: its output was constant for any input.

When I changed the loss function to loss = tf.nn.sigmoid_cross_entropy_with_logits(pred,y) and normalized the input, the net started to learn the patterns.
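For reference, sigmoid cross-entropy treats each of the 20*63 outputs as an independent yes/no target instead of normalizing over the 63 classes of one char. Here is a NumPy sketch of the element-wise formula in its numerically stable form, max(x, 0) - x*z + log(1 + exp(-|x|)), checked against the naive definition. The result has the same shape as the logits, so a reduction such as a mean is usually applied to get a scalar loss; the small arrays are made-up example values.

```python
import numpy as np


def sigmoid_xent(logits, labels):
    """Element-wise sigmoid cross-entropy in a numerically stable form."""
    # Equivalent to -z*log(s(x)) - (1-z)*log(1-s(x)), with s the logistic function.
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))


logits = np.array([[2.0, -1.0, 0.5], [-3.0, 0.0, 4.0]])
labels = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

per_output = sigmoid_xent(logits, labels)   # same shape as logits
loss = per_output.mean()                    # scalar to minimize

# Naive formula for comparison (fine for small logits, overflows for large ones).
probs = 1.0 / (1.0 + np.exp(-logits))
naive = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
```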

Standardization (subtract the mean and divide by the std) helps a lot.

Xdata is a matrix of shape [N,D]:

x_mean = Xdata.mean(axis=0) 
x_std = Xdata.std(axis=0) 
X = (Xdata-x_mean)/(x_std+0.00001)

Data preprocessing is the key; it is worth reading http://cs231n.github.io/neural-networks-2/#data-preprocessing
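A small usage sketch of the standardization step, with made-up helper names and random stand-in data: the mean and std are computed on the training matrix only and then reused for the test matrix, which is the practice the cs231n notes recommend.

```python
import numpy as np


def fit_standardizer(x_train, eps=1e-5):
    """Compute per-feature mean and (std + eps) on the training matrix [N, D]."""
    return x_train.mean(axis=0), x_train.std(axis=0) + eps


def standardize(x, mean, std):
    """Apply previously computed statistics to any matrix with D columns."""
    return (x - mean) / std


rng = np.random.RandomState(0)
x_train = rng.uniform(0, 255, size=(100, 8))   # stand-in for flattened images
x_test = rng.uniform(0, 255, size=(10, 8))

mean, std = fit_standardizer(x_train)
x_train_std = standardize(x_train, mean, std)
x_test_std = standardize(x_test, mean, std)    # reuse the training statistics
```

The small eps guards against division by zero for features that are constant across the training set, for example border pixels that are always white.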
