Captcha recognition with a convnet: how to define the loss function

2018-06-11 03:43:31

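I have a small research project in which I try to decode some captcha images. I use a convnet implemented in TensorFlow 0.9, based on the MNIST example (https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py).

My code is available on GitHub: https://github.com/ksopyla/decapcha/blob/master/decaptcha_convnet.py

I am trying to reproduce the ideas described in:

"Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks", Goodfellow et al. (https://arxiv.org/pdf/1312.6082.pdf)

"CAPTCHA Recognition with Active Deep Learning", Stark et al. (https://vision.in.tum.de/_media/spezial/bib/stark-gcpr15.pdf)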

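In both, a particular character sequence is encoded as one binary vector. In my case a captcha contains at most 20 Latin characters, and each character is encoded as a 63-dimensional binary vector with a single bit set at the position given by:

digits '0-9': bit at positions 0-9

uppercase letters 'A-Z': bit at positions 10-35

lowercase letters 'a-z': bit at positions 36-61

position 62 is reserved for the blank character (captchas shorter than 20 characters are padded with blanks up to 20 characters)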

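So when I finally concatenate all 20 characters I get a 20*63-dimensional vector, which is what my network should learn. My main problem is how to define a proper loss function for the optimizer.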
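A minimal pure-Python sketch of this encoding, assuming exactly the character ranges listed above; `char_position` and `encode_captcha` are hypothetical helper names, not taken from the linked repository:

```python
import string

MAX_CHARS = 20   # captchas have at most 20 characters
CHAR_DIM = 63    # 62 symbols plus one blank class
BLANK_POS = 62   # position reserved for the padding (blank) character

def char_position(ch):
    """Hypothetical helper: '0-9' -> 0-9, 'A-Z' -> 10-35, 'a-z' -> 36-61."""
    if ch in string.digits:
        return ord(ch) - ord('0')
    if ch in string.ascii_uppercase:
        return 10 + ord(ch) - ord('A')
    if ch in string.ascii_lowercase:
        return 36 + ord(ch) - ord('a')
    raise ValueError(f"unsupported character: {ch!r}")

def encode_captcha(text):
    """Hypothetical helper: flat list of MAX_CHARS * CHAR_DIM zeros/ones,
    one 63-dim one-hot block per character, padded with the blank class."""
    if len(text) > MAX_CHARS:
        raise ValueError("captcha longer than 20 characters")
    vec = [0] * (MAX_CHARS * CHAR_DIM)
    for i in range(MAX_CHARS):
        pos = char_position(text[i]) if i < len(text) else BLANK_POS
        vec[i * CHAR_DIM + pos] = 1
    return vec

v = encode_captcha("aB3")
print(len(v), sum(v))  # 1260 20
```

Every 63-wide block has exactly one bit set, so a 3-character captcha still yields 20 set bits: 3 for the characters and 17 for the blank padding.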

My network architecture:

conv 3x3x32 -> relu -> pooling (k=2) -> dropout

conv 3x3x64 -> relu -> pooling (k=2) -> dropout

FC 1024 -> relu -> dropout

output 20*63
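The post does not state the image size, so the following sketch (hypothetical helper `layer_shapes`) traces tensor shapes and weight counts through this architecture for an arbitrary height and width. It assumes stride-1 'SAME' convolutions, 2x2 max pooling with stride 2 and 'SAME' padding, and single-channel input; these are assumptions, not details confirmed by the post. Dropout adds no weights.

```python
import math

def layer_shapes(height, width, channels=1, n_chars=20, char_dim=63):
    """Trace output shapes and trainable weight counts, assuming stride-1 'SAME'
    convolutions and 2x2 / stride-2 'SAME' max pooling."""
    shapes = []
    params = 0
    h, w, c = height, width, channels
    for filters in (32, 64):
        params += 3 * 3 * c * filters + filters      # 3x3 kernel weights + biases
        c = filters                                  # 'SAME' conv keeps h and w
        h, w = math.ceil(h / 2), math.ceil(w / 2)    # k=2 pooling halves h, w (rounding up)
        shapes.append((h, w, c))
    flat = h * w * c                                 # flattened input to FC 1024
    params += flat * 1024 + 1024
    out = n_chars * char_dim                         # 20*63 = 1260 output logits
    params += 1024 * out + out
    shapes += [(1024,), (out,)]
    return shapes, params

print(layer_shapes(28, 28))
```

For a 28x28 grayscale input (MNIST-sized, chosen only for illustration) this gives shapes (14, 14, 32), (7, 7, 64), (1024,), (1260,) and 4,522,604 weights, almost all of them in the two fully connected layers, so the FC 1024 layer grows quickly with the captcha image size.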

So my main question is how to define the loss for the optimizer and how to evaluate the model. I tried something like this:

import tensorflow as tf  # written against TensorFlow 0.9

# Construct model (x, y, weights, biases, keep_prob, learning_rate are defined earlier in the script)
pred = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer

# split the prediction per character: each char takes 63 contiguous positions, and there are 20 chars
# (TF 0.9 argument order: tf.split(split_dim, num_split, value))
split_pred = tf.split(1, 20, pred)
split_y = tf.split(1, 20, y)

# compute the partial softmax cost for each char
costs = list()
for i in range(20):
    costs.append(tf.nn.softmax_cross_entropy_with_logits(split_pred[i], split_y[i]))

# reduce the cost for each char (mean over the batch)
rcosts = list()
for i in range(20):
    rcosts.append(tf.reduce_mean(costs[i]))

# global reduce: sum of the 20 per-char costs
loss = tf.reduce_sum(rcosts)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)


# Evaluate model

# pred has shape [batch_size, 20*63]; reshape it so that each character's prediction
# is in its own row, take the argmax of each row (across the columns), check whether it
# equals the argmax of the original label, then average all the matches (accuracy)

# batch, rows, cols (-1 infers the batch dimension, so any batch size works)
p = tf.reshape(pred, [-1, 20, 63])
# index of the max value in each row
max_idx_p = tf.argmax(p, 2)

l = tf.reshape(y, [-1, 20, 63])
# index of the max value in each row
max_idx_l = tf.argmax(l, 2)

# per-character accuracy: fraction of all batch*20 positions predicted correctly
correct_pred = tf.equal(max_idx_p, max_idx_l)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

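I tried to split the output per character, compute the softmax and cross-entropy separately for each character, and then combine all the costs. But I have mixed TensorFlow functions with plain Python lists. Can I do that? Will the TensorFlow engine understand it? Which TensorFlow functions could I use instead of Python lists?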
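Collecting tensors in a Python list while building the graph is harmless in itself, since the list only exists at graph-construction time (`tf.add_n`, for example, takes a list of tensors directly). To see what the list-based graph computes, here is a pure-Python sketch with hypothetical helper names. `loss_per_char_lists` mirrors the question's code (20 chunks of 63, mean over the batch per chunk, sum over chunks). `loss_reshaped` reaches the same number from a single [batch*20, 63] view of the data, which corresponds to `tf.reshape(pred, [-1, 63])` (and likewise for `y`), one `tf.nn.softmax_cross_entropy_with_logits` call, and the mean multiplied by 20:

```python
import math

N_CHARS, CHAR_DIM = 20, 63

def softmax_cross_entropy(logits, labels):
    """Cross-entropy between labels and softmax(logits), via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for z, t in zip(logits, labels))

def loss_per_char_lists(pred, y):
    """Mirror of the list-based graph: per-char mean over the batch, summed over chars."""
    total = 0.0
    for i in range(N_CHARS):
        sl = slice(i * CHAR_DIM, (i + 1) * CHAR_DIM)
        costs = [softmax_cross_entropy(p[sl], t[sl]) for p, t in zip(pred, y)]
        total += sum(costs) / len(costs)
    return total

def loss_reshaped(pred, y):
    """Same value from one pass over a [batch*20, 63] view: mean of all rows times 20."""
    rows = [(p[j:j + CHAR_DIM], t[j:j + CHAR_DIM])
            for p, t in zip(pred, y)
            for j in range(0, N_CHARS * CHAR_DIM, CHAR_DIM)]
    return N_CHARS * sum(softmax_cross_entropy(p, t) for p, t in rows) / len(rows)
```

Both versions sum the per-character cross-entropies and divide by the batch size, so they differ only in how the work is laid out; the reshaped form avoids building 20 separate ops per step.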

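Accuracy is computed in a similar way: the output is reshaped to 20x63, I take the argmax of each row and then compare it with the true encoded characters.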
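A pure-Python sketch of the same evaluation, with hypothetical helper names: each flat 20*63 row is viewed as 20 rows of 63, each row's argmax is compared with the label's argmax, and the matches are averaged. As an extra, stricter metric that the question does not use, it also reports the fraction of captchas whose 20 positions are all correct:

```python
N_CHARS, CHAR_DIM = 20, 63

def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)

def char_indices(flat_row):
    """View a flat 20*63 row as 20 rows of 63 and take the argmax of each row."""
    return [argmax(flat_row[i * CHAR_DIM:(i + 1) * CHAR_DIM]) for i in range(N_CHARS)]

def accuracies(pred, y):
    """Return (per-character accuracy, whole-captcha accuracy) for a batch."""
    char_hits, seq_hits = 0, 0
    for p, t in zip(pred, y):
        matches = [a == b for a, b in zip(char_indices(p), char_indices(t))]
        char_hits += sum(matches)
        seq_hits += all(matches)
    return char_hits / (len(pred) * N_CHARS), seq_hits / len(pred)
```

Because short captchas are padded with the blank class, a network that always predicts blanks at the tail positions may already reach a sizeable per-character accuracy, so that number alone can hide a network whose output does not depend on its input.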

When I run this, the loss decreases, but the accuracy first rises and then falls. This plot shows what it looks like: https://plon.io/files/57a0a7fb4bb1210001ca0476

I would appreciate any further comments on mistakes I have made or ideas for the implementation.

The real problem was that my network got stuck: its output was the same for any input.

When I changed the loss function to loss = tf.nn.sigmoid_cross_entropy_with_logits(pred,y) and normalized the input, the network started to learn patterns.
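`tf.nn.sigmoid_cross_entropy_with_logits` treats each of the 20*63 outputs as an independent binary classification instead of enforcing one class per character, and it returns one loss value per element (usually averaged, e.g. with `tf.reduce_mean`, before minimizing). TensorFlow documents the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)) for logits x and labels z. The sketch below is a pure-Python reimplementation of that formula for illustration only, not TensorFlow's code:

```python
import math

def sigmoid_cross_entropy_with_logits(logits, labels):
    """Element-wise logistic loss in the stable form
    max(x, 0) - x * z + log(1 + exp(-|x|)); pure-Python illustration."""
    return [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
            for x, z in zip(logits, labels)]

def mean_loss(logits, labels):
    """Average over all elements, the analogue of wrapping the op in tf.reduce_mean."""
    losses = sigmoid_cross_entropy_with_logits(logits, labels)
    return sum(losses) / len(losses)

print(sigmoid_cross_entropy_with_logits([0.0, 1000.0], [1.0, 0.0]))
```

For x = 1000 and z = 0 the stable form returns 1000.0, whereas evaluating log(1 - sigmoid(x)) directly fails because sigmoid(x) rounds to 1.0.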

Standardization (subtracting the mean and dividing by the standard deviation) helped a lot.

Xdata is an [N, D] matrix:

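x_mean = Xdata.mean(axis=0)  # per-feature (per-pixel) mean over the N samples
x_std = Xdata.std(axis=0)    # per-feature standard deviation
X = (Xdata - x_mean) / (x_std + 0.00001)  # epsilon avoids division by zero for constant features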

Data preprocessing is the key; this is worth reading: http://cs231n.github.io/neural-networks-2/#data-preprocessing

Link: http://www.djcxy.com/p/32041.html
