loss remains constant

I am just at the beginning of my career in machine learning and wanted to create a simple CNN to classify two different kinds of leaves (belonging to two different species of trees). Before gathering a huge number of pictures of leaves, I decided to create a very small, simple CNN in TensorFlow and train it on only one image, to check whether the code is OK. I normalized the photo, of size 256x256 (x 3 channels), to <0,1> and created a 4-layer network (2 conv and 2 dense). Unfortunately, the loss almost always tended to some constant value (usually an integer) from the very beginning. I thought something was wrong with the picture, so I replaced it with a random numpy array of the same dimensions. Unfortunately, the loss was still constant. Sometimes the net seemed to learn, because the loss decreased, but most of the time it was constant from the very beginning. Could anyone help explain why that is? I read that training on one example is the best way to check whether your code has bugs, but the longer I struggle with it, the less I can see.

Here is my code (based on this TensorFlow tutorial). I used exponential linear units (ELUs), because I thought my problems were caused by the zero gradient of badly initialized ReLUs.

import matplotlib.pyplot as plt
import numpy as np
from numpy import random
from sklearn import utils
import tensorflow as tf

#original dataset of 6 leaves (needs: from scipy import ndimage)
# input = [ndimage.imread("E:/leaves/dab1.jpg"),
#          ndimage.imread("E:/leaves/dab2.jpg"),
#          ndimage.imread("E:/leaves/dab3.jpg"),
#          ndimage.imread("E:/leaves/klon1.jpg"),
#          ndimage.imread("E:/leaves/klon2.jpg"),
#          ndimage.imread("E:/leaves/klon3.jpg")]

#normalize each image (originally uint8)
# input = [input[i] / 255 for i in range(len(input))]

#temporary testing dataset, mimicking 6 images, each 3-channel, of dimension 256x256
input=[random.randn(256,256,3)]
       # random.randn(256, 256, 3),
       # random.randn(256, 256, 3),
       # random.randn(256, 256, 3),
       # random.randn(256, 256, 3),
       # random.randn(256, 256, 3)]

#each image belongs to one of two classes
labels=[[1]]#,[1,0],[1,0],[0,1],[0,1],[0,1]]


def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.truncated_normal(shape, stddev=.1)
  return tf.Variable(initial)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

x = tf.placeholder(tf.float32, shape=[None, 256,256,3])
y_ = tf.placeholder(tf.float32, shape=[None, 1])

x_image = tf.reshape(x, [-1,256,256,3])

#first conv layer
W_conv1 = weight_variable([5,5, 3,8])
b_conv1 = bias_variable([8])
h_conv1 = tf.nn.elu(conv2d(x_image, W_conv1) + b_conv1)

#second conv layer
W_conv2 = weight_variable([5,5, 8,16])
b_conv2 = bias_variable([16])
h_conv2 = tf.nn.elu(conv2d(h_conv1, W_conv2) + b_conv2)

#first dense layer
W_fc1 = weight_variable([256*256*16, 10])
b_fc1 = bias_variable([10])
out_flat = tf.reshape(h_conv2, [-1, 256*256*16])
h_fc1 = tf.nn.elu(tf.matmul(out_flat, W_fc1) + b_fc1)

#second dense layer
W_fc2 = weight_variable([10, 1])
b_fc2 = bias_variable([1])
h_fc2 = tf.nn.elu(tf.matmul(h_fc1, W_fc2) + b_fc2)

#also tried softmax cross entropy with logits
cross_entropy=tf.losses.mean_squared_error(predictions=h_fc2, labels=y_)
train_step = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)

print("h2", h_fc2.shape)
print("y", y_.shape)

sess=tf.Session()
sess.run(tf.global_variables_initializer())
loss = []
for i in range(10):
    sess.run(train_step, feed_dict={x:input, y_:labels})
    input, labels = utils.shuffle(input, labels)
    loss.append(sess.run(cross_entropy, feed_dict={x:input, y_:labels}))
    print(i, " LOSS: ", loss[-1])

np.set_printoptions(precision=3, suppress=True)
for i in range(len(input)):
    print(labels[i], sess.run(h_fc2, feed_dict={x:[input[i]], y_:[labels[i]]}))

plt.plot(loss)
plt.show()

And here is a list of what I tried:

  • The base code above almost always results in a loss of exactly 4.0.
  • I extended training to 100 epochs. It turned out that the probability of getting a constant loss increased, which is strange, because in my opinion the number of epochs shouldn't change anything in the early stage of training.
  • I changed the number of feature maps to 32 in the first conv layer, 64 in the second, and to 100 neurons in the dense layer.
  • Because my output is binary, I originally used only a single output. I changed it to two mutually exclusive (one-hot) outputs. That changed the loss to 2.5. It turned out that my outputs tended to be [-1,-1], while the label was [1,0].
  • I tried various learning rates, from 0.001 to 0.00005.
  • I initialized weights and biases with a standard deviation of 2 instead of 0.1. The loss seemed to decrease, but reached high values, like 1e10. So I changed the number of epochs from 10 to 100... and again, the loss was 2.5 from the very beginning. After returning to 10 epochs, the loss remained 2.5.
  • I expanded the dataset to 6 elements. The loss is the same as before.

Does anyone have any idea why this happens? As far as I know, if the net couldn't generalize, the loss wouldn't decrease, but would rather increase or oscillate, not remain constant.


    I found an answer. The problem was caused by the line:

    h_fc2 = tf.nn.elu(tf.matmul(h_fc1, W_fc2) + b_fc2)
    

    I don't know why, but it made the outputs equal to -1. When I changed it to

    h_fc2 = tf.matmul(h_fc1, W_fc2) + b_fc2
    

    it worked like a charm and the loss began to decrease. Could anyone explain why we should avoid using an activation function in the last layer? (I saw the same thing in the aforementioned TensorFlow tutorial.) I don't understand it; I thought every layer should have its own activation function.
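    One plausible explanation: ELU is bounded below by -1, so once the pre-activations go strongly negative, the output saturates at -1 and the gradient through the activation all but vanishes, while a linear last layer keeps the logits unbounded. This would also account for the exact constants above: with MSE, an output stuck at -1 against the label 1 gives (1 - (-1))^2 = 4.0, and two saturated outputs against the one-hot label [1, 0] give ((1 - (-1))^2 + (0 - (-1))^2) / 2 = 2.5. A minimal numpy sketch of that arithmetic:

    import numpy as np

    # ELU: x for x >= 0, exp(x) - 1 for x < 0 -- bounded below by -1
    def elu(x):
        return np.where(x >= 0, x, np.exp(x) - 1)

    # one saturated output vs. label 1 -> MSE ~ (1 - (-1))**2 = 4.0
    print((1 - elu(np.array(-10.0))) ** 2)
    # two saturated outputs vs. one-hot [1, 0] -> MSE ~ (4 + 1) / 2 = 2.5
    print(np.mean((np.array([1.0, 0.0]) - elu(np.array([-10.0, -10.0]))) ** 2))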


    A few issues I see:

    You're using squared loss, not cross entropy. For classification, use tf.nn.sigmoid_cross_entropy_with_logits(...), not tf.losses.mean_squared_error.
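    For instance, a sketch against the code above (keeping the single logit h_fc2, with the final ELU removed as in the accepted fix):

    cross_entropy = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=h_fc2))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)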

    In this code:

    #normalize each image (originally uint8)
    # input = [input[i] / 255 for i in range(len(input))]
    

    If your input is uint8, that division is integer division, so your data is probably being rounded down to 0 and you're just sending in blank images, which will converge to one constant loss, as you're experiencing.
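    A minimal fix, assuming the images are numpy uint8 arrays, is to cast to float before dividing:

    import numpy as np

    # cast to float first so uint8 images aren't integer-divided down to 0 and 1
    input = [img.astype(np.float32) / 255.0 for img in input]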

    Your first debugging step should be to save the images on the line before sess.run. Save the exact image you're sending to the network to validate it. Don't make it complicated; just use scipy to save the image to a file and do a sanity check.
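    Something like this (file names are illustrative; scipy.misc.imsave existed in scipy of that era but has since been removed, with imageio.imwrite as the modern equivalent):

    import numpy as np
    from scipy.misc import imsave  # deprecated in newer scipy

    for i, img in enumerate(input):
        # undo the [0, 1] normalization so the dumped file is viewable
        imsave("debug_input_%d.png" % i, (np.asarray(img) * 255).astype(np.uint8))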

    Also, you have redundant calls to TF here:

    sess.run(train_step, feed_dict={x:input, y_:labels})
    input, labels = utils.shuffle(input, labels)
    loss.append(sess.run(cross_entropy, feed_dict={x:input, y_:labels}))
    

    replace that with:

    _, result_cross_entropy = sess.run([train_step, cross_entropy], feed_dict={x: input, y_: labels})
    

    A note on learning rate: 1e-4 is a good starting point.

    Also, sanity-check that your labels match your images correctly: save the labels to a file when you dump the images and check them. It's very easy to permute labels.
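    For example, dumping the labels next to the image files written above (again, names are illustrative):

    with open("debug_labels.txt", "w") as f:
        for i, label in enumerate(labels):
            f.write("debug_input_%d.png -> %s\n" % (i, label))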


    I also had a very hard time figuring out this problem in my own work. It turned out that reducing the learning rate helped me move away from the constant loss.

    For your problem, I suggest something close to 5e-5. I hope that solves it.
