TensorFlow: Neural Network accuracy always 100% on train and test sets
I created a TensorFlow neural network with 2 hidden layers of 10 units each, using ReLU activations and Xavier initialization for the weights. The output layer has 1 unit that performs binary classification (0 or 1) with the sigmoid activation function, predicting whether a passenger on the Titanic survived based on the input features.
(The only code omitted is the load_data function which populates the variables X_train, Y_train, X_test, Y_test used later in the program)
Parameters
# Hyperparams
learning_rate = 0.001
lay_dims = [10,10, 1]
# Other params
m = X_train.shape[1]
n_x = X_train.shape[0]
n_y = Y_train.shape[0]
Inputs
X = tf.placeholder(tf.float32, shape=[X_train.shape[0], None], name="X")
norm = tf.nn.l2_normalize(X, 0) # normalize inputs
Y = tf.placeholder(tf.float32, shape=[Y_train.shape[0], None], name="Y")
Initialize Weights & Biases
W1 = tf.get_variable("W1", [lay_dims[0],n_x], initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.get_variable("b1", [lay_dims[0],1], initializer=tf.zeros_initializer())
W2 = tf.get_variable("W2", [lay_dims[1],lay_dims[0]], initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.get_variable("b2", [lay_dims[1],1], initializer=tf.zeros_initializer())
W3 = tf.get_variable("W3", [lay_dims[2],lay_dims[1]], initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.get_variable("b3", [lay_dims[2],1], initializer=tf.zeros_initializer())
Forward Prop
Z1 = tf.add(tf.matmul(W1,X), b1)
A1 = tf.nn.relu(Z1)
Z2 = tf.add(tf.matmul(W2,A1), b2)
A2 = tf.nn.relu(Z2)
Y_hat = tf.add(tf.matmul(W3,A2), b3)
BackProp
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=tf.transpose(Y_hat), labels=tf.transpose(Y)))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
Session
# Initialize
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # Initialize
    sess.run(init)
    # Normalize Inputs
    sess.run(norm, feed_dict={X:X_train, Y:Y_train})
    # Forward/Backprop and update weights
    for i in range(10000):
        c, _ = sess.run([cost, optimizer], feed_dict={X:X_train, Y:Y_train})
        if i % 100 == 0:
            print(c)
    correct_prediction = tf.equal(tf.argmax(Y_hat), tf.argmax(Y))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Training Set:", sess.run(accuracy, feed_dict={X: X_train, Y: Y_train}))
    print("Testing Set:", sess.run(accuracy, feed_dict={X: X_test, Y: Y_test}))
After running 10,000 epochs of training, the cost goes down each time, which suggests that the learning_rate is okay and that the cost function is behaving normally. However, after training, all of my Y_hat values (predictions on the training set) are 1 (predicting the passenger survived). So basically the prediction just outputs y=1 for every training example.
Also, when I run tf.argmax on Y_hat, the result is a matrix of all 0's. The same thing is happening when tf.argmax is applied to Y (ground truth labels) which is odd because Y consists of all the correct labels for the training examples.
Any help is greatly appreciated. Thanks.
I assume your Y_hat is a (1,m) matrix, where m is the number of training examples. In that case tf.argmax(Y_hat) will return all 0s. According to the TensorFlow documentation, argmax
Returns the index with the largest value across axes of a tensor.
If you do not pass axis, it defaults to 0. Because axis 0 contains only one value here, the returned index is always 0. The same happens for Y, so the two argmax results always match and the reported accuracy is always 100%.
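To make the argmax behaviour concrete, here is a minimal sketch. It uses NumPy as a stand-in for TensorFlow (np.argmax needs axis=0 passed explicitly to mimic the tf.argmax default), and the logits and labels are made-up values, not data from the question. It also shows one possible fix, assuming the usual 0.5 threshold on the sigmoid output:

```python
import numpy as np

# Hypothetical logits and labels in the question's layout: shape (1, m).
logits = np.array([[2.3, -1.1, 0.4, -3.0, 1.7]])
labels = np.array([[1.0, 0.0, 0.0, 0.0, 1.0]])

# argmax along axis 0 (the default axis of tf.argmax) sees a single row,
# so the "winning index" of every column is 0, for logits and labels alike.
print(np.argmax(logits, axis=0))  # [0 0 0 0 0]
print(np.argmax(labels, axis=0))  # [0 0 0 0 0]

# Comparing the two index vectors therefore always reports 100% accuracy.
argmax_accuracy = np.mean(np.argmax(logits, axis=0) == np.argmax(labels, axis=0))

# For a single sigmoid output, threshold the probability at 0.5 instead.
probabilities = 1.0 / (1.0 + np.exp(-logits))
predictions = (probabilities > 0.5).astype(np.float64)
threshold_accuracy = np.mean(predictions == labels)

print(argmax_accuracy)     # 1.0
print(threshold_accuracy)  # 0.8
```

In the question's graph, the analogous change would presumably be to compare tf.round(tf.sigmoid(Y_hat)) with Y using tf.equal, instead of comparing the two argmax results.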
I know I am late, but I would also point out that your label matrix has shape (n,1), i.e. there is only 1 class to predict, so a softmax-style cross-entropy over classes doesn't make sense. In such cases you should use something different to calculate the cost (maybe mean squared error or something similar). I had a similar problem recently while working on my college project and found a workaround: I turned the binary output into 2 classes, such as present and absent, so if it's present the label is [1,0]. I know this is not the best way to do it, but it can be helpful when you need something working right away.
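The two-class workaround could look roughly like this. It is only a sketch in NumPy with made-up labels, and the present/absent row order is an assumption taken from the [1,0] example above. The network itself would presumably also need two output units for this to line up:

```python
import numpy as np

# Hypothetical binary labels in the question's layout: shape (1, m).
labels = np.array([[1, 0, 0, 1]])

# Stack a "present" row and an "absent" row:
# present -> [1, 0], absent -> [0, 1] (read down each column).
one_hot = np.vstack([labels, 1 - labels])  # shape (2, m)

# With two rows, argmax along axis 0 now distinguishes the classes:
# index 0 means present, index 1 means absent.
class_index = np.argmax(one_hot, axis=0)

print(one_hot.shape)  # (2, 4)
print(class_index)    # [0 1 1 0]
```

With labels in this form, comparing an argmax over the predictions with an argmax over the labels becomes meaningful, because axis 0 now holds one entry per class.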