Multi-layer perceptron in TensorFlow not behaving as expected
I have a simple single-layer perceptron structure in TensorFlow that I learned from a video by Siraj Raval. I am trying to extend it to a larger number of layers, and I am having difficulty.
The first example is 2 inputs and 2 outputs, where weights and biases are applied once and then the softmax function is applied to the output.
The second example is 2 inputs and 2 outputs with a hidden layer (2 units) in between, so there are two sets of weights and biases and the softmax function is applied after each of them.
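Written out, the first model computes y = softmax(x·W + b), while the second computes hidden = softmax(x·W + b) followed by y = softmax(hidden·W2 + b2).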
I'm trying to extend the simple case to an N-hidden-layer case, but with limited success: when I add extra layers, they seem to be ignored by the optimizer.
Input is of the form:
inputX = np.array([[ 2.10400000e+03, 3.00000000e+00],
[ 1.60000000e+03, 3.00000000e+00],
[ 2.40000000e+03, 3.00000000e+00],
[ 1.41600000e+03, 2.00000000e+00],
[ 3.00000000e+03, 4.00000000e+00],
[ 1.98500000e+03, 4.00000000e+00],
[ 1.53400000e+03, 3.00000000e+00],
[ 1.42700000e+03, 3.00000000e+00],
[ 1.38000000e+03, 3.00000000e+00],
[ 1.49400000e+03, 3.00000000e+00]])
And output labels are of the form:
inputY = np.array([[1, 0],
[1, 0],
[1, 0],
[0, 1],
[0, 1],
[1, 0],
[0, 1],
[1, 0],
[1, 0],
[1, 0]])
A snippet of my code that executes correctly (dependencies are numpy and tensorflow; alpha, n_samples, training_epochs and display_step are defined elsewhere):
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
# vector form of x*W + b
y_values = tf.add(tf.matmul(x, W), b)
#activation function
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "nW=", sess.run(W), "nb=", sess.run(b))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
I get the output of:
W= [[ 0.00021142 -0.00021142]
[ 0.00120122 -0.00120122]]
b= [ 0.00103542 -0.00103542]
label_predictions = [[ 0.71073025 0.28926972]
[ 0.66503692 0.33496314]
[ 0.73576927 0.2642307 ]
[ 0.64694035 0.35305965]
[ 0.78248388 0.21751612]
[ 0.70078063 0.2992194 ]
[ 0.65879178 0.34120819]
[ 0.6485498 0.3514502 ]
[ 0.64400673 0.3559933 ]
[ 0.65497971 0.34502029]]
Not great, so I wanted to try to increase the number of layers to see if it would improve things.
I added an extra layer using the new variables W2, b2, and hidden_layer:
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
#second layer weights and biases
W2 = tf.Variable(tf.zeros([2,2]))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "nW=", sess.run(W), "nW2=", sess.run(W2),
"nb=", sess.run(b), "nb2=", sess.run(b2))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
The printed output then tells me that my first-layer weights and biases are all zeros, and that the predictions are now roughly half and half on every training example, which is much worse than before.
output:
W= [[ 0. 0.]
[ 0. 0.]]
W2= [[ 0.00199614 -0.00199614]
[ 0.00199614 -0.00199614]]
b= [ 0. 0.]
b2= [ 0.00199614 -0.00199614]
label_predictions = [[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]]
Why is only one layer of weights and biases being affected? Why isn't adding a layer improving the model?
I have a few suggestions in order to improve the performance of your model:
1.) Randomly initialized variables often work better than zeros, at least for the matrix elements. You could try normally distributed variables. This also answers your question about the first layer: with everything initialized to zero, both hidden units always produce the same output, so the gradient flowing back through the hidden softmax cancels exactly and W and b never move away from zero.
2.) You should normalize your input data, since the two columns are of different orders of magnitude. In principle this should not be a problem, since the weights can adjust to compensate, but with random initialization it is probable that the network will pay attention only to the first column. If you normalize the data, both columns will be on the same order of magnitude.
3.) Maybe you should also increase the number of neurons in the hidden layer to about 10.
With these modifications, it worked quite well for me. I've posted a complete working example below:
import tensorflow as tf
import numpy as np
alpha = 0.02
training_epochs = 20000
display_step = 2000
inputX = np.array([[ 2.10400000e+03, 3.00000000e+00],
[ 1.60000000e+03, 3.00000000e+00],
[ 2.40000000e+03, 3.00000000e+00],
[ 1.41600000e+03, 2.00000000e+00],
[ 3.00000000e+03, 4.00000000e+00],
[ 1.98500000e+03, 4.00000000e+00],
[ 1.53400000e+03, 3.00000000e+00],
[ 1.42700000e+03, 3.00000000e+00],
[ 1.38000000e+03, 3.00000000e+00],
[ 1.49400000e+03, 3.00000000e+00]])
n_samples = inputX.shape[0]
# Normalize input data
means = np.mean(inputX, axis=0)
stddevs = np.std(inputX, axis=0)
inputX[:,0] = (inputX[:,0] - means[0]) / stddevs[0]
inputX[:,1] = (inputX[:,1] - means[1]) / stddevs[1]
# Define target labels
inputY = np.array([[1, 0],
[1, 0],
[1, 0],
[0, 1],
[0, 1],
[1, 0],
[0, 1],
[1, 0],
[1, 0],
[1, 0]])
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.random_normal([2,10], stddev=1.0/tf.sqrt(2.0)))
b = tf.Variable(tf.zeros([10]))
#second layer weights and biases
W2 = tf.Variable(tf.random_normal([10,2], stddev=1.0/tf.sqrt(2.0)))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        #check what it thinks when you give it the input data
        print(sess.run(y, feed_dict = {x:inputX}))
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "nW=", sess.run(W), "nW2=", sess.run(W2),
"nb=", sess.run(b), "nb2=", sess.run(b2))
The output looks very much like the labels:
[[ 1.00000000e+00 2.48446125e-10]
[ 9.99883890e-01 1.16143732e-04]
[ 1.00000000e+00 2.48440435e-10]
[ 1.65703295e-05 9.99983430e-01]
[ 6.65045518e-05 9.99933481e-01]
[ 9.99985337e-01 1.46147468e-05]
[ 1.69444829e-04 9.99830484e-01]
[ 1.00000000e+00 6.85981003e-12]
[ 1.00000000e+00 2.05180339e-12]
[ 9.99865890e-01 1.34040893e-04]]
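If you then want to go back to the N-hidden-layer idea from your question, the same pattern generalizes with a loop. The following is only a rough sketch that plugs into the script above (the build_mlp helper and its layer_sizes argument are names I'm making up here, not anything from TensorFlow itself); it keeps the softmax hidden activation from the example and reuses the normalized inputX, alpha, n_samples and the same training loop:
def build_mlp(x, layer_sizes):
    #layer_sizes lists the input, hidden and output widths,
    #e.g. [2, 10, 10, 2] gives two hidden layers of 10 units each
    activations = x
    for i in range(len(layer_sizes) - 1):
        n_in, n_out = layer_sizes[i], layer_sizes[i + 1]
        #each layer gets its own randomly initialized weights (point 1) and zero biases
        W_i = tf.Variable(tf.random_normal([n_in, n_out], stddev=1.0/np.sqrt(n_in)))
        b_i = tf.Variable(tf.zeros([n_out]))
        activations = tf.nn.softmax(tf.add(tf.matmul(activations, W_i), b_i))
    return activations

x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
y = build_mlp(x, [2, 10, 2])  #same shape as the two-layer example above
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples)
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
#then initialize and train exactly as before
Because every layer is initialized randomly, the gradient reaches all of the weight matrices, so the extra layers are no longer ignored.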