Multi-layer perceptron in TensorFlow not behaving as expected
I have a simple setup that I learned from a Siraj Raval video on a single-layer perceptron in TensorFlow. I'm trying to extend it to more layers and I'm having difficulty.
The first example has 2 inputs and 2 outputs, where a single set of weights and biases is applied and the softmax function is applied to the output.
The second example has 2 inputs and 2 outputs with a hidden layer (2 units) in between, so there are two sets of weights and biases, and the softmax function is applied after each of them.
I'm trying to extend the simple case to an N-hidden-layer case, but with limited success: when I add extra layers, they seem to be ignored by the optimizer.
The input is of the form:
inputX = np.array([[ 2.10400000e+03,  3.00000000e+00],
                   [ 1.60000000e+03,  3.00000000e+00],
                   [ 2.40000000e+03,  3.00000000e+00],
                   [ 1.41600000e+03,  2.00000000e+00],
                   [ 3.00000000e+03,  4.00000000e+00],
                   [ 1.98500000e+03,  4.00000000e+00],
                   [ 1.53400000e+03,  3.00000000e+00],
                   [ 1.42700000e+03,  3.00000000e+00],
                   [ 1.38000000e+03,  3.00000000e+00],
                   [ 1.49400000e+03,  3.00000000e+00]])
The output labels are of the form:
inputY = np.array([[1, 0],
                   [1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 1],
                   [1, 0],
                   [0, 1],
                   [1, 0],
                   [1, 0],
                   [1, 0]])
Here is a snippet of my code that executes correctly (the dependencies are numpy and tensorflow):
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
# vector form of x*W + b
y_values = tf.add(tf.matmul(x, W), b)
#activation function
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_: inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nb=", sess.run(b))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
The output I get is:
W= [[ 0.00021142 -0.00021142]
[ 0.00120122 -0.00120122]]
b= [ 0.00103542 -0.00103542]
label_predictions = [[ 0.71073025 0.28926972]
[ 0.66503692 0.33496314]
[ 0.73576927 0.2642307 ]
[ 0.64694035 0.35305965]
[ 0.78248388 0.21751612]
[ 0.70078063 0.2992194 ]
[ 0.65879178 0.34120819]
[ 0.6485498 0.3514502 ]
[ 0.64400673 0.3559933 ]
[ 0.65497971 0.34502029]]
Not great, so I thought I'd try increasing the number of layers to see if it would improve things.
I added an extra layer by introducing new variables W2, b2, and hidden_layer:
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.zeros([2,2]))
b = tf.Variable(tf.zeros([2]))
#second layer weights and biases
W2 = tf.Variable(tf.zeros([2,2]))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_: inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nW2=", sess.run(W2),
      "\nb=", sess.run(b), "\nb2=", sess.run(b2))
#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))
It then tells me that my first-layer weights and biases are all zeros, and that the prediction for every training example is roughly half/half, much worse than before.
Output:
W= [[ 0. 0.]
[ 0. 0.]]
W2= [[ 0.00199614 -0.00199614]
[ 0.00199614 -0.00199614]]
b= [ 0. 0.]
b2= [ 0.00199614 -0.00199614]
label_predictions = [[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]
[ 0.5019961 0.49800384]]
Why is only one layer of weights and biases affected? Why doesn't adding a layer improve the model?
I have a few suggestions for improving the model's performance:
1.) Randomly initialized variables are usually better than zeros, at least for the matrix elements. You could try normally distributed values (see the sketch after this list for why the all-zeros start is particularly harmful here).
2.) You should normalize your input data, because the two columns are of different orders of magnitude. In principle this shouldn't be a problem, since the weights can adjust accordingly, but with random initialization the network will probably pay attention only to the first column. If you normalize the data, both columns are of the same order of magnitude.
3.) Maybe you should increase the number of neurons in the hidden layer to a value around 10.
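A side note on point 1: with the all-zeros initialization in your two-layer version, nothing flows back into the first layer at the start of training (the backpropagated signal is multiplied by W2, which is zero), and the symmetric updates that follow never break out of that state, which appears to be why W and b stay at zero while W2 and b2 change. A minimal sketch demonstrating the zero gradient (TF 1.x style, matching the code in the question; the random feed data is only a stand-in):

import numpy as np
import tensorflow as tf

# Two-layer graph from the question, with everything initialized to zero.
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
W = tf.Variable(tf.zeros([2, 2]))
b = tf.Variable(tf.zeros([2]))
W2 = tf.Variable(tf.zeros([2, 2]))
b2 = tf.Variable(tf.zeros([2]))
hidden_layer = tf.nn.softmax(tf.add(tf.matmul(x, W), b))
y = tf.nn.softmax(tf.add(tf.matmul(hidden_layer, W2), b2))
cost = tf.reduce_sum(tf.pow(y_ - y, 2)) / 10

# Gradients of the cost with respect to the first-layer parameters.
grad_W, grad_b = tf.gradients(cost, [W, b])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The data is irrelevant here: the backpropagated signal is multiplied
    # by W2 (all zeros), so these gradients come out exactly zero.
    feed = {x: np.random.rand(10, 2), y_: np.eye(2)[np.random.randint(0, 2, 10)]}
    print(sess.run(grad_W, feed_dict=feed))  # all zeros
    print(sess.run(grad_b, feed_dict=feed))  # all zeros

Breaking the symmetry with random initialization is what lets a gradient reach the first layer in the first place.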
With these modifications it works quite nicely for me. I've posted a complete working example below:
import tensorflow as tf
import numpy as np
alpha = 0.02
training_epochs = 20000
display_step = 2000
inputX = np.array([[ 2.10400000e+03,  3.00000000e+00],
                   [ 1.60000000e+03,  3.00000000e+00],
                   [ 2.40000000e+03,  3.00000000e+00],
                   [ 1.41600000e+03,  2.00000000e+00],
                   [ 3.00000000e+03,  4.00000000e+00],
                   [ 1.98500000e+03,  4.00000000e+00],
                   [ 1.53400000e+03,  3.00000000e+00],
                   [ 1.42700000e+03,  3.00000000e+00],
                   [ 1.38000000e+03,  3.00000000e+00],
                   [ 1.49400000e+03,  3.00000000e+00]])
n_samples = inputX.shape[0]
# Normalize input data
means = np.mean(inputX, axis=0)
stddevs = np.std(inputX, axis=0)
inputX[:,0] = (inputX[:,0] - means[0]) / stddevs[0]
inputX[:,1] = (inputX[:,1] - means[1]) / stddevs[1]
# Define target labels
inputY = np.array([[1, 0],
                   [1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 1],
                   [1, 0],
                   [0, 1],
                   [1, 0],
                   [1, 0],
                   [1, 0]])
#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2])
#first layer weights and biases
W = tf.Variable(tf.random_normal([2,10], stddev=1.0/tf.sqrt(2.0)))
b = tf.Variable(tf.zeros([10]))
#second layer weights and biases
W2 = tf.Variable(tf.random_normal([10,2], stddev=1.0/tf.sqrt(2.0)))
b2 = tf.Variable(tf.zeros([2]))
#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)
#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values)
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_: inputY})
    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
        #check what it thinks when you give it the input data
        print(sess.run(y, feed_dict = {x: inputX}))
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "\nW=", sess.run(W), "\nW2=", sess.run(W2),
      "\nb=", sess.run(b), "\nb2=", sess.run(b2))
The output looks very much like the labels:
[[ 1.00000000e+00 2.48446125e-10]
[ 9.99883890e-01 1.16143732e-04]
[ 1.00000000e+00 2.48440435e-10]
[ 1.65703295e-05 9.99983430e-01]
[ 6.65045518e-05 9.99933481e-01]
[ 9.99985337e-01 1.46147468e-05]
[ 1.69444829e-04 9.99830484e-01]
[ 1.00000000e+00 6.85981003e-12]
[ 1.00000000e+00 2.05180339e-12]
[ 9.99865890e-01 1.34040893e-04]]
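One practical note on using the trained network: any new data has to be normalized with the same means and stddevs computed from the training set before feeding it to the graph. A small illustration with a hypothetical, made-up sample:

# Hypothetical new sample; normalize it with the training statistics
# (means, stddevs from above) before asking the network for a prediction.
new_sample = np.array([[1800.0, 3.0]])
new_sample[:, 0] = (new_sample[:, 0] - means[0]) / stddevs[0]
new_sample[:, 1] = (new_sample[:, 1] - means[1]) / stddevs[1]
print(sess.run(y, feed_dict={x: new_sample}))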