Multi-layer perceptron in TensorFlow not behaving as expected

I have a simple structure that I learned from a Siraj Raval video about a single-layer perceptron in TensorFlow. I'm trying to extend it to more layers and I'm struggling.

The first example is 2 inputs and 2 outputs: one set of weights and biases is applied, and the softmax function is applied to the output.

The second example is 2 inputs and 2 outputs with one hidden layer (2 units) in between, so there are two sets of weights and biases, and the softmax function is applied after each of them.

I'm trying to extend the simple case to the N-hidden-layer case, but I'm having limited success: when I add extra layers, they seem to be ignored by the optimizer.

The input is of the form:

inputX = np.array([[  2.10400000e+03,   3.00000000e+00],
                   [  1.60000000e+03,   3.00000000e+00],
                   [  2.40000000e+03,   3.00000000e+00],
                   [  1.41600000e+03,   2.00000000e+00],
                   [  3.00000000e+03,   4.00000000e+00],
                   [  1.98500000e+03,   4.00000000e+00],
                   [  1.53400000e+03,   3.00000000e+00],
                   [  1.42700000e+03,   3.00000000e+00],
                   [  1.38000000e+03,   3.00000000e+00],
                   [  1.49400000e+03,   3.00000000e+00]])

The output labels are of the form:

inputY = np.array([[1, 0],
                   [1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 1],
                   [1, 0],
                   [0, 1],
                   [1, 0],
                   [1, 0],
                   [1, 0]])

My code snippet runs correctly (the dependencies are numpy and tensorflow):

#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2]) 

#first layer weights and biases
W = tf.Variable(tf.zeros([2,2])) 
b = tf.Variable(tf.zeros([2])) 

# vector form of x*W + b
y_values = tf.add(tf.matmul(x, W), b)

#activation function
y = tf.nn.softmax(y_values) 

cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})

    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})

        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))

print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "nW=", sess.run(W), "nb=", sess.run(b))


#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))

The output I get is:

W= [[ 0.00021142 -0.00021142]
    [ 0.00120122 -0.00120122]] 

b=  [ 0.00103542 -0.00103542]

label_predictions = [[ 0.71073025  0.28926972]
                     [ 0.66503692  0.33496314]
                     [ 0.73576927  0.2642307 ]
                     [ 0.64694035  0.35305965]
                     [ 0.78248388  0.21751612]
                     [ 0.70078063  0.2992194 ]
                     [ 0.65879178  0.34120819]
                     [ 0.6485498   0.3514502 ]
                     [ 0.64400673  0.3559933 ]
                     [ 0.65497971  0.34502029]]

Not great, so I wanted to try increasing the number of layers to see if it would improve things.

I added an extra layer with new variables W2, b2 and hidden_layer:

#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2]) 

#first layer weights and biases
W = tf.Variable(tf.zeros([2,2])) 
b = tf.Variable(tf.zeros([2])) 

#second layer weights and biases
W2 = tf.Variable(tf.zeros([2,2]))
b2 = tf.Variable(tf.zeros([2]))

#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)

#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values) 

cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})

    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})

        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))

print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "nW=", sess.run(W), "nW2=", sess.run(W2),
          "nb=", sess.run(b), "nb2=", sess.run(b2))


#check what it thinks when you give it the input data
print(sess.run(y, feed_dict = {x:inputX}))

This then tells me that my first-layer weights and biases are all zeros, and that the predictions are roughly fifty-fifty for every training example, much worse than before.

Output:

 W= [[ 0.  0.]
     [ 0.  0.]] 

W2= [[ 0.00199614 -0.00199614]
     [ 0.00199614 -0.00199614]] 

 b=  [ 0.  0.] 
b2=  [ 0.00199614 -0.00199614]

label_predictions = [[ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]
                     [ 0.5019961   0.49800384]]

Why is only one layer of weights and biases affected? Why doesn't adding a layer improve the model?


I have a few suggestions to improve the performance of the model:

1.) Randomly initialized variables usually work better than zeros, at least for the matrix elements. You could try normally distributed variables. (See the short note after this list for why the all-zero start stalls the first layer in your setup.)

2.) You should normalize your input data, because the two columns are of different orders of magnitude. In principle this shouldn't be a problem, since the weights can adjust differently, but with random initialization it is likely that the network will pay attention only to the first column. If you normalize the data, both columns will be of the same order of magnitude.

3.) Maybe you should increase the number of neurons in the hidden layer to a value of about 10.
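
A side note on point 1.), since you ask why only the second layer's weights moved: the following is a plausible explanation rather than something verified against the optimizer internals. With everything initialized to zero, the hidden pre-activation is zero for every sample, so the hidden softmax output is always [0.5, 0.5]; as a consequence, the two rows of W2 receive identical gradients and stay equal to each other, which makes the gradient arriving at the two hidden units identical as well. The softmax Jacobian at [0.5, 0.5] maps any such constant vector to zero, so no gradient ever reaches W or b and the first layer never moves. A minimal NumPy check of that last step:

import numpy as np

# Illustration of the symmetry argument above (not part of the model itself).
h = np.array([0.5, 0.5])                 # softmax of an all-zero hidden pre-activation
jacobian = np.diag(h) - np.outer(h, h)   # Jacobian of softmax evaluated at h
g_hidden = np.array([0.3, 0.3])          # gradient w.r.t. the hidden output; both entries
                                         # match because the rows of W2 stay identical
print(jacobian @ g_hidden)               # -> [0. 0.]: nothing flows back into W and b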

With these modifications it worked quite nicely for me. I've posted a complete working example below:

import tensorflow as tf
import numpy as np
alpha = 0.02
training_epochs = 20000
display_step = 2000
inputX = np.array([[  2.10400000e+03,   3.00000000e+00],
                   [  1.60000000e+03,   3.00000000e+00],
                   [  2.40000000e+03,   3.00000000e+00],
                   [  1.41600000e+03,   2.00000000e+00],
                   [  3.00000000e+03,   4.00000000e+00],
                   [  1.98500000e+03,   4.00000000e+00],
                   [  1.53400000e+03,   3.00000000e+00],
                   [  1.42700000e+03,   3.00000000e+00],
                   [  1.38000000e+03,   3.00000000e+00],
                   [  1.49400000e+03,   3.00000000e+00]])
n_samples = inputX.shape[0]

# Normalize input data
means = np.mean(inputX, axis=0)
stddevs = np.std(inputX, axis=0)
inputX[:,0] = (inputX[:,0] - means[0]) / stddevs[0]
inputX[:,1] = (inputX[:,1] - means[1]) / stddevs[1]

# Define target labels
inputY = np.array([[1, 0],
                   [1, 0],
                   [1, 0],
                   [0, 1],
                   [0, 1],
                   [1, 0],
                   [0, 1],
                   [1, 0],
                   [1, 0],
                   [1, 0]])

#input and output placeholder, feed data to x, feed labels to y_
x = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 2]) 

#first layer weights and biases
W = tf.Variable(tf.random_normal([2,10], stddev=1.0/tf.sqrt(2.0))) 
b = tf.Variable(tf.zeros([10])) 

#second layer weights and biases
W2 = tf.Variable(tf.random_normal([10,2], stddev=1.0/tf.sqrt(2.0)))
b2 = tf.Variable(tf.zeros([2]))

#flow through first layer
hidden_layer = tf.add(tf.matmul(x, W), b)
hidden_layer = tf.nn.softmax(hidden_layer)

#flow through second layer
y_values = tf.add(tf.matmul(hidden_layer, W2), b2)
y = tf.nn.softmax(y_values) 

cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(n_samples) #sum of squared errors
optimizer = tf.train.AdamOptimizer(alpha).minimize(cost)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(training_epochs):
    sess.run(optimizer, feed_dict = {x: inputX, y_:inputY})

    #log training
    if i % display_step == 0:
        cc = sess.run(cost, feed_dict = {x: inputX, y_:inputY})
        #check what it thinks when you give it the input data
        print(sess.run(y, feed_dict = {x:inputX}))

        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))

print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict = {x: inputX, y_: inputY})
print("Training cost = ", training_cost, "nW=", sess.run(W), "nW2=", sess.run(W2),
          "nb=", sess.run(b), "nb2=", sess.run(b2))

The output looks very much like the labels:

[[  1.00000000e+00   2.48446125e-10]
 [  9.99883890e-01   1.16143732e-04]
 [  1.00000000e+00   2.48440435e-10]
 [  1.65703295e-05   9.99983430e-01]
 [  6.65045518e-05   9.99933481e-01]
 [  9.99985337e-01   1.46147468e-05]
 [  1.69444829e-04   9.99830484e-01]
 [  1.00000000e+00   6.85981003e-12]
 [  1.00000000e+00   2.05180339e-12]
 [  9.99865890e-01   1.34040893e-04]]
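
If you want to compare those soft predictions with the labels directly, an optional follow-up (an extra snippet of my own, reusing sess, y, x, inputX and inputY from the script above) is to take the argmax of each row and measure the training accuracy:

# Turn the softmax outputs into hard class predictions and compare them
# with the one-hot labels from the training data.
predictions = sess.run(y, feed_dict={x: inputX})
predicted_class = np.argmax(predictions, axis=1)
true_class = np.argmax(inputY, axis=1)
print("training accuracy:", np.mean(predicted_class == true_class))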