How to stack LSTM layers using TensorFlow

2018-06-11 04:27:11

What I have is the following, which I believe is a network with one hidden LSTM layer:

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out' : tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out' : tf.Variable(tf.random_normal([n_classes]))
}

I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking for a good example, but I could not find any model with 2 hidden LSTM layers. Here is the model that I would like to build:

1 input layer, 1 output layer, 2 hidden LSTM layers (with 512 neurons in each), time step (sequence length): 10

Could anyone guide me in building this using TensorFlow (from defining weights and building the input shape to training, predicting, and choosing an optimizer or cost function)? Any help would be much appreciated.

Thank you so much in advance!

Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the cells it should wrap. In the code below I am unrolling it manually, but you can pass the cell to tf.nn.dynamic_rnn or tf.nn.static_rnn (called tf.nn.rnn in older releases) as well.

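y = input_tensor  # shape: (BATCH_SIZE, TIME_STEPS, features)
with tf.variable_scope('encoder') as scope:
    # Stack three GRU layers; swap in tf.nn.rnn_cell.LSTMCell for an LSTM.
    rnn_cell = tf.nn.rnn_cell.MultiRNNCell(
        [tf.nn.rnn_cell.GRUCell(1024) for _ in range(3)])
    # zero_state builds a correctly shaped initial state for every layer.
    state = rnn_cell.zero_state(BATCH_SIZE, tf.float32)
    output = [None] * TIME_STEPS
    # Unroll manually, feeding the sequence in reverse time order.
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    # tf.stack replaces the older tf.pack.
    y = tf.stack(output, 1)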
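For intuition about what stacking does, here is a minimal NumPy sketch of a two-layer LSTM forward pass using the sizes from the question (13 inputs, 10 steps, 512 units per layer). It is not TensorFlow's implementation: the helper names, the gate ordering, and the random weight initialization are all assumptions made for illustration. The point is only that at each time step the hidden output of layer 1 becomes the input of layer 2, and each layer keeps its own (c, h) state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [x_t, h_prev] to the four gate pre-activations."""
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    # Gate order (input, forget, candidate, output) is an arbitrary choice here.
    i, f, g, o = np.split(z, 4, axis=1)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def stacked_lstm_forward(x, params):
    """x: [batch, n_steps, n_input]; params: one (W, b) pair per layer."""
    batch, n_steps, _ = x.shape
    n_hidden = params[0][1].shape[0] // 4
    h = [np.zeros((batch, n_hidden)) for _ in params]
    c = [np.zeros((batch, n_hidden)) for _ in params]
    outputs = []
    for t in range(n_steps):
        layer_input = x[:, t, :]
        for k, (W, b) in enumerate(params):
            h[k], c[k] = lstm_step(layer_input, h[k], c[k], W, b)
            layer_input = h[k]  # output of layer k feeds layer k + 1
        outputs.append(layer_input)  # top layer's output at step t
    return np.stack(outputs, axis=1), list(zip(c, h))

rng = np.random.default_rng(0)
n_input, n_steps, n_hidden, batch = 13, 10, 512, 4

# Layer 1 sees n_input features; layer 2 sees layer 1's n_hidden outputs.
params = []
in_dim = n_input
for _ in range(2):
    W = rng.normal(scale=0.05, size=(in_dim + n_hidden, 4 * n_hidden))
    b = np.zeros(4 * n_hidden)
    params.append((W, b))
    in_dim = n_hidden

x = rng.normal(size=(batch, n_steps, n_input))
outputs, final_state = stacked_lstm_forward(x, params)
print(outputs.shape)  # (4, 10, 512)
```

The returned final_state holds one (c, h) pair per layer, which is the same information a stacked LSTM cell carries between calls.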

First you need some placeholders to hold your training data (one batch):

x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])

An LSTM needs a state, which consists of two components: the hidden state and the cell state (there is a very good guide here: https://arxiv.org/pdf/1506.00019.pdf). Every layer in the LSTM has one cell state and one hidden state.

The problem is that TensorFlow stores this in an LSTMStateTuple, which you cannot feed through a placeholder. So you need to store it in a tensor and then unpack it into a tuple:

state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])

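# tf.unstack replaces the older tf.unpack; it yields one
# [2, batch_size, state_size] tensor per layer.
l = tf.unstack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)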
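The indexing in that unpacking step can be checked with plain NumPy. The sketch below uses small, made-up sizes and mimics the layout [num_layers, 2, batch_size, state_size]: slicing along axis 0 gives one block per layer, and within each block index 0 is treated as the cell state c and index 1 as the hidden state h, matching the (c, h) field order of LSTMStateTuple.

```python
import numpy as np

# Illustrative sizes only; they are not taken from the answer.
num_layers, batch_size, state_size = 2, 4, 8

# Fill with distinct values so each slice is easy to identify.
packed = np.arange(
    num_layers * 2 * batch_size * state_size, dtype=np.float32
).reshape(num_layers, 2, batch_size, state_size)

# Equivalent of unstacking along axis 0: one [2, batch, units] block per layer.
layers = [packed[k] for k in range(num_layers)]

# One (c, h) pair per layer.
state = tuple((layer[0], layer[1]) for layer in layers)

for k, (c, h) in enumerate(state):
    print(k, c.shape, h.shape)
```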

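Then you can use the built-in TensorFlow API to create the stacked LSTM layers.

# Build a separate cell object per layer; reusing one object ([cell] * num_layers)
# makes the layers share variables and raises an error in newer TensorFlow 1.x releases.
cells = [tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True) for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)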

From here you continue with the outputs to calculate logits and then a loss with respect to y_output.
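As a rough illustration of that last step, the NumPy sketch below projects every time step of the RNN outputs to a single value and compares it with the targets. The random arrays stand in for the real outputs and y_output, the names W_out and b_out are hypothetical, and the mean squared error is an assumption that suits a regression target such as power consumption; a classification task would use a softmax cross-entropy instead.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, truncated_series_length, state_size = 4, 5, 8  # illustrative sizes

# Stand-ins for the dynamic_rnn outputs and the y_output targets.
outputs = rng.normal(size=(batch_size, truncated_series_length, state_size))
y_output = rng.normal(size=(batch_size, truncated_series_length, 1))

# Hypothetical output-layer parameters.
W_out = rng.normal(scale=0.1, size=(state_size, 1))
b_out = np.zeros(1)

# Apply the same linear layer at every time step.
flat = outputs.reshape(-1, state_size)
logits = (flat @ W_out + b_out).reshape(batch_size, truncated_series_length, 1)

# Mean squared error over all batch entries and time steps.
loss = np.mean((logits - y_output) ** 2)
print(logits.shape, loss)
```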

Then you run each batch with sess.run, using truncated backpropagation (good explanation here: http://r2rt.com/styles-of-truncated-backpropagation.html):

init_state = np.zeros((num_layers, 2, batch_size, state_size))

...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)

You will have to convert the state to a numpy array before feeding it again.
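The state-carrying loop can be sketched without TensorFlow. In the example below, run_step is a hypothetical stand-in for the sess.run call: it takes the packed state array and returns a nested tuple of (c, h) pairs, the same structure a stacked LSTM state has when fetched. Its arithmetic is a toy and is not an LSTM update. What matters is the pattern: split the long series into windows, feed the previous state in, and convert the returned tuple back to an array before the next window.

```python
import numpy as np

# Illustrative sizes only.
num_layers, batch_size, state_size = 2, 4, 8
truncated_series_length = 5
series = np.arange(batch_size * 20, dtype=np.float32).reshape(batch_size, 20, 1)

def run_step(batch_in, state):
    """Hypothetical stand-in for sess.run; returns one (c, h) pair per layer."""
    return tuple(
        (state[k, 0] + batch_in.mean(), state[k, 1] + 1.0)  # toy update
        for k in range(num_layers)
    )

# Start from an all-zero state, then carry it across windows.
current_state = np.zeros((num_layers, 2, batch_size, state_size), dtype=np.float32)
num_windows = series.shape[1] // truncated_series_length

for w in range(num_windows):
    start = w * truncated_series_length
    batch_in = series[:, start:start + truncated_series_length, :]
    new_state = run_step(batch_in, current_state)
    # Nested tuple -> array with shape [num_layers, 2, batch_size, state_size].
    current_state = np.array(new_state)

print(current_state.shape)
```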

Perhaps it is better to use a library like TFLearn or Keras instead?
