Initial states in Char
I'm trying to implement the Char-RNN described in Andrej Karpathy's blog to generate text. However, I have some questions about my TensorFlow implementation. (I have copied the important parts of my code below, with an explanation of each.)
Model: Basically, it is an RNN that can have more than one layer. The output of each layer is used as the input of the next layer, which has a different cell.
...
self.input_x = tf.placeholder(tf.int32, [batch_size, sequence_length], name="input_x")
self.input_y = tf.placeholder(tf.int32, [batch_size, sequence_length], name="input_y")
self.sequence_length = tf.placeholder(tf.int32, [batch_size], name="sequence_length")
output = self.input_x
for layer in range(num_layers):
    with tf.name_scope("recurrent-%s" % (layer+1)):
        cell = tf.contrib.rnn.BasicLSTMCell(num_hidden, state_is_tuple=True)
        self.initial_state = cell.zero_state(batch_size, tf.float32)
        output, self.state = tf.nn.dynamic_rnn(cell, output,
                                               initial_state=self.initial_state,
                                               sequence_length=self.sequence_length,
                                               dtype=tf.float32, scope="rnn-%s" % (layer+1))
...
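One thing worth noting about the loop above: self.initial_state and self.state are reassigned on every iteration, so after the loop they refer only to the last layer. As a rough illustration, here is a minimal NumPy sketch (a plain tanh RNN, not the LSTM or the TensorFlow API used above) that keeps one initial state and one final state per layer. The names rnn_layer, weights, initial_states and final_states are hypothetical and exist only for this example.

```python
import numpy as np


def rnn_layer(inputs, state, w_in, w_rec):
    """Run one tanh RNN layer over a [time, features] array.

    Returns the per-step outputs and the final state of the layer.
    """
    outputs = []
    for x_t in inputs:
        state = np.tanh(x_t @ w_in + state @ w_rec)
        outputs.append(state)
    return np.stack(outputs), state


rng = np.random.default_rng(0)
num_layers, num_hidden, input_dim, steps = 2, 4, 3, 5

# Hypothetical random weights: one (input, recurrent) pair per layer.
weights = []
in_dim = input_dim
for _ in range(num_layers):
    weights.append((rng.normal(size=(in_dim, num_hidden)),
                    rng.normal(size=(num_hidden, num_hidden))))
    in_dim = num_hidden  # the next layer consumes this layer's output

# One initial state per layer, kept in a list instead of a single attribute.
initial_states = [np.zeros(num_hidden) for _ in range(num_layers)]
final_states = []

output = rng.normal(size=(steps, input_dim))  # stand-in for the embedded input
for (w_in, w_rec), init in zip(weights, initial_states):
    output, last = rnn_layer(output, init, w_in, w_rec)
    final_states.append(last)

print(len(final_states), output.shape)
```

Keeping the states in lists means every layer's state can be fed and fetched later, not only the state of the last layer.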
Training: Here I evaluate the zero_state and feed it as the initial state for training.
...
feed_dict = {model.input_x: x,
             model.input_y: y,
             model.sequence_length: seq,
             model.initial_state: sess.run(model.initial_state)}
step, loss = sess.run([global_step, model.loss], feed_dict)
...
Testing: I sequentially feed the state from the previous time step to obtain the final state of the starting sequence, and then generate text that continues this predefined starting text.
...
start_text = "the meaning of life is"
state = sess.run(model.initial_state)
text = start_text
for word in start_text:
    x = np.array(list(vocab_processor.transform([word])))[0][0]
    feed_dict = {model.input_x: x,
                 model.sequence_length: 1,
                 model.initial_state: state}
    state = sess.run(model.state, feed_dict)
for i in range(max_document_length):
    feed_dict = {model.input_x: x,
                 model.sequence_length: 1,
                 model.initial_state: state}
    state, x = sess.run([model.state, model.predictions], feed_dict)
...
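The idea behind the testing loop can be checked outside TensorFlow. The following is a minimal NumPy sketch, assuming a plain tanh RNN with one-hot inputs and greedy (argmax) decoding rather than the LSTM model above. It shows that feeding a seed sequence one step at a time while carrying the state forward gives the same final state as processing the whole seed at once, and then continues generating from that state. All names (rnn_step, run_sequence, w_in, w_rec, w_out, seed_ids) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def rnn_step(x_t, state, w_in, w_rec):
    """Advance a tanh RNN by one time step and return the new state."""
    return np.tanh(x_t @ w_in + state @ w_rec)


def run_sequence(inputs, state, w_in, w_rec):
    """Run the RNN over a [time, features] array and return the final state."""
    for x_t in inputs:
        state = rnn_step(x_t, state, w_in, w_rec)
    return state


rng = np.random.default_rng(0)
vocab_size, num_hidden = 6, 4
w_in = rng.normal(size=(vocab_size, num_hidden))
w_rec = rng.normal(size=(num_hidden, num_hidden))
w_out = rng.normal(size=(num_hidden, vocab_size))

seed_ids = [2, 0, 5, 1]                # hypothetical character ids of the seed
one_hot = np.eye(vocab_size)[seed_ids]
zero_state = np.zeros(num_hidden)

# (a) Whole seed in a single call, starting from the zero state.
state_full = run_sequence(one_hot, zero_state, w_in, w_rec)

# (b) One character per call, feeding the returned state back in each time.
state = zero_state
for x_t in one_hot:
    state = run_sequence(x_t[None, :], state, w_in, w_rec)

states_match = np.allclose(state, state_full)

# Continue from the seed state: greedily pick the next id and feed it back.
generated = []
next_id = int(np.argmax(state @ w_out))
for _ in range(5):
    generated.append(next_id)
    state = rnn_step(np.eye(vocab_size)[next_id], state, w_in, w_rec)
    next_id = int(np.argmax(state @ w_out))

print(states_match, generated)
```

The zero state is used only once, at the start of the seed. Every later call receives the state returned by the previous call.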
My main questions are about the usage of self.state and self.initial_state: