Initial states in Char-RNN

I'm trying to implement the Char-RNN from Andrej Karpathy's blog to generate text. However, I have some questions about my TensorFlow implementation. (I copy the important parts of my code with an explanation.)

Model: Basically, it is an RNN where you can use more than one layer; the output of each layer is used as the input of the next layer, with a different cell. sequence_length ...

self.input_x = tf.placeholder(tf.int32, [batch_size, sequence_length], name="input_x")
self.input_y = tf.placeholder(tf.int32, [batch_size, sequence_length], name="input_y")
self.sequence_length = tf.placeholder(tf.int32, [batch_size], name="sequence_length")

output = self.input_x
for layer in range(num_layers):
    with tf.name_scope("recurrent-%s" % (layer+1)):
        cell = tf.contrib.rnn.BasicLSTMCell(num_hidden, state_is_tuple=True)
        self.initial_state = cell.zero_state(batch_size, tf.float32)
        output, self.state = tf.nn.dynamic_rnn(cell, output,
                                               initial_state=self.initial_state,
                                               sequence_length=self.sequence_length,
                                               dtype=tf.float32, scope="rnn-%s" % (layer+1))
...
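
Since the Python loop above rebinds self.initial_state and self.state on every iteration, only the tensors of the last layer are kept. If the goal is one state per layer, a common alternative is tf.contrib.rnn.MultiRNNCell, where initial_state and state become tuples containing one LSTMStateTuple per layer. A minimal sketch, reusing the names above (vocab_size is a hypothetical hyperparameter, and the one-hot lookup stands in for the embedding elided from my snippet):

# Stack the layers inside a single cell so each layer keeps its own state.
inputs = tf.one_hot(self.input_x, vocab_size, dtype=tf.float32)
cells = [tf.contrib.rnn.BasicLSTMCell(num_hidden, state_is_tuple=True)
         for _ in range(num_layers)]
multi_cell = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)
self.initial_state = multi_cell.zero_state(batch_size, tf.float32)
# self.state is now a tuple of num_layers LSTMStateTuples, one per layer.
output, self.state = tf.nn.dynamic_rnn(multi_cell, inputs,
                                       initial_state=self.initial_state,
                                       sequence_length=self.sequence_length,
                                       dtype=tf.float32)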

Training: Here I feed the zero_state as the initial state for training.

...
feed_dict = {model.input_x: x, model.input_y: y,
             model.sequence_length: seq,
             model.initial_state: sess.run(model.initial_state)}
step, loss = sess.run([global_step, model.loss], feed_dict)
...
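
Note that the feed above passes sess.run(model.initial_state), i.e. fresh zeros, for every batch, so the state is reset each time. If the intent were to carry the final state across consecutive batches instead, a sketch of that variant (batches is a hypothetical iterator; everything else reuses the names above):

state = sess.run(model.initial_state)      # zeros, evaluated once
for x, y, seq in batches:                  # hypothetical batch iterator
    feed_dict = {model.input_x: x, model.input_y: y,
                 model.sequence_length: seq,
                 model.initial_state: state}   # LSTMStateTuple used as feed key
    # carry the final state of this batch into the next one
    state, step, loss = sess.run([model.state, global_step, model.loss],
                                 feed_dict)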

Testing: I feed the state of the previous timestep sequentially to obtain the last state of the seed sequence, and then generate text starting from a predefined prefix.

...
start_text = "the meaning of life is"
state = sess.run(model.initial_state)
text = start_text
for word in start_text:
    x = np.array(list(vocab_processor.transform([word])))[0][0]
    feed_dict = {model.input_x: x,
                 model.sequence_length: 1,
                 model.initial_state: state}
    state = sess.run(model.state, feed_dict)

for i in range(max_document_length):
    feed_dict = {model.input_x: x,
                 model.sequence_length: 1,
                 model.initial_state: state}
    state, x = sess.run([model.state, model.predictions], feed_dict)
...
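
For reference, here is the same sampling loop as a sketch with the feed shapes made explicit, assuming the test graph is built with batch_size = 1 and sequence_length = 1, that model.predictions returns the id of the next character, and that idx_to_char is a hypothetical id-to-character lookup:

import numpy as np

state = sess.run(model.initial_state)
for ch in start_text:                      # warm up on the seed text, one character at a time
    x = np.array(list(vocab_processor.transform([ch])))[0][0]
    feed_dict = {model.input_x: np.array([[x]]),        # shape [1, 1]
                 model.sequence_length: np.array([1]),  # shape [1]
                 model.initial_state: state}
    state = sess.run(model.state, feed_dict)

text = start_text
for _ in range(max_document_length):       # generate one character per step
    feed_dict = {model.input_x: np.array([[x]]),
                 model.sequence_length: np.array([1]),
                 model.initial_state: state}
    state, pred = sess.run([model.state, model.predictions], feed_dict)
    x = int(np.asarray(pred).ravel()[0])   # the prediction becomes the next input
    text += idx_to_char[x]                 # hypothetical reverse vocabulary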

My main questions are about the usage of self.state and self.initial_state:

  • Am I feeding initial_state correctly, or is it going to be the zero_state all the time even though I pass a different state in the feed_dict? (A quick check for this is sketched below.)
  • If I print model.state each time, is it the state of the last layer? I'm afraid that I'm feeding all the layers with the same initial_state, which I don't think is the correct way to do it. Probably I have to keep the states of the layers separate and feed each one back into its own layer.
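
A quick experiment to check the first question, sketched under the assumption that model.initial_state is a single LSTMStateTuple and that x and seq are any valid feeds as in training: run model.state with two different initial states and compare; if the results differ, the feed is taking effect and zero_state is only the default.

zero = sess.run(model.initial_state)       # LSTMStateTuple of zeros
ones = type(zero)(c=np.ones_like(zero.c), h=np.ones_like(zero.h))
feed_dict = {model.input_x: x, model.sequence_length: seq,
             model.initial_state: ones}
state_a = sess.run(model.state, feed_dict)
feed_dict[model.initial_state] = zero
state_b = sess.run(model.state, feed_dict)
print(np.allclose(state_a.h, state_b.h))   # False means the fed state is used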