Pretrained embedding type error
I am creating a computational graph in TensorFlow and I want to use pretrained word vectors. I have a method that preloads the vectors for all the words in my dataset into a matrix.
import numpy as np

def preload_vectors(word2vec_path, word2id, vocab_size, emb_dim):
    if word2vec_path:
        print('Load word2vec_norm file {}'.format(word2vec_path))
        with open(word2vec_path, 'r') as f:
            header = f.readline()
            print(vocab_size, emb_dim)
            # Start from small random values; rows for words found in the
            # word2vec file are overwritten below.
            scale = np.sqrt(3.0 / emb_dim)
            init_W = np.random.uniform(-scale, scale, [vocab_size, emb_dim])
            print('vocab_size={}'.format(vocab_size))
            while True:
                line = f.readline()
                if not line:
                    break
                word = line.split()[0]
                if word in word2id:
                    init_W[word2id[word]] = np.array(line.split()[1:], dtype=np.float32)
            return init_W

init_W = preload_vectors("data/GoogleNews-vectors-negative300.txt", word2id,
                         word_vocab_size, FLAGS.word_embedding_dim)
Output:
Load word2vec_norm file data/GoogleNews-vectors-negative300.txt
2556 300
vocab_size=2556
In the computational graph, I have this:
W = tf.Variable(tf.constant(0.0, shape=[word_vocab_size, FLAGS.word_embedding_dim]),
                trainable=False, name='word_embeddings')
embedding_placeholder = tf.placeholder(tf.float32,
                                       shape=[word_vocab_size, FLAGS.word_embedding_dim])
embedding_init = W.assign(embedding_placeholder)
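The idea is that embedding_init, when run in the session, copies the pretrained matrix into W (a minimal sketch of that step, assuming sess and the init_W returned above are already available):

# Hypothetical one-off initialization step: run the assign op once so that W
# holds the pretrained vectors; after this the placeholder need not be fed again.
sess.run(embedding_init, feed_dict={embedding_placeholder: init_W})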
And finally, in the session I feed the init_W to embedding_placeholder:
_, train_cost, train_predict = sess.run([train_op, cost, prediction], feed_dict={
    # other model inputs here
    embedding_placeholder: init_W
})
But I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-732a79dc5ebd> in <module>()
68 labels: next_batch_input.relatedness_scores,
69 dropout_f: config.keep_prob,
---> 70 embedding_placeholder: init_W
71 })
72 avg_cost+=train_cost
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
764 try:
765 result = self._run(None, fetches, feed_dict, options_ptr,
--> 766 run_metadata_ptr)
767 if run_metadata:
768 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
935 ' to a larger type (e.g. int64).')
936
--> 937 np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
938
939 if not subfeed_t.get_shape().is_compatible_with(np_val.shape):
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
480
481 """
--> 482 return array(a, dtype, copy=False, order=order)
483
484 def asanyarray(a, dtype=None, order=None):
TypeError: float() argument must be a string or a number
I checked the values of the init_W array and they are floats:
type(init_W[0][0])
numpy.float64
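type(init_W[0][0]) only inspects a single element; a fuller check is that the whole matrix has a numeric dtype and converts cleanly to float32, which is essentially the conversion session.run attempts (a quick sanity check, assuming numpy is imported as np):

print(init_W.dtype, init_W.shape)      # expect float64, (2556, 300)
np.asarray(init_W, dtype=np.float32)   # should not raise a TypeError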
I was able to do this recently with no problems, so I must have missed something here. Any help would be appreciated, thanks!
Update: apparently, the problem was not with embedding_placeholder but with one of the other inputs in feed_dict. That was not clear from the error message, though.
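One way to pin down which feed triggers the conversion failure is to try, for each entry of feed_dict individually, the same np.asarray conversion that session.run performs (a debugging sketch with placeholder names taken from the traceback; adapt it to the actual feeds):

feeds = {
    # the same entries that are passed to sess.run, e.g.:
    # labels: next_batch_input.relatedness_scores,
    # dropout_f: config.keep_prob,
    embedding_placeholder: init_W,
}
for placeholder, value in feeds.items():
    try:
        np.asarray(value, dtype=placeholder.dtype.as_numpy_dtype)
    except TypeError as e:
        print('Bad feed for {}: {}'.format(placeholder.name, e))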