Pretrained embedding type error

2018-06-11 04:02:09

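I created a computation graph in TensorFlow and I want to use pretrained word vectors. I have a function that preloads the vectors for all words in my dataset into a matrix.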

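    import numpy as np

    def preload_vectors(word2vec_path, word2id, vocab_size, emb_dim):
        # Random uniform init, so words missing from the file still get a vector
        # and init_W is defined even when no path is given.
        scale = np.sqrt(3.0 / emb_dim)
        init_W = np.random.uniform(-scale, scale, [vocab_size, emb_dim])
        if word2vec_path:
            print('Load word2vec_norm file {}'.format(word2vec_path))
            print(vocab_size, emb_dim)
            print('vocab_size={}'.format(vocab_size))
            with open(word2vec_path, 'r') as f:
                header = f.readline()  # first line of the text format is a header, not a vector
                for line in f:
                    parts = line.split()
                    if not parts:
                        continue  # skip blank lines
                    word = parts[0]
                    if word in word2id:
                        # Overwrite the random row with the pretrained vector.
                        init_W[word2id[word]] = np.array(parts[1:], dtype=np.float32)
        return init_W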

    init_W = preload_vectors("data/GoogleNews-vectors-negative300.txt", word2id, word_vocab_size, FLAGS.word_embedding_dim)

Output:

    Load word2vec_norm file data/GoogleNews-vectors-negative300.txt
    2556 300
    vocab_size=2556

In the computation graph, I have this:

    W = tf.Variable(tf.constant(0.0, shape = [word_vocab_size,FLAGS.word_embedding_dim]),trainable = False, name='word_embeddings')
    embedding_placeholder = tf.placeholder(tf.float32, shape = [word_vocab_size, FLAGS.word_embedding_dim])
    embedding_init = W.assign(embedding_placeholder)

Finally, in the session, I feed init_W to embedding_placeholder:

    _, train_cost, train_predict = sess.run([train_op, cost, prediction], feed_dict={
        # other model inputs here
        embedding_placeholder: init_W
    })

But I get this error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-18-732a79dc5ebd> in <module>()
         68                             labels:  next_batch_input.relatedness_scores,
         69                             dropout_f: config.keep_prob,
    ---> 70                             embedding_placeholder: init_W
         71                         })
         72                     avg_cost+=train_cost

    /Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
        764     try:
        765       result = self._run(None, fetches, feed_dict, options_ptr,
    --> 766                          run_metadata_ptr)
        767       if run_metadata:
        768         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

    /Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
        935                 ' to a larger type (e.g. int64).')
        936
    --> 937           np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
        938
        939           if not subfeed_t.get_shape().is_compatible_with(np_val.shape):

    /Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
        480
        481     """
    --> 482     return array(a, dtype, copy=False, order=order)
        483
        484 def asanyarray(a, dtype=None, order=None):

    TypeError: float() argument must be a string or a number

I checked the values in the init_W array, and they are floats:

    type(init_W[0][0])
    numpy.float64

I was able to do this recently without any problems, so I must be missing something here. Any help would be appreciated. Thanks!

Apparently, the problem was not embedding_placeholder but one of the other feed_dict inputs. The error message did not make that clear, though.

