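Type error with pretrained embeddings
I am building a computation graph in TensorFlow and want to use pretrained vectors. I have a method that preloads the vectors for every word in my dataset into a matrix.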
import numpy as np

def preload_vectors(word2vec_path, word2id, vocab_size, emb_dim):
    if word2vec_path:
        print('Load word2vec_norm file {}'.format(word2vec_path))
        with open(word2vec_path,'r') as f:
            header=f.readline()  # skip the header line of the word2vec text file
            print(vocab_size, emb_dim)
            # random initialisation for words that have no pretrained vector
            scale = np.sqrt(3.0 / emb_dim)
            init_W = np.random.uniform(-scale, scale, [vocab_size, emb_dim])
            print('vocab_size={}'.format(vocab_size))
            while True:
                line=f.readline()
                if not line:break
                word=line.split()[0]
                if word in word2id:
                    # overwrite the random row with the pretrained vector
                    init_W[word2id[word]] = np.array(line.split()[1:], dtype = np.float32)
    return init_W

init_W = preload_vectors("data/GoogleNews-vectors-negative300.txt", word2id, word_vocab_size, FLAGS.word_embedding_dim)
Output:
Load word2vec_norm file data/GoogleNews-vectors-negative300.txt
2556 300
vocab_size=2556
In the computation graph, I have this:
W = tf.Variable(tf.constant(0.0, shape = [word_vocab_size,FLAGS.word_embedding_dim]),trainable = False, name='word_embeddings')
embedding_placeholder = tf.placeholder(tf.float32, shape = [word_vocab_size, FLAGS.word_embedding_dim])
embedding_init = W.assign(embedding_placeholder)
Finally, in the session, I feed init_W to embedding_placeholder:
_,train_cost,train_predict=sess.run([train_op,cost,prediction], feed_dict={
    # other model inputs here
    embedding_placeholder: init_W
})
But I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-732a79dc5ebd> in <module>()
68 labels: next_batch_input.relatedness_scores,
69 dropout_f: config.keep_prob,
---> 70 embedding_placeholder: init_W
71 })
72 avg_cost+=train_cost
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
764 try:
765 result = self._run(None, fetches, feed_dict, options_ptr,
--> 766 run_metadata_ptr)
767 if run_metadata:
768 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
935 ' to a larger type (e.g. int64).')
936
--> 937 np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
938
939 if not subfeed_t.get_shape().is_compatible_with(np_val.shape):
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
480
481 """
--> 482 return array(a, dtype, copy=False, order=order)
483
484 def asanyarray(a, dtype=None, order=None):
TypeError: float() argument must be a string or a number
I checked the values in the init_W array, and they are floats:
type(init_W[0][0])
numpy.float64
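I was able to do this recently without any problems, so I must be missing something here. Any help would be appreciated. Thanks!
As it turned out, the problem was not embedding_placeholder but one of the other inputs in feed_dict. The error message does not make that clear: the traceback arrow points at the last entry of the multi-line feed_dict, which is probably why embedding_placeholder looked like the culprit.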