Pretrained embedding type error
I am creating a computational graph in TensorFlow and I want to use pretrained word vectors. I have a method that preloads the vectors for all the words in my dataset into a matrix.
import numpy as np

def preload_vectors(word2vec_path, word2id, vocab_size, emb_dim):
    if word2vec_path:
        print('Load word2vec_norm file {}'.format(word2vec_path))
        with open(word2vec_path, 'r') as f:
            header = f.readline()
            print(vocab_size, emb_dim)
            # Start from small random values; rows for words found in the
            # word2vec file are overwritten below.
            scale = np.sqrt(3.0 / emb_dim)
            init_W = np.random.uniform(-scale, scale, [vocab_size, emb_dim])
            print('vocab_size={}'.format(vocab_size))
            while True:
                line = f.readline()
                if not line:
                    break
                word = line.split()[0]
                if word in word2id:
                    init_W[word2id[word]] = np.array(line.split()[1:], dtype=np.float32)
            return init_W

init_W = preload_vectors("data/GoogleNews-vectors-negative300.txt", word2id,
                         word_vocab_size, FLAGS.word_embedding_dim)
Output:
Load word2vec_norm file data/GoogleNews-vectors-negative300.txt
2556 300
vocab_size=2556
In the computational graph, I have this:
W = tf.Variable(tf.constant(0.0, shape=[word_vocab_size, FLAGS.word_embedding_dim]),
                trainable=False, name='word_embeddings')
embedding_placeholder = tf.placeholder(tf.float32,
                                       shape=[word_vocab_size, FLAGS.word_embedding_dim])
embedding_init = W.assign(embedding_placeholder)
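The idea is that embedding_init, when run in the session, copies the pretrained matrix into W (a minimal sketch of that step, assuming sess and the init_W returned above are already available):

# Hypothetical one-off initialization step: run the assign op once so that W
# holds the pretrained vectors; after this the placeholder need not be fed again.
sess.run(embedding_init, feed_dict={embedding_placeholder: init_W})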
And finally, in the session I feed the init_W to embedding_placeholder:
_, train_cost, train_predict = sess.run([train_op, cost, prediction], feed_dict={
    # other model inputs here
    embedding_placeholder: init_W
})
But I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-732a79dc5ebd> in <module>()
68 labels: next_batch_input.relatedness_scores,
69 dropout_f: config.keep_prob,
---> 70 embedding_placeholder: init_W
71 })
72 avg_cost+=train_cost
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
764 try:
765 result = self._run(None, fetches, feed_dict, options_ptr,
--> 766 run_metadata_ptr)
767 if run_metadata:
768 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
935 ' to a larger type (e.g. int64).')
936
--> 937 np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
938
939 if not subfeed_t.get_shape().is_compatible_with(np_val.shape):
/Users/kurt/anaconda2/envs/tensorflow/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
480
481 """
--> 482 return array(a, dtype, copy=False, order=order)
483
484 def asanyarray(a, dtype=None, order=None):
TypeError: float() argument must be a string or a number
I checked the values of the init_W array and they are floats:
type(init_W[0][0])
numpy.float64
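type(init_W[0][0]) only inspects a single element; a fuller check is that the whole matrix has a numeric dtype and converts cleanly to float32, which is essentially the conversion session.run attempts (a quick sanity check, assuming numpy is imported as np):

print(init_W.dtype, init_W.shape)      # expect float64, (2556, 300)
np.asarray(init_W, dtype=np.float32)   # should not raise a TypeError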
I was able to do this recently with no problems, so I must have missed something here. Any help would be appreciated, thanks!
Update: apparently, the problem was not with embedding_placeholder but with one of the other inputs in feed_dict. That was not clear from the error message, though.
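One way to pin down which feed triggers the conversion failure is to try, for each entry of feed_dict individually, the same np.asarray conversion that session.run performs (a debugging sketch with placeholder names taken from the traceback; adapt it to the actual feeds):

feeds = {
    # the same entries that are passed to sess.run, e.g.:
    # labels: next_batch_input.relatedness_scores,
    # dropout_f: config.keep_prob,
    embedding_placeholder: init_W,
}
for placeholder, value in feeds.items():
    try:
        np.asarray(value, dtype=placeholder.dtype.as_numpy_dtype)
    except TypeError as e:
        print('Bad feed for {}: {}'.format(placeholder.name, e))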