TensorFlow Estimator: logits and labels must have the same shape
The answer was to create a one_hot first and then squeeze() the resulting tensor.
This worked for the loss calculation.
loss = tf.losses.sigmoid_cross_entropy(
    multi_class_labels=tf.squeeze(tf.one_hot(labels, depth=2), axis=1),
    logits=logits)  # assuming `logits` has shape (batch, 2)
tf.losses.sigmoid_cross_entropy internally calls tf.nn.sigmoid_cross_entropy_with_logits. During my search I ended up switching to the version posted above purely out of laziness.
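As a sanity check on the shapes, here is a minimal NumPy sketch (with a hypothetical batch of 3 labels) of why the one_hot + squeeze combination makes the labels match logits of shape (batch, 2):

```python
import numpy as np

# Hypothetical labels with shape (batch, 1), e.g. as fed by an Estimator input_fn.
labels = np.array([[0], [1], [1]])

# one_hot with depth=2 appends a class axis: (batch, 1) -> (batch, 1, 2).
# np.eye(2)[labels] mimics tf.one_hot(labels, depth=2) here.
one_hot = np.eye(2)[labels]
print(one_hot.shape)  # (3, 1, 2)

# squeeze(axis=1) removes the singleton axis so the labels
# end up with the same shape as the logits: (batch, 2).
squeezed = np.squeeze(one_hot, axis=1)
print(squeezed.shape)  # (3, 2)
```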
ValueError: Dimensions must be equal, but are 512 and 256
The error says that inside the decoder's LSTM (decoding/decoder/while/BasicDecoderStep/decoder/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/mul) there is a dimension mismatch during a multiplication.
My guess is that, for your implementation, you need twice as many units for the decoder LSTM as for the encoder LSTM, because you are using a bidirectional encoder. If you have a bidirectional encoder with an LSTM of 256 units, then the result will have 512 units (as you concatenate the outputs of the forward and backward LSTM). Currently the decoder seems to expect an input of 256 units.
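The shape arithmetic can be sketched with NumPy (the batch/time sizes below are hypothetical placeholders):

```python
import numpy as np

# Hypothetical encoder: an LSTM with 256 units, run bidirectionally.
encoder_units = 256
batch, time = 8, 10

fw_outputs = np.zeros((batch, time, encoder_units))  # forward-direction outputs
bw_outputs = np.zeros((batch, time, encoder_units))  # backward-direction outputs

# Concatenating forward and backward outputs doubles the last dimension,
# so the decoder cell must be built with 2 * encoder_units = 512 units.
encoder_outputs = np.concatenate([fw_outputs, bw_outputs], axis=-1)
print(encoder_outputs.shape)  # (8, 10, 512)
```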
Using an LSTM decoder without teacher forcing - TensorFlow
There are different Helpers, which all inherit from the same class. You can find more information in the documentation. As you said,
TrainingHelper requires predefined ground-truth inputs that are expected to be output by the decoder, and these true inputs are fed as the next steps (instead of feeding the output of the previous step). This approach (according to some research) should speed up training of the decoder.
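The difference between the two feeding strategies can be illustrated with a toy decode loop (the `step` function below is a hypothetical stand-in for one decoder step):

```python
# Toy stand-in for one decoder step: a deterministic next-token rule.
def step(prev_token):
    return (prev_token + 1) % 10

true_inputs = [1, 2, 3, 4]  # hypothetical ground-truth sequence

# Teacher forcing (TrainingHelper): each step is fed the *true* token,
# regardless of what the decoder predicted at the previous step.
tf_outputs = [step(t) for t in true_inputs]

# No teacher forcing (GreedyEmbeddingHelper): each step is fed the
# *previous prediction*, starting from a start token (0 here).
token, greedy_outputs = 0, []
for _ in range(4):
    token = step(token)
    greedy_outputs.append(token)

print(tf_outputs)      # [2, 3, 4, 5]
print(greedy_outputs)  # [1, 2, 3, 4]
```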
In your case, you are looking for GreedyEmbeddingHelper. Just replace the TrainingHelper with:

training_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding=embedding,        # the embedding tensor/variable from your model
    start_tokens=start_tokens,  # vector of start-token ids, one per batch entry
    end_token=end_token)        # scalar id of the end token

using the embedding tensor and variables from your problem (the argument names above stand in for your own tensors). This helper automatically takes the output of a step, applies the embedding, and feeds the result as input to the next step. For the first step, the start_tokens are used.
The resulting output produced with GreedyEmbeddingHelper doesn't have to match the length of the expected output. You have to use padding to match their shapes; TensorFlow provides the tf.pad function for this.
tf.contrib.seq2seq.dynamic_decode returns a tuple containing
(final_outputs, final_state, final_sequence_lengths), so you can use the value of
final_sequence_lengths for padding.
logits_pad = tf.pad(
    logits,  # your decoder logits, e.g. shape (time, vocab)
    [[0, tf.maximum(expected_length - tf.reduce_max(final_seq_lengths), 0)],
     [0, 0]])  # pad the time dimension only

targets_pad = tf.pad(
    targets,  # your target sequence, e.g. shape (time,)
    [[0, tf.maximum(tf.reduce_max(final_seq_lengths) - expected_length, 0)]])
You may have to change the padding a little bit depending on the shapes of your inputs. Also, you don't have to pad the
targets if you set the
maximum_iterations parameter of dynamic_decode to match the length of the targets.
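The padding logic above can be checked with a NumPy sketch (the lengths and vocabulary size below are hypothetical; np.pad mirrors the tf.pad calls):

```python
import numpy as np

# Hypothetical lengths: the decoder stopped after 3 steps, but the
# targets are 5 tokens long (expected_length).
expected_length = 5
decoded_len = 3                       # stand-in for tf.reduce_max(final_seq_lengths)
logits = np.zeros((decoded_len, 7))   # (time, vocab), vocab size 7 assumed
targets = np.arange(expected_length)  # (time,)

# Pad whichever sequence is shorter along the time axis
# so that both end up with the same length.
pad_logits = max(expected_length - decoded_len, 0)
pad_targets = max(decoded_len - expected_length, 0)
logits_pad = np.pad(logits, [[0, pad_logits], [0, 0]])
targets_pad = np.pad(targets, [[0, pad_targets]])
print(logits_pad.shape[0], targets_pad.shape[0])  # 5 5
```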