InvalidArgumentError: Logits and Labels Must Have the Same First Dimension (Seq2Seq, TensorFlow)

TensorFlow Estimator: logits and labels must have the same shape

The fix was to one-hot encode the labels with tf.one_hot() first and then squeeze() the resulting tensor to drop the extra axis, so that the labels have the same shape as the logits. This worked for the loss calculation.

    loss = tf.losses.sigmoid_cross_entropy(
        multi_class_labels=tf.squeeze(tf.one_hot(labels, depth=2), axis=1),
        logits=logits)  # `logits` is an assumed name for the model output, shape [batch_size, 2]

tf.losses.sigmoid_cross_entropy goes on to call sigmoid_cross_entropy_with_logits internally. While searching for the fix I ended up switching to the version posted here, and kept it purely out of laziness.
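To make the shape change concrete, here is a minimal NumPy sketch of the same one-hot and squeeze steps. The label values, the batch size of 4, and the all-zero logits are made up for illustration; only the shapes mirror the TensorFlow call above.

```python
import numpy as np

# Hypothetical batch of 4 binary labels with a trailing singleton axis: shape (4, 1).
labels = np.array([[0], [1], [1], [0]])

# Counterpart of tf.one_hot(labels, depth=2): a new last axis of size 2 is added.
one_hot = np.eye(2, dtype=np.float32)[labels]   # shape (4, 1, 2)

# Counterpart of tf.squeeze(..., axis=1): drop the singleton axis.
targets = np.squeeze(one_hot, axis=1)           # shape (4, 2)

# Assumed logits of shape (4, 2); zeros keep the arithmetic easy to check.
logits = np.zeros((4, 2))

# Numerically stable sigmoid cross-entropy: max(x, 0) - x * z + log(1 + exp(-|x|)).
per_element = (np.maximum(logits, 0) - logits * targets
               + np.log1p(np.exp(-np.abs(logits))))
loss = per_element.mean()

print(one_hot.shape, targets.shape, loss)
```

With the squeeze removed, the labels would keep shape (4, 1, 2) and no longer match the (4, 2) logits, which is the kind of mismatch the error message complains about.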

ValueError: Dimensions must be equal, but are 512 and 256

The error says that inside the LSTM of the decoder (decoding/decoder/while/BasicDecoderStep/decoder/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/mul) there is a dimension mismatch during a multiplication (Mul).

My guess is that, for your implementation, the decoder LSTM needs twice as many units as the encoder LSTM, because you are using a bidirectional encoder. If the bidirectional encoder uses an LSTM with 256 units, its result has 512 units (the outputs of the forward and backward LSTMs are concatenated). Currently the decoder seems to expect an input of 256 units.
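The size arithmetic can be sketched with plain NumPy arrays. This is not the asker's model: the batch size, the zero-filled states, and the `forget_step` helper are hypothetical stand-ins for the element-wise multiplication inside an LSTM cell.

```python
import numpy as np

batch_size, num_units = 4, 256  # hypothetical sizes matching the error message

# Forward and backward encoder states, each with 256 units.
fw_state = np.zeros((batch_size, num_units))
bw_state = np.zeros((batch_size, num_units))

# Concatenating them, as a bidirectional encoder does, yields 512 units.
encoder_state = np.concatenate([fw_state, bw_state], axis=-1)  # shape (4, 512)

def forget_step(cell_state, decoder_units):
    """Toy stand-in for the LSTM's element-wise `c * forget_gate` multiplication."""
    forget_gate = np.ones((cell_state.shape[0], decoder_units))
    return cell_state * forget_gate

# A 256-unit decoder cannot consume the 512-unit state.
try:
    forget_step(encoder_state, decoder_units=num_units)
    mismatch = False
except ValueError:
    mismatch = True

# A decoder with 2 * 256 = 512 units works.
ok = forget_step(encoder_state, decoder_units=2 * num_units)
print(encoder_state.shape, mismatch, ok.shape)
```

An alternative that keeps a 256-unit decoder would be to project the concatenated state back down to 256 units before passing it on, but that changes the model rather than just the cell size.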

Using an LSTM decoder without teacher forcing - TensorFlow

There are several Helper classes, all of which inherit from the same base class. You can find more information in the documentation. As you said, TrainingHelper requires the predefined true inputs that the decoder is expected to output, and these true inputs are fed in as the next step's input (instead of the decoder's own output from the previous step). According to some research, this approach should speed up training of the decoder.

In your case, you are looking for GreedyEmbeddingHelper. Use it in place of TrainingHelper:

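    training_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
        embedding=embedding,      # placeholder: your embedding tensor
        start_tokens=tf.tile([GO_SYMBOL], [batch_size]),
        end_token=END_SYMBOL)     # placeholder: your end-of-sequence token id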

Substitute the embedding tensor and the variables that you use in your problem. This helper automatically takes the output of each step, applies the embedding, and feeds the result as input to the next step. The first step uses the start token.
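The difference between the two feeding strategies can be illustrated with a toy decoder in NumPy. Everything here is hypothetical: the token ids, the logits table standing in for a trained decoder, and both function names. It sketches the idea only, not how the TensorFlow helpers are implemented.

```python
import numpy as np

GO, EOS = 0, 3  # hypothetical start and end token ids

# Toy "decoder": row i holds the logits for the next token given previous token i.
next_token_logits = np.array([
    [0.0, 2.0, 1.0, 0.0],   # after GO  -> token 1
    [0.0, 0.0, 3.0, 1.0],   # after 1   -> token 2
    [0.0, 0.0, 0.0, 4.0],   # after 2   -> EOS
    [0.0, 0.0, 0.0, 5.0],   # after EOS -> EOS
])

def decode_teacher_forced(targets):
    """Feed the ground-truth token at every step (what TrainingHelper does conceptually)."""
    inputs = [GO] + list(targets[:-1])
    return [int(np.argmax(next_token_logits[tok])) for tok in inputs]

def decode_greedy(max_steps):
    """Feed the previous prediction back in (what GreedyEmbeddingHelper does conceptually)."""
    outputs, tok = [], GO
    for _ in range(max_steps):
        tok = int(np.argmax(next_token_logits[tok]))
        outputs.append(tok)
        if tok == EOS:
            break
    return outputs

targets = [1, 2, 2, 2, 3]
forced = decode_teacher_forced(targets)   # always len(targets) steps
greedy = decode_greedy(max_steps=10)      # stops as soon as EOS is predicted
print(len(forced), len(greedy))
```

The teacher-forced run always produces as many steps as there are targets, while the greedy run stops whenever it predicts the end token, so its length can differ from the target length.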

The output produced with GreedyEmbeddingHelper does not necessarily have the same length as the expected output, so you have to use padding to make the shapes match. TensorFlow provides the function tf.pad() for this. Also, tf.contrib.seq2seq.dynamic_decode returns a tuple (final_outputs, final_state, final_sequence_lengths), so you can use the value of final_sequence_lengths (final_seq_lengths below) to compute the padding.

    logits_pad = tf.pad(
        logits,   # assumed name for the decoder logits
        [[0, tf.maximum(expected_length - tf.reduce_max(final_seq_lengths), 0)],
         [0, 0]])

    targets_pad = tf.pad(
        targets,  # assumed name for the target ids
        [[0, tf.maximum(tf.reduce_max(final_seq_lengths) - expected_length, 0)]])

You may have to adjust the padding a little depending on the shapes of your inputs. Also, you don't have to pad the targets if you set the maximum_iterations parameter of dynamic_decode to match the shape of the targets.
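The padding logic can be checked in isolation with NumPy. This sketch assumes a single sequence with logits of shape (time, vocab) and targets of shape (time,). The function name and the sample values are hypothetical, and a batched model would need an extra leading entry in each padding list.

```python
import numpy as np

def pad_to_common_length(logits, targets):
    """Zero-pad whichever of logits (time, vocab) or targets (time,) is shorter."""
    decoded_len, expected_len = logits.shape[0], targets.shape[0]
    # Pad the time axis of the logits if decoding stopped early.
    logits_pad = np.pad(
        logits, [(0, max(expected_len - decoded_len, 0)), (0, 0)], mode="constant")
    # Pad the targets if decoding ran longer than the expected output.
    targets_pad = np.pad(
        targets, [(0, max(decoded_len - expected_len, 0))], mode="constant")
    return logits_pad, targets_pad

# Decoder stopped after 3 steps, but 5 target tokens were expected.
short_logits, short_targets = pad_to_common_length(
    np.ones((3, 4)), np.array([1, 2, 2, 2, 3]))

# Decoder ran for 6 steps, but only 4 target tokens were expected.
long_logits, long_targets = pad_to_common_length(
    np.ones((6, 4)), np.array([1, 2, 2, 3]))

print(short_logits.shape, short_targets.shape)
print(long_logits.shape, long_targets.shape)
```

Only one of the two paddings is non-zero in each case, because the other difference is clamped to 0. Zero-padded positions still contribute to the loss unless they are masked out, for example with per-step weights.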
