How to add an attention mechanism in Keras?
If you want attention along the time dimension, then this part of your code seems correct to me:
from tensorflow.keras.layers import (LSTM, Dense, Flatten, Activation,
                                     RepeatVector, Permute, Multiply)

activations = LSTM(units, return_sequences=True)(embedded)  # (batch, max_length, units)
# compute an importance score for each timestep
attention = Dense(1, activation='tanh')(activations)        # (batch, max_length, 1)
attention = Flatten()(attention)                            # (batch, max_length)
attention = Activation('softmax')(attention)                # weights sum to 1 over time
attention = RepeatVector(units)(attention)                  # (batch, units, max_length)
attention = Permute([2, 1])(attention)                      # (batch, max_length, units)
# the old merge([...], mode='mul') API no longer exists; use Multiply instead
sent_representation = Multiply()([activations, attention])  # (batch, max_length, units)
At this point you've worked out the attention vector of shape (batch_size, max_length):
attention = Activation('softmax')(attention)
I've never seen this line before, so I can't say whether it's actually correct:
K.sum(xin, axis=-2)
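For what it presumably does: if xin is the weighted sequence sent_representation of shape (batch_size, max_length, units), then axis=-2 is the time axis (the same as axis=1 for a 3D tensor), so the sum collapses the weighted timesteps into one fixed-size sentence vector. A minimal sketch of that pooling step as a layer (my own completion under those shape assumptions, not part of the original answer; assumes tf.keras 2.x):

from tensorflow.keras.layers import Lambda
from tensorflow.keras import backend as K

# sum the weighted timesteps: (batch, max_length, units) -> (batch, units)
sent_representation = Lambda(lambda x: K.sum(x, axis=1))(sent_representation)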
Further reading (you might have a look):
https://github.com/philipperemy/keras-visualize-activations
https://github.com/philipperemy/keras-attention-mechanism
How to add an attention layer to a Bi-LSTM
Here is a possible solution: a custom layer that computes attention over the positional/temporal dimension.
from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K

class Attention(Layer):
    def __init__(self, return_sequences=True):
        self.return_sequences = return_sequences
        super(Attention, self).__init__()

    def build(self, input_shape):
        # one score weight per feature, one bias per timestep
        self.W = self.add_weight(name="att_weight", shape=(input_shape[-1], 1),
                                 initializer="normal")
        self.b = self.add_weight(name="att_bias", shape=(input_shape[1], 1),
                                 initializer="zeros")
        super(Attention, self).build(input_shape)

    def call(self, x):
        # x: (batch, timesteps, features)
        e = K.tanh(K.dot(x, self.W) + self.b)  # scores: (batch, timesteps, 1)
        a = K.softmax(e, axis=1)               # attention weights over timesteps
        output = x * a                         # weight each timestep
        if self.return_sequences:
            return output                      # (batch, timesteps, features)
        return K.sum(output, axis=1)           # pooled: (batch, features)
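To sanity-check the layer before wiring it into a model, you can call it directly on a random tensor (a quick sketch of my own, assuming TF 2.x eager execution):

import tensorflow as tf

x = tf.random.normal((2, 7, 16))                   # (batch, timesteps, features)
print(Attention(return_sequences=True)(x).shape)   # (2, 7, 16)
print(Attention(return_sequences=False)(x).shape)  # (2, 16)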
It's built to receive 3D tensors and output either 3D tensors (return_sequences=True) or 2D tensors (return_sequences=False). Below is a dummy example:
# dummy data creation
import numpy as np

max_len = 100    # sequence length
max_words = 333  # vocabulary size
emb_dim = 126    # embedding dimension
n_sample = 5

X = np.random.randint(0, max_words, (n_sample, max_len))  # integer token ids
Y = np.random.randint(0, 2, n_sample)                      # binary labels
With return_sequences=True:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

model = Sequential()
model.add(Embedding(max_words, emb_dim, input_length=max_len))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(Attention(return_sequences=True))  # receives 3D and outputs 3D
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
model.summary()

model.compile('adam', 'binary_crossentropy')
model.fit(X, Y, epochs=3)
With return_sequences=False:
model = Sequential()
model.add(Embedding(max_words, emb_dim, input_length=max_len))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(Attention(return_sequences=False))  # receives 3D and outputs 2D
model.add(Dense(1, activation='sigmoid'))
model.summary()

model.compile('adam', 'binary_crossentropy')
model.fit(X, Y, epochs=3)
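If you also want to inspect which timesteps the model attends to, one option (a sketch of my own, not part of the original answer; AttentionWithScores is a hypothetical name) is a small variant that returns the softmax weights alongside the output, used with the functional API:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input

class AttentionWithScores(Attention):
    # same math as Attention above, but also returns the attention weights
    def call(self, x):
        e = K.tanh(K.dot(x, self.W) + self.b)
        a = K.softmax(e, axis=1)
        output = x * a
        if self.return_sequences:
            return output, a
        return K.sum(output, axis=1), a

inp = Input(shape=(max_len,))
emb = Embedding(max_words, emb_dim)(inp)
seq = Bidirectional(LSTM(32, return_sequences=True))(emb)
context, scores = AttentionWithScores(return_sequences=False)(seq)
out = Dense(1, activation='sigmoid')(context)

model = Model(inp, out)           # train this one as usual
score_model = Model(inp, scores)  # score_model.predict(X) shows the weights per timestep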
You can integrate it into your networks easily. Here is the running notebook.