
With limited knowledge, I've built an LSTM network. I would like to validate my assumptions and better understand the Keras API.

Network Code:

#...
model.add(LSTM(8, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(1, return_sequences=False, activation='softmax'))
#...

I have tried to build a network with a 4-feature input, two hidden layers (8 neurons in the first, 4 in the second), and a single neuron in the output layer.
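One quick way to validate that assumption is to compare hand-computed parameter counts against what model.summary() reports. This is a plain-Python sketch (not Keras itself) of the standard LSTM parameter formula, 4 * units * (input_dim + units + 1), applied to the three layers above:

```python
def lstm_param_count(units, input_dim):
    # 4 gates, each with a kernel (input_dim x units),
    # a recurrent kernel (units x units) and a bias (units)
    return 4 * units * (input_dim + units + 1)

# the three LSTM layers from the snippet above
print(lstm_param_count(8, 4))  # first layer, fed 4 input features
print(lstm_param_count(4, 8))  # second layer, fed by 8 units
print(lstm_param_count(1, 4))  # output layer, fed by 4 units
```

If these numbers (416, 208 and 24) match the per-layer counts in model.summary(), the layers are wired the way you intended.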

[figure: diagram of the planned network architecture]

The activation I wanted was LeakyReLU.

Q:

  1. Is the implementation correct?
    i.e.: does the code reflect what I planned?
  2. When using LeakyReLU, should I add a linear activation to the previous layer?
    i.e.: do I need to add activation='linear' to the LSTM layers?
  • Can you comment on what task you are trying to solve with this configuration? If it weren't for the LSTM, it would look like a regular CNN. Commented Oct 7, 2018 at 14:44
  • I agree. In your figure there is no hint about the temporal dimension of the LSTM (in your case a sequence of 100). Each timepoint will be something like your figure (except for the output), that will only be present in the last timepoint (because of your return_sequences=False) Commented Oct 8, 2018 at 7:41
  • The network aims to detect fraudulent sources. Each source produces transactions, some of which are fraudulent. I would like to determine whether a source is fraudulent after 100 samples (4 features each). Commented Oct 9, 2018 at 7:16

1 Answer


As for the first question: "correct" in what sense? That depends on the problem you are modeling, so more details would need to be provided.

softmax is not used as the activation function when the last layer has only one output unit. That's because softmax normalizes the output to make the sum of its elements be one, i.e. to resemble a probability distribution. Therefore, if you use it on a layer with only one output unit it would always have an output of 1. Instead, either linear (in case of regression, i.e. predicting real values) or sigmoid (in case of binary classification) is used. Additionally, commonly a Dense layer is used as the last layer which acts as the final regressor or classifier. For example:

from keras.models import Sequential
from keras.layers import LSTM, Dense, LeakyReLU

model = Sequential()
model.add(LSTM(8, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(1, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))
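The softmax point is easy to verify in plain Python: with a single logit, the exponential cancels with itself, so the output is always exactly 1. A minimal sketch of softmax (not the Keras implementation):

```python
import math

def softmax(logits):
    # subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.7]))       # [1.0] -- a single unit always outputs 1
print(softmax([-13.9]))     # [1.0] -- regardless of the logit's value
print(softmax([1.0, 2.0]))  # a proper two-class distribution
```

This is why a one-unit softmax layer cannot learn anything: its output carries no information about the input.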

As for the layers and number of units (according to the figure): it is a bit ambiguous, but I think there are three LSTM layers, the first one has 4 units, the second one has 8 units and the last one has 4 units. As for the final layer it seems to be a Dense layer. So the model would look like this (assuming LeakyReLU is applied on the output of LSTM layers):

model.add(LSTM(4, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(8, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=False))
model.add(Dense(1, activation='sigmoid')) # or activation='linear' if it is a regression problem
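To see why return_sequences matters here, this pure-Python sketch traces how the output shapes flow through the stack (return_sequences=True keeps the time axis, False keeps only the last timestep):

```python
def lstm_output_shape(input_shape, units, return_sequences):
    # input_shape is (batch, timesteps, features)
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one output per timestep
    return (batch, units)                 # only the last timestep's output

shape = (None, 100, 4)                      # batch_input_shape
shape = lstm_output_shape(shape, 4, True)   # -> (None, 100, 4)
shape = lstm_output_shape(shape, 8, True)   # -> (None, 100, 8)
shape = lstm_output_shape(shape, 4, False)  # -> (None, 4)
print(shape)
```

The final Dense(1) then maps (None, 4) to (None, 1): one fraud score per source after the full 100-sample sequence.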

As for using the LeakyReLU layer: I guess you are right that a linear activation should be used on its previous layer (as also suggested here, though a Dense layer was used there). That's because by default the activation of an LSTM layer is the hyperbolic tangent (i.e. tanh), which squashes the outputs to the range [-1, 1]; applying LeakyReLU on top of that may not be effective. However, I am not sure about this, since I am not completely familiar with leaky ReLU's practical and recommended usage.
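To make that squashing argument concrete, here is a plain-Python sketch (alpha=0.3 mirrors the classic Keras LeakyReLU default): tanh already maps everything into (-1, 1), so a subsequent LeakyReLU passes the positive half through unchanged and merely scales the already-bounded negative half.

```python
import math

def leaky_relu(x, alpha=0.3):  # alpha=0.3 is the classic Keras default
    return x if x >= 0 else alpha * x

t = math.tanh(5.0)         # ~0.9999: already squashed into (-1, 1)
print(leaky_relu(t) == t)  # True: positive values pass through unchanged
print(leaky_relu(-2.0))    # -0.6: raw negative inputs are merely scaled
```

So after a tanh-activated layer, LeakyReLU is close to the identity on half its input range, which is why a linear LSTM activation makes the combination more meaningful.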


2 Comments

Thanks for the reply; I've updated my question so it would be clearer.
@ShlomiSchwartz As I see in the figure, there are three LSTM layers so I think what I have suggested in my answer is correct.
