
With limited knowledge, I've built an LSTM network. I would like to validate my assumptions and better understand the Keras API.

Network Code:

#...
model.add(LSTM(8, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(1, return_sequences=False, activation='softmax'))
#...

I have tried to build a network with a 4-feature input, two hidden layers (8 neurons in the first, 4 in the second), and a single neuron in the output layer.
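One quick way to validate that assumption is to compare hand-computed parameter counts against what model.summary() reports. This is a plain-Python sketch (not Keras itself) of the standard LSTM parameter formula, 4 * units * (input_dim + units + 1), applied to the three layers above:

```python
def lstm_param_count(units, input_dim):
    # 4 gates, each with a kernel (input_dim x units),
    # a recurrent kernel (units x units) and a bias (units)
    return 4 * units * (input_dim + units + 1)

# the three LSTM layers from the snippet above
print(lstm_param_count(8, 4))  # first layer, fed 4 input features
print(lstm_param_count(4, 8))  # second layer, fed by 8 units
print(lstm_param_count(1, 4))  # output layer, fed by 4 units
```

If these numbers (416, 208 and 24) match the per-layer counts in model.summary(), the layers are wired the way you intended.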

[figure: diagram of the planned network architecture]

The activation I wanted was LeakyReLU.

Q:

  1. Is the implementation correct?
    i.e.: does the code reflect what I planned?
  2. When using LeakyReLU, should I add a linear activation to the previous layer?
    i.e.: do I need to add activation='linear' to the LSTM layers?
  • Can you comment on what task you are trying to solve with this configuration? If it weren't for the LSTM, it would look like a regular CNN. Commented Oct 7, 2018 at 14:44
  • I agree. In your figure there is no hint about the temporal dimension of the LSTM (in your case a sequence of 100). Each timepoint will be something like your figure (except for the output), that will only be present in the last timepoint (because of your return_sequences=False) Commented Oct 8, 2018 at 7:41
  • The network aims to detect fraudulent sources. Each source produces transactions, some of which are fraudulent. I would like to determine whether a source is fraudulent after 100 samples (4 features each). Commented Oct 9, 2018 at 7:16

1 Answer


As for the first question: "correct" in what sense? That depends on the problem you are modeling, so more details would need to be provided.

softmax is not used as the activation function when the last layer has only one output unit. That's because softmax normalizes the output to make the sum of its elements be one, i.e. to resemble a probability distribution. Therefore, if you use it on a layer with only one output unit it would always have an output of 1. Instead, either linear (in case of regression, i.e. predicting real values) or sigmoid (in case of binary classification) is used. Additionally, commonly a Dense layer is used as the last layer which acts as the final regressor or classifier. For example:

from keras.models import Sequential
from keras.layers import LSTM, Dense, LeakyReLU

model = Sequential()
model.add(LSTM(8, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(1, return_sequences=False))
model.add(Dense(1, activation='sigmoid'))
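The softmax point is easy to verify in plain Python: with a single logit, the exponential cancels with itself, so the output is always exactly 1. A minimal sketch of softmax (not the Keras implementation):

```python
import math

def softmax(logits):
    # subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.7]))       # [1.0] -- a single unit always outputs 1
print(softmax([-13.9]))     # [1.0] -- regardless of the logit's value
print(softmax([1.0, 2.0]))  # a proper two-class distribution
```

This is why a one-unit softmax layer cannot learn anything: its output carries no information about the input.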

As for the layers and number of units (according to the figure): it is a bit ambiguous, but I think there are three LSTM layers, the first one has 4 units, the second one has 8 units and the last one has 4 units. As for the final layer it seems to be a Dense layer. So the model would look like this (assuming LeakyReLU is applied on the output of LSTM layers):

model.add(LSTM(4, batch_input_shape=(None, 100, 4), return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(8, return_sequences=True))
model.add(LeakyReLU())
model.add(LSTM(4, return_sequences=False))
model.add(Dense(1, activation='sigmoid')) # or activation='linear' if it is a regression problem
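To see why return_sequences matters here, this pure-Python sketch traces how the output shapes flow through the stack (return_sequences=True keeps the time axis, False keeps only the last timestep):

```python
def lstm_output_shape(input_shape, units, return_sequences):
    # input_shape is (batch, timesteps, features)
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one output per timestep
    return (batch, units)                 # only the last timestep's output

shape = (None, 100, 4)                      # batch_input_shape
shape = lstm_output_shape(shape, 4, True)   # -> (None, 100, 4)
shape = lstm_output_shape(shape, 8, True)   # -> (None, 100, 8)
shape = lstm_output_shape(shape, 4, False)  # -> (None, 4)
print(shape)
```

The final Dense(1) then maps (None, 4) to (None, 1): one fraud score per source after the full 100-sample sequence.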

As for using the LeakyReLU layer: I guess you are right that a linear activation should be used on its previous layer (as also suggested here, though a Dense layer was used there). That's because by default the activation of an LSTM layer is the hyperbolic tangent (i.e. tanh), which squashes the outputs to the range [-1, 1]; applying LeakyReLU on top of that may not be effective. However, I am not sure about this, since I am not completely familiar with leaky ReLU's practical and recommended usage.
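To make that squashing argument concrete, here is a plain-Python sketch (alpha=0.3 mirrors the classic Keras LeakyReLU default): tanh already maps everything into (-1, 1), so a subsequent LeakyReLU passes the positive half through unchanged and merely scales the already-bounded negative half.

```python
import math

def leaky_relu(x, alpha=0.3):  # alpha=0.3 is the classic Keras default
    return x if x >= 0 else alpha * x

t = math.tanh(5.0)         # ~0.9999: already squashed into (-1, 1)
print(leaky_relu(t) == t)  # True: positive values pass through unchanged
print(leaky_relu(-2.0))    # -0.6: raw negative inputs are merely scaled
```

So after a tanh-activated layer, LeakyReLU is close to the identity on half its input range, which is why a linear LSTM activation makes the combination more meaningful.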


2 Comments

Thanks for the reply; I've updated my question so it would be clearer.
@ShlomiSchwartz As I see in the figure, there are three LSTM layers so I think what I have suggested in my answer is correct.
