
I'm interested in implementing a LinkNet-based encoder-decoder structure for semantic segmentation on a custom dataset, and I'm trying to introduce ConvLSTM layers between the encoder and decoder. As expected, the output of the encoder is a 4-dim tensor (batch_size, channels, height, width), while the ConvLSTM layers expect a 5-dim input (batch_size, sequence_length, channels, height, width). How do I convert this 4-dim tensor to a 5-dim tensor without any loss of information? I initially thought of splitting the batch_size to accommodate the sequence_length, but that might be a problem since I'm dealing with video frames.

I'm considering using sequences of four or five frames for training, i.e. the semantic segmentation map of frame t is determined from the information of the previous three to four frames, so a sequence_length of 4 or 5 should do.

How do I introduce the sequence length? Is it during pre-processing or right after the encoder structure?

Most importantly, HOW TO DO IT?

1 Answer


You can't. ConvLSTM expects a sequence, which is exactly the dimension you are missing. LinkNet takes only a single image as input, so you can't really use ConvLSTM inside LinkNet.


2 Comments

They use sequences of frames: arxiv.org/pdf/1905.01058.pdf
If I understand correctly, you have to use the ConvLSTM as both encoder and decoder
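One way to do what the question and comments describe, sketched in PyTorch under the assumption that frames are grouped into fixed-length clips during preprocessing: flatten the time axis into the batch axis so the 2D encoder sees individual frames, then restore the sequence axis before the ConvLSTM. The encoder here is a hypothetical stand-in (a single strided conv), not the actual LinkNet encoder.

```python
import torch

# Hypothetical sizes: 2 clips of 4 frames each, 3-channel 64x64 frames.
batch_size, seq_len, ch, h, w = 2, 4, 3, 64, 64

# Preprocessing produces clips shaped (B, T, C, H, W).
clips = torch.randn(batch_size, seq_len, ch, h, w)

# Flatten time into the batch axis so a 2D encoder can process
# every frame independently: (B*T, C, H, W).
frames = clips.view(batch_size * seq_len, ch, h, w)

# Stand-in for the LinkNet encoder (a strided conv halving spatial size).
encoder = torch.nn.Conv2d(ch, 16, kernel_size=3, stride=2, padding=1)
features = encoder(frames)  # (B*T, 16, 32, 32)

# Restore the sequence axis for the ConvLSTM: (B, T, C', H', W').
features = features.view(batch_size, seq_len, *features.shape[1:])
print(features.shape)  # torch.Size([2, 4, 16, 32, 32])
```

Since `view` only reinterprets the memory layout, no information is lost in either direction; the answer to "pre-processing or after the encoder" is both: sequences are formed during preprocessing, and the 5-dim shape is restored right after the encoder.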
