
I am trying to train a 1-D ConvNet for time series classification as shown in this paper (refer to the FCN in Fig. 1b): https://arxiv.org/pdf/1611.06455.pdf

The Keras implementation is giving me vastly superior performance. Could someone explain why that is the case?

The code for PyTorch is as follows:

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv1d(x_train.shape[1], 128, 8)
        self.bnorm1 = nn.BatchNorm1d(128)
        self.conv2 = nn.Conv1d(128, 256, 5)
        self.bnorm2 = nn.BatchNorm1d(256)
        self.conv3 = nn.Conv1d(256, 128, 3)
        self.bnorm3 = nn.BatchNorm1d(128)
        self.dense = nn.Linear(128, nb_classes)

    def forward(self, x):
        c1 = self.conv1(x)
        b1 = F.relu(self.bnorm1(c1))
        c2 = self.conv2(b1)
        b2 = F.relu(self.bnorm2(c2))
        c3 = self.conv3(b2)
        b3 = F.relu(self.bnorm3(c3))
        output = torch.mean(b3, 2)  # global average pooling over the length dimension
        dense1 = self.dense(output)
        return F.softmax(dense1, dim=1)


model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.99)
losses = []
for t in range(1000):
    y_pred_1 = model(x_train.float())
    loss_1 = criterion(y_pred_1, y_train.long())
    print(t, loss_1.item())
    optimizer.zero_grad()
    loss_1.backward()
    optimizer.step()

For comparison, I use the following code for Keras:

x = keras.layers.Input(x_train.shape[1:])
conv1 = keras.layers.Conv1D(128, 8, padding='valid')(x)
conv1 = keras.layers.BatchNormalization()(conv1)
conv1 = keras.layers.Activation('relu')(conv1)
conv2 = keras.layers.Conv1D(256, 5, padding='valid')(conv1)
conv2 = keras.layers.BatchNormalization()(conv2)
conv2 = keras.layers.Activation('relu')(conv2)
conv3 = keras.layers.Conv1D(128, 3, padding='valid')(conv2)
conv3 = keras.layers.BatchNormalization()(conv3)
conv3 = keras.layers.Activation('relu')(conv3)
full = keras.layers.GlobalAveragePooling1D()(conv3)
out = keras.layers.Dense(nb_classes, activation='softmax')(full)

model = keras.models.Model(inputs=x, outputs=out) 
optimizer = keras.optimizers.SGD(lr=0.5, decay=0.0, momentum=0.99)
model.compile(loss='categorical_crossentropy', optimizer=optimizer) 
hist = model.fit(x_train, Y_train, batch_size=x_train.shape[0], nb_epoch=2000)      

The only difference I see between the two is the initialization, yet the results are vastly different. For reference, I use the same preprocessing for both, with one subtle difference in input shapes: PyTorch expects (Batch_Size, Channels, Length) while Keras expects (Batch_Size, Length, Channels).
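
For anyone reproducing this, here is a minimal sketch of how both input layouts could be produced from the same array. The variable names and shapes below are illustrative assumptions, not from the original post:

import numpy as np
import torch

# Hypothetical raw data: 100 univariate series of length 176 (shape is an assumption)
x_train_np = np.random.randn(100, 1, 176).astype(np.float32)

# PyTorch Conv1d expects (batch, channels, length)
x_train_torch = torch.from_numpy(x_train_np)          # shape (100, 1, 176)

# Keras Conv1D expects (batch, length, channels): swap the last two axes
x_train_keras = np.transpose(x_train_np, (0, 2, 1))   # shape (100, 176, 1)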


1 Answer


The difference in results is due to the different default parameters of the layers and the optimizer. For example, PyTorch's batch norm uses a decay rate of 0.9 for its running statistics, whereas Keras uses 0.99. There may be other such differences in defaults as well.

If you use the same parameters and a fixed random seed for initialization, there won't be much difference in the results between the two libraries.
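
As an illustration, a sketch of how the PyTorch side could be brought closer to the Keras defaults (the momentum and epsilon values below are the documented Keras BatchNormalization defaults; PyTorch defines momentum as the weight of the new batch statistics, so 0.01 in PyTorch corresponds to a 0.99 decay in Keras):

import numpy as np
import torch
import torch.nn as nn

# Fix seeds so the weight initialization on the PyTorch side is reproducible
np.random.seed(0)
torch.manual_seed(0)

# Keras BatchNormalization defaults: momentum=0.99, epsilon=1e-3.
# PyTorch BatchNorm1d defaults: momentum=0.1, eps=1e-5, with momentum meaning
# the weight given to the *current* batch statistics.
bnorm1 = nn.BatchNorm1d(128, eps=1e-3, momentum=0.01)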
