
I want my model to output a single value; how can I constrain that value to the range (a, b)? For example, my code is:

import torch.nn as nn


class ActorCritic(nn.Module):
    def __init__(self, num_state_features):
        super(ActorCritic, self).__init__()

        # value
        self.critic_net = nn.Sequential(
            nn.Linear(num_state_features, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )

        # policy
        self.actor_net = nn.Sequential(
            nn.Linear(num_state_features, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state):
        value = self.critic_net(state)
        policy_mean = self.actor_net(state)
        return value, policy_mean

and I want the policy output to be in the range (500, 3000), how can I do this?

(I have tried torch.clamp(), but it does not work well: the gradient of clamp is zero outside the bounds, so once the raw output drifts past a limit, for example down to -1000000, the clamped value stays pinned at 500 forever, or takes a very long time to change. The same saturation problem occurs with nn.Sigmoid().)
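For instance, a quick standalone check (with a made-up input, just to illustrate) shows that clamp passes no gradient at all once the raw output is outside the bounds:

import torch

x = torch.tensor(-1e6, requires_grad=True)
y = torch.clamp(x, 500.0, 3000.0)  # pinned at the lower bound
y.backward()
print(x.grad)                      # tensor(0.): no gradient signal to recover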

2 Answers


Use an activation function on the final layer that bounds the output to a fixed range, then rescale to your desired range. For instance, the sigmoid function bounds its output to the range (0, 1).

output = torch.sigmoid(previous_layer_output)  # in range (0, 1)
output_normalized = output * (b - a) + a       # in range (a, b)
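A quick numeric check of this mapping (a standalone sketch with made-up values, using the bounds from the question):

import torch

a, b = 500.0, 3000.0
raw = torch.tensor([-10.0, 0.0, 10.0])   # arbitrary pre-activation outputs
out = torch.sigmoid(raw) * (b - a) + a   # rescaled into (a, b)
print(out)  # approximately tensor([ 500.1135, 1750.0000, 2999.8865])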

3 Comments

I tried this, but the output is either a or b, or really close to those two values
Before or after training? You may need to initially train with tanh or some other activation function. The gradient of the sigmoid function is close to 0 except near the origin, so if the inputs to your sigmoid layer start out very far from the origin, the gradient will be close to 0 and learning will be extremely slow. This can be solved by appropriately initializing your network (sketched after these comments) or by using an activation with a non-zero gradient far from the origin
During training I am tracking the output value after each iteration, and it is the exact issue you mention: the output changes very slowly, or does not change at all once one end of the sigmoid is reached. I have tried initializing the network near 0, and it did help a lot, but sometimes the issue still appears. I will try an activation with a non-zero gradient far from the origin. Thanks!
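For reference, a minimal sketch of the initialization idea from that comment; the layer shape matches the actor head in the question, and the range 3e-3 is an arbitrary small value (in the spirit of the DDPG-style final-layer initialization):

import torch.nn as nn

final_layer = nn.Linear(64, 1)  # stand-in for the actor's last layer
# Small weights and a zero bias keep the pre-sigmoid output near 0,
# where the sigmoid gradient is largest, so learning is not stalled at the start.
nn.init.uniform_(final_layer.weight, -3e-3, 3e-3)
nn.init.zeros_(final_layer.bias)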

You can use a fixed linear transform to scale your sigmoid to the appropriate bounds:

# ...

    def forward(self, state):
        value = self.critic_net(state)
        policy_mean = self.actor_net(state)
        # sigmoid squashes to (0, 1); a fixed affine transform then
        # rescales (0, 1) onto (500, 3000)
        policy_mean = torch.sigmoid(policy_mean) * (3000 - 500) + 500
        return value, policy_mean
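As a quick sanity check (assuming this forward is wired into the ActorCritic class from the question; the feature size 8 and batch size 4 are arbitrary):

import torch

model = ActorCritic(num_state_features=8)
state = torch.randn(4, 8)
value, policy_mean = model(state)
# every policy output lies within the bounds, up to floating-point rounding
assert ((policy_mean >= 500) & (policy_mean <= 3000)).all()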

