
I'm learning RL and understand the basic actor-critic concept, but I'm confused about the technical details of how the critic actually influences the actor during training. Here's my current understanding. There are two kinds of actor-critic network: shared-weight and separate-weight.

For shared weights, the actor and critic share the Encoder and Core (RNN). During backpropagation, the critic's loss updates the Encoder (feature extractor) and RNN weights, and so does the actor's loss. So the actor "learns" indirectly from the critic's updates to those shared weights, and the shared layers are trained by the gradients of both losses combined.
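To make the shared-weight case concrete, here is a minimal pure-Python sketch with scalar weights. It assumes a combined loss `actor_loss + c_v * critic_loss` and a Monte Carlo advantage `A = G - V(s)` that the actor treats as a constant. The names `shared_forward`, `shared_encoder_grad`, `w_e`, `w_a`, `w_v`, and `c_v` are hypothetical, and `w_e` stands in for the whole shared Encoder + RNN. The point it illustrates is that the gradient on the shared weight is the sum of an actor term and a critic term.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def shared_forward(w_e, w_a, w_v, x, a, G, advantage=None):
    """Toy shared-weight actor-critic with scalar weights (hypothetical).

    h = w_e * x stands in for the shared Encoder + RNN; the actor head gives
    P(action 1) = sigmoid(w_a * h); the critic head gives V(s) = w_v * h.
    G is an observed return.
    """
    h = w_e * x
    p1 = sigmoid(w_a * h)
    v = w_v * h
    if advantage is None:
        advantage = G - v  # A = G - V(s), used by the actor as a constant
    log_pi = math.log(p1 if a == 1 else 1.0 - p1)
    actor_loss = -advantage * log_pi
    critic_loss = 0.5 * (G - v) ** 2
    return actor_loss, critic_loss, advantage

def shared_encoder_grad(w_e, w_a, w_v, x, a, G, c_v=0.5):
    """Gradient of actor_loss + c_v * critic_loss w.r.t. the shared weight w_e."""
    h = w_e * x
    p1 = sigmoid(w_a * h)
    v = w_v * h
    adv = G - v                              # detached: no gradient flows through A
    from_actor = -adv * (a - p1) * w_a * x   # d(-A * log pi(a)) / d w_e
    from_critic = c_v * (v - G) * w_v * x    # c_v * d(0.5 * (G - V)^2) / d w_e
    return from_actor, from_critic, from_actor + from_critic

from_actor, from_critic, total = shared_encoder_grad(0.8, 1.5, -0.7, 2.0, a=1, G=1.0)
```

In an autograd framework you would get the same sum by backpropagating the combined loss once; the assumed coefficient `c_v` controls how strongly the critic's gradient shapes the shared features.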

For separate weights, the actor and critic each have their own Encoder and RNN, so each network's weights are updated only by its own loss. Thus, they do not affect each other through weights. Instead, the critic is used to calculate the advantage, and the actor uses that advantage in its update.
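In the separate-weight case the only channel is the advantage. A minimal sketch, assuming one-step TD advantages `A_t = r_t + gamma * V(s_{t+1}) - V(s_t)` (Monte Carlo returns or GAE are common alternatives); `td_advantages` is a hypothetical helper name. Because the result is a list of plain floats, the actor's update cannot send gradients back into the critic.

```python
from typing import List

def td_advantages(rewards: List[float], values: List[float],
                  dones: List[bool], gamma: float = 0.99) -> List[float]:
    """One-step TD advantages A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t).

    `values` holds the critic's V(s_t) for each step plus one bootstrap value
    for the state after the last step, so len(values) == len(rewards) + 1.
    """
    assert len(values) == len(rewards) + 1
    advs = []
    for t, (r, done) in enumerate(zip(rewards, dones)):
        next_v = 0.0 if done else values[t + 1]  # no bootstrap past episode end
        advs.append(r + gamma * next_v - values[t])
    return advs

advs = td_advantages([1.0, 0.0, 1.0], [0.5, 0.4, 0.6, 0.0],
                     [False, False, True], gamma=0.9)
```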

Is my understanding correct? If not, could you explain the flow, point out any crucial details I'm missing, or refer me to where I can gain a better understanding of this?

And in MARL settings, when should I use separate vs. shared weights? What are the key trade-offs?

Any pointers to papers or code examples would be super helpful!


1 Answer


I managed to figure this out through a bunch of reading.

With shared weights, the critic influences the actor through:

  • Critic's value V(s) -> Advantage A -> Actor
  • The shared Encoder/RNN weights, since gradients from both the actor and critic losses shape them

With separate weights, the critic influences the actor only through:

  • The advantage (the actor is updated to make actions with positive advantage, computed from the critic's value estimate, more likely; the actor's loss does not affect the critic's feature learning).
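How the actor uses the advantage can be sketched for a single state with a softmax policy over logits, assuming plain gradient ascent on `A * log pi(a)`; `actor_step` and `lr` are hypothetical names. Since `d log pi(a) / d logit_k = 1[k == a] - pi_k`, a positive advantage raises the probability of the taken action and a negative one lowers it.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def actor_step(logits: List[float], action: int, advantage: float,
               lr: float = 0.1) -> List[float]:
    """One gradient-ascent step on advantage * log pi(action).

    `advantage` is a plain float computed from the critic's V(s), so this
    update changes only the actor's logits, never the critic.
    """
    probs = softmax(logits)
    return [z + lr * advantage * ((1.0 if k == action else 0.0) - p)
            for k, (z, p) in enumerate(zip(logits, probs))]

logits = [0.0, 0.0, 0.0]
better = softmax(actor_step(logits, action=1, advantage=2.0))   # pi(1) rises above 1/3
worse = softmax(actor_step(logits, action=1, advantage=-2.0))   # pi(1) falls below 1/3
```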

And in MARL, to preserve the MAPOMDP setting, separate weights must be used, so that each actor, which acts only on its own local observation, is not influenced by the critic through shared weights.
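One common way to realize this is centralized training with decentralized execution (CTDE). The sketch below assumes that setup: each agent's actor has its own weights and reads only its local observation, while a separately parameterized critic reads the concatenation of all agents' observations, and only during training. `LinearActor`, `CentralCritic`, and `decentralized_act` are hypothetical names, and the linear models are placeholders for real Encoder/RNN networks.

```python
import math
import random
from typing import List

class LinearActor:
    """Per-agent policy; reads ONLY this agent's local observation."""

    def __init__(self, obs_dim: int, n_actions: int, rng: random.Random):
        # Actor-owned weights (placeholder for this agent's Encoder + RNN + head).
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(n_actions)]

    def probs(self, local_obs: List[float]) -> List[float]:
        logits = [sum(wi * oi for wi, oi in zip(row, local_obs)) for row in self.w]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

class CentralCritic:
    """V(joint observation); owns separate weights and is used only in training."""

    def __init__(self, global_dim: int, rng: random.Random):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(global_dim)]

    def value(self, all_obs: List[List[float]]) -> float:
        joint = [x for obs in all_obs for x in obs]  # concatenate every agent's obs
        return sum(wi * xi for wi, xi in zip(self.w, joint))

def decentralized_act(actors: List[LinearActor],
                      all_obs: List[List[float]]) -> List[List[float]]:
    # Execution: agent i only ever sees all_obs[i]; the critic is not consulted.
    return [actor.probs(obs) for actor, obs in zip(actors, all_obs)]

rng = random.Random(0)
actors = [LinearActor(obs_dim=3, n_actions=2, rng=rng) for _ in range(2)]
critic = CentralCritic(global_dim=6, rng=rng)
obs = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]]
policies = decentralized_act(actors, obs)  # per-agent action probabilities
v = critic.value(obs)                      # training-time baseline for advantages
```

Because no parameter object is shared between `CentralCritic` and any `LinearActor`, the critic can use joint information during training without that information entering what the actors compute at execution time.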

Please feel free to correct me if I'm misunderstanding somewhere.
