I'm learning RL and understand the basic actor-critic concept, but I'm confused about the technical details of how the critic actually influences the actor during training. Here's my current understanding: there are shared-weight and separate-weight actor-critic networks.
For shared weights, the actor and critic share an Encoder + Core (RNN). During backpropagation, the critic's loss updates the weights of the Encoder and the RNN, and the actor's loss also updates the Encoder (feature extractor) and the RNN, so the actor "learns" from the critic indirectly: the shared trunk is shaped by the gradients of both losses combined.
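In case it helps clarify what I mean, here's a rough PyTorch sketch of my mental model for the shared-weight case (the architecture, shapes, and loss weighting are all placeholders I made up for illustration, not a real implementation):

```python
import torch
import torch.nn as nn

class SharedActorCritic(nn.Module):
    """Shared Encoder + RNN core, with separate actor/critic heads on top."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)             # shared feature extractor
        self.core = nn.GRU(hidden, hidden, batch_first=True)  # shared RNN core
        self.actor_head = nn.Linear(hidden, n_actions)        # policy logits
        self.critic_head = nn.Linear(hidden, 1)               # state value

    def forward(self, obs_seq, h=None):
        feat = torch.relu(self.encoder(obs_seq))
        out, h = self.core(feat, h)
        return self.actor_head(out), self.critic_head(out), h

model = SharedActorCritic(obs_dim=4, n_actions=2)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

obs = torch.randn(1, 8, 4)             # dummy rollout: (batch, time, obs_dim)
returns = torch.randn(1, 8, 1)         # placeholder returns
actions = torch.randint(0, 2, (1, 8))  # placeholder actions

logits, values, _ = model(obs)
dist = torch.distributions.Categorical(logits=logits)
advantage = (returns - values).detach()  # critic also informs actor via advantage
actor_loss = -(dist.log_prob(actions) * advantage.squeeze(-1)).mean()
critic_loss = (returns - values).pow(2).mean()

loss = actor_loss + 0.5 * critic_loss  # arbitrary 0.5 value-loss coefficient
opt.zero_grad()
loss.backward()  # gradients from BOTH losses flow into the shared encoder/core
opt.step()
```

So in this picture the single `loss.backward()` is where the critic's value error "leaks" into the features the actor sees.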
For separate weights, the actor and critic each have their own Encoder and RNN, so the weights are updated separately by their own losses, and the two networks don't affect each other through weights at all. Instead, the critic is used to calculate the advantage, and the advantage is then used in the actor's policy-gradient loss.
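And here's the corresponding sketch of my mental model for the separate-weight case (again, all names and shapes are made up; the point is just that the critic only touches the actor through the detached advantage):

```python
import torch
import torch.nn as nn

class Trunk(nn.Module):
    """One independent Encoder + RNN; the actor and critic each get their own copy."""
    def __init__(self, obs_dim, out_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.core = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, obs_seq, h=None):
        feat = torch.relu(self.encoder(obs_seq))
        out, h = self.core(feat, h)
        return self.head(out), h

actor = Trunk(obs_dim=4, out_dim=2)   # outputs policy logits
critic = Trunk(obs_dim=4, out_dim=1)  # outputs state value
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

obs = torch.randn(1, 8, 4)
returns = torch.randn(1, 8, 1)
actions = torch.randint(0, 2, (1, 8))

# Critic update: its loss only ever touches the critic's own weights.
values, _ = critic(obs)
critic_loss = (returns - values).pow(2).mean()
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# Actor update: the critic influences the actor ONLY through the advantage
# signal (a detached number), never through shared gradients.
with torch.no_grad():
    advantage = returns - critic(obs)[0]
logits, _ = actor(obs)
dist = torch.distributions.Categorical(logits=logits)
actor_loss = -(dist.log_prob(actions) * advantage.squeeze(-1)).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```

Is that the right way to think about the difference between the two setups?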
Is my understanding correct? If not, could you explain the actual flow, point out any crucial details I'm missing, or refer me to somewhere I can gain a better understanding of this?
And in MARL settings, when should I use separate vs. shared weights? What are the key trade-offs?
Any pointers to papers or code examples would be super helpful!