
I'm working on a reinforcement learning problem where the environment returns a reward pair $(r_{t+1}^{(a)}, r_{t+1}^{(b)})$ at each step. The goal is to maximize the nonlinear objective $$ \mathbb{E}\left[\lim_{T \to \infty} \frac{\sum_{k=t}^{t+T-1} r_{k+1}^{(a)}}{\sum_{k=t}^{t+T-1} r_{k+1}^{(a)} + \sum_{k=t}^{t+T-1} r_{k+1}^{(b)}}\right], $$ i.e., the agent's cumulative reward as a fraction of the total cumulative reward. I intend to use a Deep Q-Network (DQN) as the primary model for this environment. However, because the objective is a ratio of cumulative rewards rather than an expected sum of per-step rewards, the standard DQN formulation does not apply directly. As an alternative, I considered framing the problem as a multi-objective reinforcement learning problem, treating $r^{(a)}$ and $r^{(b)}$ as separate objectives. I would like validation that this is an appropriate way to frame the problem.
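To make the multi-objective framing concrete, here is a rough sketch of the kind of architecture I have in mind: a DQN-style network with one output head per reward stream, so the agent keeps separate value estimates for $r^{(a)}$ and $r^{(b)}$. This is only an illustration under my own assumptions (PyTorch; names like `QNet` and `ratio_greedy` are placeholders, not an actual implementation), and acting greedily on the ratio of the two estimates is just a heuristic, not something I claim optimizes the limit objective.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """DQN-style network with one Q head per reward component."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head_a = nn.Linear(hidden, n_actions)  # estimates cumulative r^(a)
        self.head_b = nn.Linear(hidden, n_actions)  # estimates cumulative r^(b)

    def forward(self, obs: torch.Tensor):
        z = self.body(obs)
        return self.head_a(z), self.head_b(z)

def ratio_greedy(q_a: torch.Tensor, q_b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Heuristic action selection: pick the action maximizing q_a / (q_a + q_b).

    This greedy-on-estimates rule does not provably maximize the limit-ratio
    objective; it only shows one way the two heads could be combined."""
    return torch.argmax(q_a / (q_a + q_b + eps), dim=-1)

# Each head would get its own TD target from the same transition, e.g.
#   target_a = r_a + gamma * max_a' Q_a_target(s', a')
#   target_b = r_b + gamma * max_a' Q_b_target(s', a')
# so the replay buffer stores the reward pair (r_a, r_b) per transition.
```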

  • Regarding "agent's cumulative rewards as a fraction of the total cumulative rewards": is this a multi-agent environment, or what do you mean by "total"? Commented May 13, 2024 at 16:51
  • @Alberto, I think the "total" is over the "rewards" $r^{(a)} + r^{(b)}$. These appear not to be rewards in the usual MDP sense, but some other observed values that play a similar role to MDP reward. Commented May 13, 2024 at 20:31
