
I'm working on a reinforcement learning problem where the environment returns a reward pair $(r_{t+1}^{(a)}, r_{t+1}^{(b)})$ at each step. The goal is to maximize the nonlinear objective $$ \mathbb{E}\left[\lim_{T \to \infty} \frac{\sum_{k=t}^{t+T-1} r_{k+1}^{(a)}}{\sum_{k=t}^{t+T-1} r_{k+1}^{(a)} + \sum_{k=t}^{t+T-1} r_{k+1}^{(b)}}\right], $$ i.e., the agent's cumulative reward as a fraction of the total cumulative reward. I intend to use a Deep Q-Network (DQN) as the primary model for this environment. However, because the objective is a ratio of cumulative rewards rather than an expected sum of per-step rewards, the standard DQN formulation does not apply directly. As an alternative, I considered framing the problem as a multi-objective reinforcement learning problem, treating $r^{(a)}$ and $r^{(b)}$ as separate objectives. I would like validation that this is an appropriate way to frame the problem.
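To make the multi-objective framing concrete, here is a rough sketch of the kind of architecture I have in mind: a DQN-style network with one output head per reward stream, so the agent keeps separate value estimates for $r^{(a)}$ and $r^{(b)}$. This is only an illustration under my own assumptions (PyTorch; names like `QNet` and `ratio_greedy` are placeholders, not an actual implementation), and acting greedily on the ratio of the two estimates is just a heuristic, not something I claim optimizes the limit objective.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """DQN-style network with one Q head per reward component."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head_a = nn.Linear(hidden, n_actions)  # estimates cumulative r^(a)
        self.head_b = nn.Linear(hidden, n_actions)  # estimates cumulative r^(b)

    def forward(self, obs: torch.Tensor):
        z = self.body(obs)
        return self.head_a(z), self.head_b(z)

def ratio_greedy(q_a: torch.Tensor, q_b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Heuristic action selection: pick the action maximizing q_a / (q_a + q_b).

    This greedy-on-estimates rule does not provably maximize the limit-ratio
    objective; it only shows one way the two heads could be combined."""
    return torch.argmax(q_a / (q_a + q_b + eps), dim=-1)

# Each head would get its own TD target from the same transition, e.g.
#   target_a = r_a + gamma * max_a' Q_a_target(s', a')
#   target_b = r_b + gamma * max_a' Q_b_target(s', a')
# so the replay buffer stores the reward pair (r_a, r_b) per transition.
```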

  • Regarding "agent's cumulative rewards as a fraction of the total cumulative rewards": is this a multi-agent environment, or what do you mean by "total"? Commented May 13, 2024 at 16:51
  • @Alberto, I think the "total" is over the "rewards" $r^{(a)} + r^{(b)}$. These appear not to be rewards in the usual MDP sense, but some other observed values that play a similar role to MDP reward. Commented May 13, 2024 at 20:31
