I'm trying to convert NumPy code to equivalent TensorFlow code so that it is compatible with tf.function.

Given a (32, 6) NumPy array target_values that looks like this:

array([[-0.01656106,  0.04762066,  0.05735449, -0.0284767 , -0.02237438,
        -0.00042562],
       [-0.01420249,  0.0477839 ,  0.0563598 , -0.02971786, -0.02367548,
         0.00001262],
       [-0.01695916,  0.04826669,  0.05893629, -0.03067053, -0.02261235,
         0.00345904],
       [-0.01953977,  0.04540274,  0.05829531, -0.02759781, -0.02390759,
        -0.00487727],
       [-0.01708016,  0.04894669,  0.0606699 , -0.02576046, -0.02461138,
        -0.00068538],
       [-0.01604217,  0.04770135,  0.05761468, -0.02858265, -0.02624938,
        -0.00084356],
       [-0.01527106,  0.04699571,  0.05959677, -0.02956396, -0.02510098,
        -0.00223234],
       [-0.01448676,  0.04620824,  0.05775366, -0.03008122, -0.02655901,
        -0.00159649],
       [-0.0172577 ,  0.04814827,  0.05807308, -0.02916523, -0.02367857,
        -0.00100602],
       [-0.01690523,  0.0484785 ,  0.05807881, -0.02960616, -0.02560546,
        -0.00065042],
       [-0.0166171 ,  0.0488232 ,  0.05776291, -0.03231864, -0.02132723,
        -0.00033605],
       [-0.01541627,  0.04840397,  0.0580376 , -0.02927143, -0.02461101,
         0.00121263],
       [-0.01685588,  0.047661  ,  0.05873172, -0.02989979, -0.02574112,
        -0.00126612],
       [-0.01333553,  0.05043796,  0.05915743, -0.02990219, -0.02657976,
        -0.0007656 ],
       [-0.01531163,  0.04781894,  0.05637252, -0.02968849, -0.02225551,
        -0.00151382],
       [-0.01357749,  0.04807179,  0.05955081, -0.02748637, -0.02498721,
        -0.00040934],
       [-0.01606943,  0.04768877,  0.05455931, -0.03136749, -0.02475093,
         0.00245846],
       [-0.01609829,  0.04687681,  0.05982678, -0.02886578, -0.02608151,
         0.00015348],
       [-0.01503662,  0.04740106,  0.05958583, -0.03141545, -0.02522127,
        -0.00063602],
       [-0.01697148,  0.04910276,  0.05744712, -0.02858391, -0.02481578,
        -0.00072039],
       [-0.01503395,  0.04843756,  0.05773868, -0.03061879, -0.02586869,
        -0.00025573],
       [-0.0152991 ,  0.04847359,  0.05739099, -0.0299796 , -0.02552593,
        -0.00334571],
       [-0.01324895,  0.04529134,  0.05534273, -0.03109139, -0.02304241,
        -0.00143186],
       [-0.01280282,  0.05004944,  0.05856398, -0.0314032 , -0.02394999,
        -0.00030306],
       [-0.01677033,  0.04876196,  0.05794405, -0.02888608, -0.02658239,
        -0.00015171],
       [-0.01572544,  0.04779808,  0.05939355, -0.03048976, -0.02896303,
        -0.00090334],
       [-0.01542805,  0.04709881,  0.05839922, -0.02894112, -0.02240603,
        -0.00188624],
       [-0.01493233,  0.0476524 ,  0.0581631 , -0.0297201 , -0.02485022,
        -0.00087418],
       [-0.01804641,  0.04739738,  0.06070606, -0.02981704, -0.02543145,
        -0.00115484],
       [-0.01518638,  0.04843838,  0.05744548, -0.02980216, -0.02420005,
         0.00036349],
       [-0.01442349,  0.04673778,  0.05804737, -0.03062913, -0.02476445,
        -0.00066772],
       [-0.01598305,  0.04622466,  0.0588723 , -0.03096713, -0.02364032,
        -0.00005574]])

Given another (32,) array of indices, actions, with values from 0 to 5 inclusive (one column index per row):

array([0, 2, 5, 5, 1, 1, 3, 4, 0, 5, 4, 3, 4, 5, 1, 0, 3, 0, 0, 2, 2, 2,
       0, 1, 4, 1, 4, 4, 0, 4, 1, 0])

I'm expecting this result:

array([-0.01656106,  0.0563598 ,  0.00345904, -0.00487727,  0.04894669,
        0.04770135, -0.02956396, -0.02655901, -0.0172577 , -0.00065042,
       -0.02132723, -0.02927143, -0.02574112, -0.0007656 ,  0.04781894,
       -0.01357749, -0.03136749, -0.01609829, -0.01503662,  0.05744712,
        0.05773868,  0.05739099, -0.01324895,  0.05004944, -0.02658239,
        0.04779808, -0.02240603, -0.02485022, -0.01804641, -0.02420005,
        0.04673778, -0.01598305], dtype=float32)

For self.batch_size == 32, I'm able to achieve what I need in numpy using:

state_action_values = target_values[np.arange(self.batch_size), actions]

With target_value_update being another (32,) array of new values, I then need to assign the new values to the same positions using:

target_values[np.arange(self.batch_size), actions] = target_value_update

However, in TensorFlow under tf.function this kind of indexing is not possible, and I get the following error:

TypeError: Only integers, slices (`:`), ellipsis (`...`), tf.newaxis (`None`) and scalar tf.int32/tf.int64 tensors are valid indices, got array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])

So I tried:

target_values = tf.Variable(target_values)
state_action_values = tf.gather(target_values, actions, axis=1)

However, the resulting state_action_values has shape (32, 32), when it should be (32,):

Tensor("GatherV2:0", shape=(32, 32), dtype=float32)

1 Answer

Use gather_nd():

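# Build (row, column) index pairs: [[0, actions[0]], [1, actions[1]], ...]
a = tf.range(32)[:, tf.newaxis]
# Cast so both halves share a dtype (NumPy integer arrays are often int64)
a = tf.concat((a, tf.cast(actions, tf.int32)[:, tf.newaxis]), -1)
output = tf.gather_nd(target_values, a)  # one element per row, shape (32,)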
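
For reference, here is a minimal self-contained sketch of the same gather_nd approach wrapped in tf.function. The small (4, 6) array, the action values, and the helper name select_action_values are made up for illustration and stand in for the (32, 6) target_values and (32,) actions from the question.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the question's (32, 6) and (32,) arrays.
target_values = tf.constant(np.arange(24, dtype=np.float32).reshape(4, 6))
actions = tf.constant([0, 2, 5, 1], dtype=tf.int32)


@tf.function
def select_action_values(values, actions):
    # Row indices 0..batch_size-1, in the same dtype as `actions`.
    rows = tf.range(tf.shape(values)[0], dtype=actions.dtype)
    # Pair each row with its chosen column: [[0, a0], [1, a1], ...].
    indices = tf.stack([rows, actions], axis=1)
    # Pick exactly one element per row, giving shape (batch_size,).
    return tf.gather_nd(values, indices)


state_action_values = select_action_values(target_values, actions)
print(state_action_values.numpy())  # [ 0.  8. 17. 19.]
```

Taking the batch size from tf.shape(values) instead of hard-coding 32 should let the same traced function handle other batch sizes.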

3 Comments

Is it possible to assign values using the same index tensor a? After storing the result in output, I want to overwrite the existing values with new ones, and do so under tf.function. Every time I try to create a tf.Variable inside tf.function, I get: ValueError: tf.function-decorated function tried to create variables on non-first call
@bullseye It is impossible to assign to tensor elements. You should create a new tensor instead (with concat() or something similar).
Yes, I know that it might not be possible so the best option would be to create another tensor. Does tf.tensor_scatter_nd_update do the same job?
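
On the tf.tensor_scatter_nd_update question raised in the last comment: a minimal sketch, assuming the same (row, column) index pairs used for gather_nd. The small shapes, the values, and the helper name update_action_values are hypothetical. The call returns a new tensor with the chosen positions replaced, and it does not modify its input in place.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for target_values, actions and target_value_update.
target_values = tf.constant(np.zeros((4, 6), dtype=np.float32))
actions = tf.constant([0, 2, 5, 1], dtype=tf.int32)
target_value_update = tf.constant([1.0, 2.0, 3.0, 4.0])


@tf.function
def update_action_values(values, actions, updates):
    # Same (row, column) pairs as in the gather_nd lookup.
    rows = tf.range(tf.shape(values)[0], dtype=actions.dtype)
    indices = tf.stack([rows, actions], axis=1)
    # Builds and returns a new tensor; `values` itself is left unchanged.
    return tf.tensor_scatter_nd_update(values, indices, updates)


updated = update_action_values(target_values, actions, target_value_update)
print(updated.numpy())
```

Because no tf.Variable is created inside the function, this avoids the "tried to create variables on non-first call" error mentioned above.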
