
I have the following tensor, let's call it lookup_table:

tensor([266, 103,  84,  12,  32,  34,   1, 523,  22, 136, 268, 432,  53,  63,
        201,  51, 164,  69,  31,  42, 122, 131, 119,  36, 245,  60,  28,  81,
          9, 114, 105,   3,  41,  86, 150,  79, 104, 120,  74, 420,  39, 427,
         40,  59,  24, 126, 202, 222, 145, 429,  43,  30,  38,  55,  10, 141,
         85, 121, 203, 240,  96,   7,  64,  89, 127, 236, 117,  99,  54,  90,
         57,  11,  21,  62,  82,  25, 267,  75, 111, 518,  76,  56,  20,   2,
         61, 516,  80,  78, 555, 246, 133, 497,  33, 421,  58, 107,  92,  68,
         13, 113, 235, 875,  35,  98, 102,  27,  14,  15,  72,  37,  16,  50,
        517, 134, 223, 163,  91,  44,  17, 412,  18,  48,  23,   4,  29,  77,
          6, 110,  67,  45, 161, 254, 112,   8, 106,  19, 498, 101,   5, 157,
         83, 350, 154, 238, 115,  26, 142, 143])

And I have another tensor, let's call it data, which looks like this:

tensor([[517, 235, 236,  76,  81,  25, 110,  59, 245,  39],
        [523, 114, 350, 246,  30, 222,  39, 517, 106,   2],
        [ 35, 235, 120,  99, 266,  63, 236, 133, 412,  38],
        [134,   2, 497,  21,  78,  60, 142, 498,  24,  89],
        [ 60, 111, 120, 145,  91, 141, 164,  81, 350,  55]])

Now I want something that looks similar to this:

tensor([[112, 100, ...,  40],
        [  7,  29, ...,   2],
        ...])

I want to use my data tensor to get, for each of its elements, the index of that value in the lookup table.
Basically I want to vectorize this:

(lookup_table == data).nonzero()

So that this works for multidimensional arrays.

I have read these, but they do not work for my case:
How Pytorch Tensor get the index of specific value
How Pytorch Tensor get the index of elements?
Pytorch tensor - How to get the indexes by a specific tensor

EDIT:
I am basically searching for an optimized/vectorized version of this:

x_data = torch.stack([(lookup_table == data[0][i]).nonzero(as_tuple=False) for i in range(len(data[0]))]).flatten().unsqueeze(0)
print(x_data.size())
for o in range(1, len(data)):
    x_data = torch.cat((x_data, torch.stack([(lookup_table == data[o][i]).nonzero(as_tuple=False) for i in range(len(data[o]))]).flatten().unsqueeze(0)), dim=0)

EDIT 2 Minimal example:
We have the data tensor:

data = torch.tensor([
        [523, 114, 350, 246,  30, 222,  39, 517, 106,   2],
        [ 35, 235, 120,  99, 266,  63, 236, 133, 412,  38],
        [555, 104,  14,  81,  55, 497, 222,  64,  57, 131]
])

And we have the lookup_table tensor, see above.

If we apply this code to the 2 tensors:

 # convert champion keys into index notation
x_data = torch.stack([(lookup_table == data[0][i]).nonzero(as_tuple=False) for i in range(len(data[0]))]).flatten().unsqueeze(0)
for o in range(1, len(data)):
    x_data = torch.cat((x_data, torch.stack([(lookup_table == data[o][i]).nonzero(as_tuple=False) for i in range(len(data[o]))]).flatten().unsqueeze(0)), dim=0)

We get an output of this:

tensor([[  7,  29, 141,  89,  51,  47,  40, 112, 134,  83],
        [102, 100,  37,  67,   0,  13,  65,  90, 119,  52],
        [ 88,  36, 106,  27,  53,  91,  47,  62,  70,  21]
       ])

This output is what I want: as I said above, it is the index at which each value of the tensor data lies in the tensor lookup_table. The problem is that this is not vectorized, and I have no idea how to vectorize it.

Comment: Try giving minimal examples for lookup_table and data and give the exact output you want to get. (Sep 17, 2021)

2 Answers


Using searchsorted:

Scanning the whole lookup_table array for each input element is quite inefficient. How about sorting the lookup table first (this only needs to be done once)

sorted_lookup_table, indexes = torch.sort(lookup_table)

and then using searchsorted

index_into_sorted = torch.searchsorted(sorted_lookup_table, data)

If you need an index into the original lookup_table, you can get it with

index_into_lookup_table = indexes[index_into_sorted]
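Putting the pieces together, here is a self-contained sketch on a small made-up table (the values below are illustrative, not the full table from the question):

```python
import torch

# small illustrative lookup table and batch of keys
lookup_table = torch.tensor([266, 103, 84, 12, 32, 34, 1, 523])
data = torch.tensor([[523, 84],
                     [34, 266]])

# sort once, then binary-search the whole batch at once
sorted_lookup_table, indexes = torch.sort(lookup_table)
index_into_sorted = torch.searchsorted(sorted_lookup_table, data)
# map positions in the sorted table back to the original table
index_into_lookup_table = indexes[index_into_sorted]

# round trip: indexing back into the original table recovers data
print(torch.equal(lookup_table[index_into_lookup_table], data))
```

One caveat: searchsorted assumes every value in data actually occurs in lookup_table; a missing value silently maps to the position of a neighbor instead of raising an error.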

3 Comments

You might want to rename sorted as it is a reserved keyword in Python!
Ok, it's not reserved and your code will run, yes. But it's considered bad practice to use builtin function names as variable names.
This solution works pretty well, thanks a lot. I didn't know about these PyTorch methods.

Another, faster approach, which assumes that all values have a limited range and are int64 (here I also assume that they are non-negative, but this limitation can be worked around):

Prep work:

sorted_lookup_table, indexes = torch.sort(lookup_table)
# note: lookup_table must be an int64 (long) tensor, or the indexing below fails
lut = torch.zeros(size=(sorted_lookup_table[-1].item() + 1,), dtype=torch.int64)
lut[:] = -1 # "not found"
lut[sorted_lookup_table] = indexes

Data processing:

index_into_lookup_table = lut[data]
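A self-contained sketch of this approach, again on a small made-up table (the values are illustrative):

```python
import torch

# small illustrative table; must be int64 (long) for the indexing below
lookup_table = torch.tensor([266, 103, 84, 12, 523], dtype=torch.int64)
data = torch.tensor([[523, 84],
                     [12, 266]])

sorted_lookup_table, indexes = torch.sort(lookup_table)
# dense lookup array covering 0..max value; -1 marks "not found"
lut = torch.full((sorted_lookup_table[-1].item() + 1,), -1, dtype=torch.int64)
lut[sorted_lookup_table] = indexes

# data processing is then a single fancy-indexing operation
index_into_lookup_table = lut[data]
```

The trade-off is memory: lut has one slot per possible key up to the maximum value, which is cheap for keys in the hundreds but wasteful if the keys can be very large.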

3 Comments

When trying to use lut[sorted_lookup_table] = indexes I get the error IndexError: tensors used as indices must be long, byte or bool tensors, even though I tried doing lut[sorted_lookup_table] = indexes.long() and the tensor should already be a long tensor.
I noticed the other solution is unfortunately not working for me, because there are jumps between the numbers in lookup_table, so when I use sorted_lookup_table, indexes = torch.sort(lookup_table) I still end up with jumps between the numbers.
@Lupos The original (lookup_table) needs to be long.
