
I have the following tensor, let's call it lookup_table:

tensor([266, 103,  84,  12,  32,  34,   1, 523,  22, 136, 268, 432,  53,  63,
        201,  51, 164,  69,  31,  42, 122, 131, 119,  36, 245,  60,  28,  81,
          9, 114, 105,   3,  41,  86, 150,  79, 104, 120,  74, 420,  39, 427,
         40,  59,  24, 126, 202, 222, 145, 429,  43,  30,  38,  55,  10, 141,
         85, 121, 203, 240,  96,   7,  64,  89, 127, 236, 117,  99,  54,  90,
         57,  11,  21,  62,  82,  25, 267,  75, 111, 518,  76,  56,  20,   2,
         61, 516,  80,  78, 555, 246, 133, 497,  33, 421,  58, 107,  92,  68,
         13, 113, 235, 875,  35,  98, 102,  27,  14,  15,  72,  37,  16,  50,
        517, 134, 223, 163,  91,  44,  17, 412,  18,  48,  23,   4,  29,  77,
          6, 110,  67,  45, 161, 254, 112,   8, 106,  19, 498, 101,   5, 157,
         83, 350, 154, 238, 115,  26, 142, 143])

And I have another tensor, let's call it data, which looks like this:

tensor([[517, 235, 236,  76,  81,  25, 110,  59, 245,  39],
        [523, 114, 350, 246,  30, 222,  39, 517, 106,   2],
        [ 35, 235, 120,  99, 266,  63, 236, 133, 412,  38],
        [134,   2, 497,  21,  78,  60, 142, 498,  24,  89],
        [ 60, 111, 120, 145,  91, 141, 164,  81, 350,  55]])

Now I want something that looks similar to this:

tensor([[112, 100, ...,  40],
        [  7,  29, ...,   2],
        ...])

I want to use my data tensor to get, for each of its elements, the index of that value in the lookup table.
Basically I want to vectorize this:

(lookup_table == data).nonzero()

So that this works for multidimensional arrays.

I have read these, but they do not work for my case:
How Pytorch Tensor get the index of specific value
How Pytorch Tensor get the index of elements?
Pytorch tensor - How to get the indexes by a specific tensor

EDIT:
I am basically searching for an optimized/vectorized version of this:

x_data = torch.stack([(lookup_table == data[0][i]).nonzero(as_tuple=False) for i in range(len(data[0]))]).flatten().unsqueeze(0)
print(x_data.size())
for o in range(1, len(data)):
    x_data = torch.cat((x_data, torch.stack([(lookup_table == data[o][i]).nonzero(as_tuple=False) for i in range(len(data[o]))]).flatten().unsqueeze(0)), dim=0)

EDIT 2 Minimal example:
We have the data tensor:

data = torch.tensor([
        [523, 114, 350, 246,  30, 222,  39, 517, 106,   2],
        [ 35, 235, 120,  99, 266,  63, 236, 133, 412,  38],
        [555, 104,  14,  81,  55, 497, 222,  64,  57, 131]
])

And we have the lookup_table tensor, see above.

If we apply this code to the 2 tensors:

 # convert champion keys into index notation
x_data = torch.stack([(lookup_table == data[0][i]).nonzero(as_tuple=False) for i in range(len(data[0]))]).flatten().unsqueeze(0)
for o in range(1, len(data)):
    x_data = torch.cat((x_data, torch.stack([(lookup_table == data[o][i]).nonzero(as_tuple=False) for i in range(len(data[o]))]).flatten().unsqueeze(0)), dim=0)

We get an output of this:

tensor([[  7,  29, 141,  89,  51,  47,  40, 112, 134,  83],
        [102, 100,  37,  67,   0,  13,  65,  90, 119,  52],
        [ 88,  36, 106,  27,  53,  91,  47,  62,  70,  21]
       ])

This output is what I want: as I said above, it is the index at which each value of the tensor data lies in the tensor lookup_table. The problem is that this is not vectorized, and I have no idea how to vectorize it.

Comment: Try giving minimal examples for lookup_table and data and give the exact output you want to get. (Sep 17, 2021)

2 Answers


Using searchsorted:

Scanning the whole lookup_table array for each input element is quite inefficient. How about sorting the lookup table first (this only needs to be done once)

sorted_lookup_table, indexes = torch.sort(lookup_table)

and then using searchsorted

index_into_sorted = torch.searchsorted(sorted_lookup_table, data)

If you need an index into the original lookup_table, you can get it with

index_into_lookup_table = indexes[index_into_sorted]
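Putting the pieces together, here is a self-contained sketch on a small made-up table (the values below are illustrative, not the full table from the question):

```python
import torch

# small illustrative lookup table and batch of keys
lookup_table = torch.tensor([266, 103, 84, 12, 32, 34, 1, 523])
data = torch.tensor([[523, 84],
                     [34, 266]])

# sort once, then binary-search the whole batch at once
sorted_lookup_table, indexes = torch.sort(lookup_table)
index_into_sorted = torch.searchsorted(sorted_lookup_table, data)
# map positions in the sorted table back to the original table
index_into_lookup_table = indexes[index_into_sorted]

# round trip: indexing back into the original table recovers data
print(torch.equal(lookup_table[index_into_lookup_table], data))
```

One caveat: searchsorted assumes every value in data actually occurs in lookup_table; a missing value silently maps to the position of a neighbor instead of raising an error.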

3 Comments

You might want to rename sorted as it is a reserved keyword in Python!
Ok, it's not reserved and your code will run, yes. But it's considered bad practice to use builtin function names as variable names.
This solution works pretty well, thanks a lot. I didn't know about these PyTorch methods.

Another, faster approach, which assumes that all values have a limited range and are int64 (here I also assume that they are non-negative, but this limitation can be worked around):

Prep work:

sorted_lookup_table, indexes = torch.sort(lookup_table)
# note: lookup_table must be an int64 (long) tensor, or the indexing below fails
lut = torch.zeros(size=(sorted_lookup_table[-1].item() + 1,), dtype=torch.int64)
lut[:] = -1 # "not found"
lut[sorted_lookup_table] = indexes

Data processing:

index_into_lookup_table = lut[data]
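A self-contained sketch of this approach, again on a small made-up table (the values are illustrative):

```python
import torch

# small illustrative table; must be int64 (long) for the indexing below
lookup_table = torch.tensor([266, 103, 84, 12, 523], dtype=torch.int64)
data = torch.tensor([[523, 84],
                     [12, 266]])

sorted_lookup_table, indexes = torch.sort(lookup_table)
# dense lookup array covering 0..max value; -1 marks "not found"
lut = torch.full((sorted_lookup_table[-1].item() + 1,), -1, dtype=torch.int64)
lut[sorted_lookup_table] = indexes

# data processing is then a single fancy-indexing operation
index_into_lookup_table = lut[data]
```

The trade-off is memory: lut has one slot per possible key up to the maximum value, which is cheap for keys in the hundreds but wasteful if the keys can be very large.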

3 Comments

When trying to use lut[sorted_lookup_table] = indexes I get the error IndexError: tensors used as indices must be long, byte or bool tensors, even though I tried doing lut[sorted_lookup_table] = indexes.long() and the tensor should already be a long tensor.
I noticed the other solution is unfortunately not working for me, because there are jumps between the numbers in lookup_table, so when I use sorted_lookup_table, indexes = torch.sort(lookup_table) I still end up with jumps between the numbers.
@Lupos The original (lookup_table) needs to be long.
