I have a list that contains lists of different lengths. How can I convert it into a PyTorch tensor without padding? Is that possible?

[[3, 5, 10, 11], [1, 5, 10]]

1 Answer


It depends on what you want to achieve with the data structure. You can use torch.sparse, for example:

import torch

ll = [[3, 5, 10, 11], [1, 5, 10]]
n = len(ll)
m = max(len(l) for l in ll)

ids = [[], []]
values = []
for i, l in enumerate(ll):
    length = len(l)
    ids[0] += [i] * length  # rows
    ids[1] += list(range(length))  # cols
    values += l

t = torch.sparse_coo_tensor(ids, values, (n, m))
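
To see what the sparse tensor actually stores, the same construction can be wrapped in a small helper and converted back to a dense tensor. This is only a sketch: the function name `ragged_to_sparse` is made up for illustration, and it assumes the outer list is non-empty and the inner lists hold integers.

```python
import torch

def ragged_to_sparse(rows):
    """Build a 2-D sparse COO tensor from a list of variable-length lists.

    Hypothetical helper: entry (i, j) holds the j-th element of the i-th list.
    """
    n = len(rows)
    m = max(len(r) for r in rows)  # width of the longest inner list
    row_idx, col_idx, values = [], [], []
    for i, r in enumerate(rows):
        row_idx.extend([i] * len(r))   # row index, repeated once per element
        col_idx.extend(range(len(r)))  # column index = position in the inner list
        values.extend(r)
    return torch.sparse_coo_tensor([row_idx, col_idx], values, (n, m))

t = ragged_to_sparse([[3, 5, 10, 11], [1, 5, 10]])
dense = t.to_dense()  # positions that were never stored come back as 0
```

Only the seven real values are stored, but `t.to_dense()` fills the missing position with 0, so the dense view looks padded even though the sparse tensor is not. Whether that matters depends on what you do with the tensor afterwards.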

Otherwise, you can try embedding techniques for a corpus of documents, such as bag-of-words (though this still produces some "padding", since values absent from a document become zeros) or tf-idf.

bag-of-words with possible duplicates in inner lists

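import torch

corpus = [[3, 5, 10, 11], [1, 5, 10]]
n = len(corpus)
# values are used as column indices, so max + 1 columns are needed
m = max(max(inner) for inner in corpus) + 1
t = torch.zeros(n, m)

for i, doc in enumerate(corpus):
    # count how often each value occurs in this inner list
    t[i] = torch.bincount(torch.tensor(doc), minlength=m)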

bag-of-words with distinct values in inner lists

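import torch

corpus = [[3, 5, 10, 11], [1, 5, 10]]
n = len(corpus)
# values are used as column indices, so max + 1 columns are needed
m = max(max(inner) for inner in corpus) + 1

t = torch.zeros(n, m)
for i, doc in enumerate(corpus):
    t[i, doc] = 1  # mark each value present in this inner list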
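
The tf-idf option mentioned earlier can be derived from the bag-of-words counts. The following is a minimal sketch, not a definitive implementation: tf-idf has several definitions, and the smoothed idf used here (`log((1 + n) / (1 + df)) + 1`) is just one common variant chosen for illustration.

```python
import torch

corpus = [[3, 5, 10, 11], [1, 5, 10]]
n = len(corpus)
m = max(max(doc) for doc in corpus) + 1  # values are column indices

# Bag-of-words counts, one row per inner list.
counts = torch.stack(
    [torch.bincount(torch.tensor(doc), minlength=m) for doc in corpus]
).float()

# Term frequency: counts normalised by the length of each inner list.
tf = counts / counts.sum(dim=1, keepdim=True)

# Document frequency: in how many inner lists each value appears.
df = (counts > 0).sum(dim=0).float()

# Smoothed inverse document frequency (an assumed variant, see above).
idf = torch.log((1 + n) / (1 + df)) + 1

tfidf = tf * idf  # shape (n, m); columns for absent values stay 0
```

Values that occur in only one inner list (such as 3) get a higher weight than values shared by both (such as 5). As with bag-of-words, the result is a dense `(n, m)` tensor, so absent values are still represented by zeros.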