
I am working on a recommendation project where I have data like this:

ID Movie
1   A
2   B
3   C
4   D
..
..

I want to convert this dataframe into a sparse matrix like this:

     1  2  3  4 ....n

1    1  0  0  0     0
2    0  1  0  0     0
3    0  0  1  0     0
4    0  0  0  1     0
.
.
n    0  0  0  0     1

Basically, both the rows and the columns are indexed by the movie ID, and the value is 1 when the row ID and the column ID are the same. I want to represent this in a sparse format like

 <sparse matrix of type '<class 'numpy.int32'>'
    with 58770 stored elements in Compressed Sparse Row format>

I tried doing the following:

 - np.diag(items)
 - csr_matrix(items.values)

But I am not able to figure it out. Can anyone help me?

  • Could you specify a bit better an exact input and an expected output? (ideally something that could just be copy-pasted into a Python script) Commented Jun 28, 2019 at 18:44
  • What happened with each of your attempts? Did you get an error? Commented Jun 28, 2019 at 18:45

2 Answers

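You can use scipy.sparse.spdiags to place ones on the main diagonal (offset 0) of a num_data x num_data matrix:

import numpy as np
from scipy import sparse

num_data = len(df)
sp = sparse.spdiags(np.ones(num_data), 0, num_data, num_data)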

OUTPUT

  (0, 0)    1.0
  (1, 1)    1.0
  (2, 2)    1.0
  (3, 3)    1.0
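
The call above returns a matrix in DIA format with float values. As a minimal sketch of getting closer to the representation the question asks for, spdiags also takes a format argument, and the dtype of the ones array carries through. The value of num_data below is a hypothetical stand-in for len(df).

```python
import numpy as np
from scipy import sparse

num_data = 4  # hypothetical; stands in for len(df)

# Integer ones on the main diagonal, returned directly in CSR format.
sp_csr = sparse.spdiags(
    np.ones(num_data, dtype=np.int32), 0, num_data, num_data, format="csr"
)

print(repr(sp_csr))
print(sp_csr.toarray())
```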

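If the movie IDs are not consecutive (for example, they contain gaps), use the IDs themselves as the row and column indices. The matrix shape is then inferred from the largest ID: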

sparse.coo_matrix((np.ones(num_data),(df['ID'],df['ID'])))
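
A small sketch of what this produces when the IDs have gaps, assuming the IDs are unique non-negative integers. The ids array is a hypothetical stand-in for df['ID']. The second half is an alternative that is not part of the answer: it maps each ID to a compact position with np.unique first, so the matrix has one row per movie instead of one row per possible ID.

```python
import numpy as np
from scipy import sparse

# Hypothetical IDs with gaps; stands in for df['ID'].to_numpy().
ids = np.array([10, 20, 35, 70])
ones = np.ones(len(ids), dtype=np.int32)

# IDs used directly as coordinates: the shape is inferred as max(ID) + 1.
direct = sparse.coo_matrix((ones, (ids, ids)))
print(direct.shape)

# Alternative: map each ID to a position 0..n-1, then build an n x n matrix.
unique_ids, positions = np.unique(ids, return_inverse=True)
n = len(unique_ids)
compact = sparse.coo_matrix((ones, (positions, positions)), shape=(n, n)).tocsr()
print(compact.shape)
```

With the compact form, unique_ids[i] gives the movie ID that row i and column i refer to.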

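If the IDs come from two different dataframes, keep only the IDs present in both:

match=list(set(df['ID']).intersection(set(df2['ID'])))
# The data array must have one entry per matched ID, not one per row of df
sparse.coo_matrix((np.ones(len(match)),(match,match)))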
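
A runnable sketch of the two-dataframe case, under the assumption that both ID columns hold non-negative integers. The arrays ids_a and ids_b are hypothetical stand-ins for df['ID'] and df2['ID'], and np.intersect1d is swapped in for the set intersection because it returns a sorted array. The data array has to be as long as the list of matched IDs, otherwise coo_matrix raises an error.

```python
import numpy as np
from scipy import sparse

ids_a = np.array([1, 2, 3, 4])  # hypothetical; stands in for df['ID']
ids_b = np.array([3, 4, 5])     # hypothetical; stands in for df2['ID']

# Sorted, unique IDs that appear in both arrays.
match = np.intersect1d(ids_a, ids_b)

# One stored 1 per matched ID, placed at (ID, ID).
common = sparse.coo_matrix(
    (np.ones(len(match), dtype=np.int32), (match, match))
)
print(common)
```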

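A matrix with ones down the diagonal and zeros everywhere else is called an "identity matrix". You can create one in Python with scipy.sparse.identity(n), where n is the number of rows and columns. See the scipy.sparse.identity documentation for the optional dtype and format arguments.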
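
A minimal sketch of that call, with the dtype and format arguments set to match the representation quoted in the question. The value of n is a hypothetical stand-in for the number of movies.

```python
import numpy as np
from scipy import sparse

n = 4  # hypothetical; stands in for the number of movies, e.g. len(df)

# n x n identity with int32 values, stored in Compressed Sparse Row format.
eye = sparse.identity(n, dtype=np.int32, format="csr")

print(repr(eye))
print(eye.toarray())
```

An identity matrix stores exactly n elements, so the 58770 stored elements in the question would correspond to n = 58770.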
