ValueError: negative dimensions are not allowed using pandas pivot_table

Question

I am trying to write item-item collaborative filtering recommendation code. My full dataset can be found here. I want users as rows, items as columns, and ratings as the values.

My code is as follows:

import pandas as pd     
import numpy as np   
file = pd.read_csv("data.csv", names=['user', 'item', 'rating', 'timestamp'])
table = pd.pivot_table(file, values='rating', index=['user'], columns=['item'])

My data is as follows:

             user        item  rating   timestamp
0  A2EFCYXHNK06IS  5555991584       5   978480000  
1  A1WR23ER5HMAA9  5555991584       5   953424000
2  A2IR4Q0GPAFJKW  5555991584       4  1393545600
3  A2V0KUVAB9HSYO  5555991584       4   966124800
4  A1J0GL9HCA7ELW  5555991584       5  1007683200

And the error is:

Traceback (most recent call last):  
  File "D:\python\reco.py", line 9, in <module>   
    table=pd.pivot_table(file,values='rating',index=['user'],columns=['item'])  
  File "C:\python35\lib\site-packages\pandas\tools\pivot.py", line 133, in   pivot_table     
        table = agged.unstack(to_unstack)   
  File "C:\python35\lib\site-packages\pandas\core\frame.py", line 4047, in       unstack  
    return unstack(self, level, fill_value)
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 402, in   unstack      
    return _unstack_multiple(obj, level)    
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 297, in   _unstack_multiple  
    unstacked = dummy.unstack('__placeholder__')  
  File "C:\python35\lib\site-packages\pandas\core\frame.py", line 4047, in   unstack  
    return unstack(self, level, fill_value)  
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 406, in   unstack  
    return _unstack_frame(obj, level, fill_value=fill_value)  
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 449, in   _unstack_frame  
    fill_value=fill_value)  
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 103, in   __init__  
    self._make_selectors()  
  File "C:\python35\lib\site-packages\pandas\core\reshape.py", line 137, in   _make_selectors  
    mask = np.zeros(np.prod(self.full_shape), dtype=bool)  
ValueError: negative dimensions are not allowed

Possible duplicate of ValueError: negative dimensions are not allowed
– Hamms, Commented Dec 13, 2016 at 23:33
@Hamms. Do not mark it as duplicate, I have already seen the link you provided. But none of the answers there is helpful to my situation. I am not doing any matrix multiplication.
– Prashant Sharma, Commented Dec 13, 2016 at 23:55
Please include a sample of your data: mcve. It is absolutely critical here, since this pivot_table call works for this sample data: df = pd.DataFrame(np.random.rand(10,4), columns=['user','item','rating','timestamp']).
– Julien Marrec, Commented Dec 14, 2016 at 0:04
And I have absolutely no problem using your own pivot_table call with the data you provided... Try it yourself: copy the data you provided, load it with file = pd.read_clipboard() and then table=pd.pivot_table(file,values='rating',index=['user'],columns=['item']). You need to provide an MCVE: post a sample of your data that is sufficient to replicate the error you're having.
– Julien Marrec, Commented Dec 14, 2016 at 0:20
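
The point made in the comments above can be checked without the clipboard. The following is a minimal sketch that feeds the five sample rows from the question through the same pivot_table call; the names `sample`, `small`, and `small_table` are made up for this illustration. On such a small input the call should succeed, which suggests the error depends on the size of the full dataset rather than on the call itself.

```python
import io
import pandas as pd

# The five sample rows from the question, as whitespace-separated text.
sample = io.StringIO("""\
user item rating timestamp
A2EFCYXHNK06IS 5555991584 5 978480000
A1WR23ER5HMAA9 5555991584 5 953424000
A2IR4Q0GPAFJKW 5555991584 4 1393545600
A2V0KUVAB9HSYO 5555991584 4 966124800
A1J0GL9HCA7ELW 5555991584 5 1007683200
""")
small = pd.read_csv(sample, sep=r"\s+")

# Same pivot_table call as in the question: 5 users (rows) x 1 item (column).
small_table = pd.pivot_table(small, values='rating', index=['user'], columns=['item'])
print(small_table)
```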

Community · Accepted Answer · 2017-05-23 12:24:50Z

6

I cannot guarantee that this will finish (I got tired of waiting for it to compute), but here is a way to build a sparse DataFrame that should keep memory use down.

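import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix

# Note: astype('category', categories=...), pd.SparseDataFrame and
# pd.SparseSeries are old pandas APIs; SparseDataFrame and SparseSeries
# exist only in pandas < 1.0.
file = pd.read_csv("data.csv", names=['user', 'item', 'rating', 'timestamp'])

# Sorted unique users and items define the row and column order.
user_u = list(sorted(file.user.unique()))
item_u = list(sorted(file.item.unique()))

# Map every rating to its integer (row, column) position.
row = file.user.astype('category', categories=user_u).cat.codes
col = file.item.astype('category', categories=item_u).cat.codes

data = file['rating'].tolist()

# Only the observed ratings are stored; missing entries are implicit zeros.
sparse_matrix = csr_matrix((data, (row, col)), shape=(len(user_u), len(item_u)))

# Build the sparse frame one user (one matrix row) at a time.
df = pd.SparseDataFrame([pd.SparseSeries(sparse_matrix[i].toarray().ravel(), fill_value=0)
                         for i in np.arange(sparse_matrix.shape[0])],
                        index=user_u, columns=item_u, default_fill_value=0)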

See this question for more options.
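
On recent pandas releases, where pd.SparseDataFrame and pd.SparseSeries are no longer available, roughly the same result can be sketched with the DataFrame.sparse accessor. This is an adaptation, not the code the answer was written with: the `ratings` frame below is a tiny hypothetical stand-in for data.csv, and the variable names are chosen for this example only.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Tiny stand-in for data.csv (hypothetical rows, same four columns).
ratings = pd.DataFrame({
    'user': ['u1', 'u2', 'u1', 'u3'],
    'item': ['i1', 'i1', 'i2', 'i3'],
    'rating': [5, 4, 3, 5],
    'timestamp': [0, 1, 2, 3],
})

# Categorical codes give each user/item a compact integer position;
# .cat.categories holds the matching labels in the same order.
user_cat = ratings['user'].astype('category')
item_cat = ratings['item'].astype('category')

sparse_matrix = csr_matrix(
    (ratings['rating'].to_numpy(),
     (user_cat.cat.codes.to_numpy(), item_cat.cat.codes.to_numpy())),
    shape=(len(user_cat.cat.categories), len(item_cat.cat.categories)),
)

# Wrap the SciPy matrix in a DataFrame with sparse columns (no dense copy).
sparse_df = pd.DataFrame.sparse.from_spmatrix(
    sparse_matrix,
    index=user_cat.cat.categories,
    columns=item_cat.cat.categories,
)
print(sparse_df)
```

One difference from pivot_table worth keeping in mind: if the same (user, item) pair occurs more than once, csr_matrix adds the ratings together, whereas pivot_table averages them by default. For many downstream computations the SciPy matrix itself may be all that is needed, with the labels kept alongside it.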

edited May 23, 2017 at 12:24

CommunityBot

answered Dec 14, 2016 at 2:32

Julien Marrec

3 Comments

Igor Raush Over a year ago

+1, this is the only way to deal with this data. The full dense ratings matrix will have >127B entries, far too big to fit into memory. You can also use Series.cat.categories to index your sparse data frame, to avoid the list(sorted(...)) thing.
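
The size mentioned above is also the most likely source of the error message itself. The traceback ends in np.zeros(np.prod(self.full_shape), dtype=bool); on a platform where NumPy's default integer is 32-bit (as on Windows with the NumPy versions of that time), a product this large wraps around to a negative number. The sketch below reproduces that effect by forcing a 32-bit accumulator. The user and item counts are hypothetical, chosen only so that their product exceeds 127 billion.

```python
import numpy as np

# Hypothetical counts, chosen only so that the product exceeds 127 billion.
n_users, n_items = 400_000, 350_000

# Exact size of the dense matrix (Python ints do not overflow).
print(n_users * n_items)

# The same product in a 32-bit accumulator wraps around to a negative value.
shape = np.array([n_users, n_items], dtype=np.int32)
wrapped = np.prod(shape, dtype=np.int32)
print(wrapped)

# Allocating an array with that "size" raises the error from the question.
message = None
try:
    np.zeros(wrapped, dtype=bool)
except ValueError as exc:
    message = str(exc)
    print(message)
```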

Prashant Sharma Over a year ago

@julien I shall try this. Thanks a lot for your help. I have been stuck on this problem for the last two days.

Julien Marrec Over a year ago

Let it run and please let me know whether or not it worked; I'm curious.
