I have a file similar to this one:
movieId title genres userId rating timestamp
0 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy 1 4.0 964982703
1 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy 5 4.0 847434962
2 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy 7 4.5 1106635946
3 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy 15 2.5 1510577970
4 1 Toy Story (1995) Adventure|Animation|Children|Comedy|Fantasy 17 4.5 1305696483
5 6 Heat (1995) Action|Crime|Thriller 373 5.0 846830247
6 6 Heat (1995) Action|Crime|Thriller 380 5.0 1494278663
7 6 Heat (1995) Action|Crime|Thriller 385 3.0 840648313
8 6 Heat (1995) Action|Crime|Thriller 386 3.0 842613783
9 6 Heat (1995) Action|Crime|Thriller 389 5.0 857934242
I ran this code to obtain the full data and to process it:
! wget https://www.dropbox.com/s/z4zoofdgdrxe01r/movies.csv
! wget https://www.dropbox.com/s/f328xczt6vju6hi/ratings.csv
import pandas as pd
df_movies = pd.read_csv('movies.csv')
df_ratings = pd.read_csv('ratings.csv')
df_merged = pd.merge(df_movies, df_ratings, how='inner')
This is the code I have issues with:
df_merged.pivot(index='movieId', columns='title', values='rating')
I got:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-74-ad6b3a589ea8> in <module>()
----> 1 df_merged.pivot(index='movieId', columns='title', values='rating')
5 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/reshape/reshape.py in _make_selectors(self)
177
178 if mask.sum() < len(self.index):
--> 179 raise ValueError("Index contains duplicate entries, cannot reshape")
180
181 self.group_index = comp_index
ValueError: Index contains duplicate entries, cannot reshape
What I want is to find out which movie has the most votes by building a summary table, like a pivot table in Excel.
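For reference, here is a minimal sketch of one way this could be done. As far as I understand it, `pivot` only reshapes and needs every index/columns combination to be unique, while here each movie appears once per rating, which is what triggers the "duplicate entries" error. `pivot_table` aggregates the duplicates instead. The small `df_merged` below is only a stand-in built from the sample rows shown above (not the full download), and the column name `votes` and the variable names `summary` and `top_movie` are my own choices.

```python
import pandas as pd

# Stand-in for df_merged, using only the ten sample rows from the question
df_merged = pd.DataFrame({
    "movieId": [1, 1, 1, 1, 1, 6, 6, 6, 6, 6],
    "title": ["Toy Story (1995)"] * 5 + ["Heat (1995)"] * 5,
    "userId": [1, 5, 7, 15, 17, 373, 380, 385, 386, 389],
    "rating": [4.0, 4.0, 4.5, 2.5, 4.5, 5.0, 5.0, 3.0, 3.0, 5.0],
})

# pivot_table aggregates repeated titles instead of raising an error:
# count the ratings per title to get the number of votes
votes = (
    df_merged
    .pivot_table(index="title", values="rating", aggfunc="count")
    .rename(columns={"rating": "votes"})
)

# The mean rating per title, computed the same way
means = (
    df_merged
    .pivot_table(index="title", values="rating", aggfunc="mean")
    .rename(columns={"rating": "mean_rating"})
)

# Combine both into one summary table, most-voted movies first
summary = votes.join(means).sort_values("votes", ascending=False)
print(summary)

# Title of the movie with the most votes (first one wins on a tie)
top_movie = summary["votes"].idxmax()
print(top_movie)
```

If only the counts are needed, `df_merged.groupby("title")["rating"].count()` should give the same numbers without going through `pivot_table`. In this ten-row sample both movies have five votes, so the result only becomes interesting on the full data.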