I am trying to plot a graph from a pandas DataFrame. The code is below:

import pandas as pd
import numpy as np
from IPython.display import display

movies = pd.read_csv('data/movie.csv')
director = movies['director_name']

director.to_frame().head()

director_name
0   James Cameron
1   Gore Verbinski
2   Sam Mendes
3   Christopher Nolan
4   Doug Walker

director.value_counts()

Steven Spielberg    26
Woody Allen         22
Clint Eastwood      20
Martin Scorsese     20
                    ..
James Nunn           1
Gerard Johnstone     1
Ethan Maniquis       1
Antony Hoffman       1
Name: director_name, Length: 2397, dtype: int64

I want to plot a line graph of director_name against the value counts for each director.

import matplotlib.pyplot as plt

%matplotlib inline

df_list = list(director)
# print(df_list)

x = df_list
y = list(director.value_counts())

plt.figure(figsize=(15,3))
plt.plot(x, y)
plt.ylim(0, 100)
plt.xlabel('X Axis')
plt.ylabel('Y axis')
plt.title('Line Plot')
plt.suptitle('Figure Title', size=20, y=1.03)

I am getting the following error. What am I doing wrong?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-186ebd6b22d9> in <module>()
      8 
      9 plt.figure(figsize=(15,3))
---> 10 plt.plot(x, y)
     11 #plt.xlim(0, 10)
     12 plt.ylim(0, 100)

~/anaconda/lib/python3.6/site-packages/matplotlib/pyplot.py in plot(*args, **kwargs)
   3238                       mplDeprecation)
   3239     try:
-> 3240         ret = ax.plot(*args, **kwargs)
   3241     finally:
   3242         ax._hold = washold

~/anaconda/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
   1708                     warnings.warn(msg % (label_namer, func.__name__),
   1709                                   RuntimeWarning, stacklevel=2)
-> 1710             return func(ax, *args, **kwargs)
   1711         pre_doc = inner.__doc__
   1712         if pre_doc is None:

~/anaconda/lib/python3.6/site-packages/matplotlib/axes/_axes.py in plot(self, *args, **kwargs)
   1435         kwargs = cbook.normalize_kwargs(kwargs, _alias_map)
   1436 
-> 1437         for line in self._get_lines(*args, **kwargs):
   1438             self.add_line(line)
   1439             lines.append(line)

~/anaconda/lib/python3.6/site-packages/matplotlib/axes/_base.py in _grab_next_args(self, *args, **kwargs)
    402                 this += args[0],
    403                 args = args[1:]
--> 404             for seg in self._plot_args(this, kwargs):
    405                 yield seg
    406 

~/anaconda/lib/python3.6/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs)
    382             x, y = index_of(tup[-1])
    383 
--> 384         x, y = self._xy_from_xy(x, y)
    385 
    386         if self.command == 'plot':

~/anaconda/lib/python3.6/site-packages/matplotlib/axes/_base.py in _xy_from_xy(self, x, y)
    241         if x.shape[0] != y.shape[0]:
    242             raise ValueError("x and y must have same first dimension, but "
--> 243                              "have shapes {} and {}".format(x.shape, y.shape))
    244         if x.ndim > 2 or y.ndim > 2:
    245             raise ValueError("x and y can be no greater than 2-D, but have "

ValueError: x and y must have same first dimension, but have shapes (4916,) and (2397,)

2 Answers

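Your x = df_list and y = list(director.value_counts()) do not have the same length: x has one entry per row (4916), while y has one entry per unique director (2397). You don't need x = df_list, because director.value_counts() already contains everything you are looking for: the director names are its index and the counts are its values.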

Use this:

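labels = director.value_counts().index.values  # Use this for xtick labels
y = list(director.value_counts())
maxY = max(y)
x = range(len(y))

...
# plt.plot() returns a list of lines, not an Axes, so create the Axes explicitly
fig, ax = plt.subplots()
ax.plot(x, y, '-', color='blue')
ax.grid(True)  # grid is an Axes method, not a plot() keyword
ax.set_xticks(range(len(y)))
ax.set_xticklabels(labels)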
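
To make the length mismatch concrete, here is a minimal self-contained sketch of the same approach. The sample Series is a hypothetical stand-in for movies['director_name'], the axis labels are illustrative, and the Agg backend is only there so the sketch runs without a display (in a notebook you would keep %matplotlib inline instead).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for movies['director_name']
director = pd.Series(
    ["James Cameron", "Gore Verbinski", "Sam Mendes", "Christopher Nolan",
     "Doug Walker", "James Cameron", "Sam Mendes", "Sam Mendes"],
    name="director_name",
)

# One entry per unique director, sorted by count in descending order
counts = director.value_counts()

# The cause of the error: one x value per row, but one y value per unique name
print(len(director), len(counts))  # 8 5

# Plot against integer positions and use the names only as tick labels
x = list(range(len(counts)))
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(x, counts.values, "-", color="blue")
ax.grid(True)
ax.set_xticks(x)
ax.set_xticklabels(list(counts.index), rotation=90)
ax.set_xlabel("Director")
ax.set_ylabel("Number of movies")
```

With 2397 distinct directors in the real data, labelling every tick will probably be unreadable; plotting only the first few rows, for example counts.head(20), is one option.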

If I understand correctly, you can just use groupby(), count(), and plot(). You don't need value_counts().

For example, with a sample DataFrame director such as:

print(director)
        director_name
0       James Cameron
1      Gore Verbinski
2          Sam Mendes
3   Christopher Nolan
4         Doug Walker
5       James Cameron
6          Sam Mendes
7          Sam Mendes

Use:

director.groupby("director_name").director_name.count().plot()
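
As a rough check, the sketch below rebuilds the sample DataFrame shown above and compares the groupby() result with value_counts(). The variable names by_group and by_value are illustrative, and the Agg backend is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed inside a notebook
import pandas as pd

# The sample DataFrame from above
director = pd.DataFrame({
    "director_name": [
        "James Cameron", "Gore Verbinski", "Sam Mendes", "Christopher Nolan",
        "Doug Walker", "James Cameron", "Sam Mendes", "Sam Mendes",
    ]
})

# Count rows per director; groupby sorts the result by director name
by_group = director.groupby("director_name").director_name.count()

# value_counts gives the same numbers, sorted by count instead
by_value = director["director_name"].value_counts()
print(by_group.to_dict() == by_value.to_dict())  # True

# Series.plot() draws a line plot by default and returns the Axes
ax = by_group.plot()
```

Both approaches give the same counts; they differ only in ordering, so sort the result with sort_values() or sort_index() before plotting if the order along the x axis matters.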
