Plotting a Line Plot between two fields in a pandas dataframe

Question

I am trying to plot a graph from a pandas DataFrame. The code is below:

import pandas as pd
import numpy as np
from IPython.display import display

movies = pd.read_csv('data/movie.csv')
director = movies['director_name']

director.to_frame().head()

director_name
0   James Cameron
1   Gore Verbinski
2   Sam Mendes
3   Christopher Nolan
4   Doug Walker

director.value_counts()

Steven Spielberg    26
Woody Allen         22
Clint Eastwood      20
Martin Scorsese     20
                    ..
James Nunn           1
Gerard Johnstone     1
Ethan Maniquis       1
Antony Hoffman       1
Name: director_name, Length: 2397, dtype: int64

I want to plot a line graph of each director's value count against director_name.

import matplotlib.pyplot as plt

%matplotlib inline

df_list = list(director)
# print(df_list)

x = df_list
y = list(director.value_counts())

plt.figure(figsize=(15,3))
plt.plot(x, y)
plt.ylim(0, 100)
plt.xlabel('X Axis')
plt.ylabel('Y axis')
plt.title('Line Plot')
plt.suptitle('Figure Title', size=20, y=1.03)

I am getting the following error. What am I doing wrong?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-186ebd6b22d9> in <module>()
      8 
      9 plt.figure(figsize=(15,3))
---> 10 plt.plot(x, y)
     11 #plt.xlim(0, 10)
     12 plt.ylim(0, 100)

~/anaconda/lib/python3.6/site-packages/matplotlib/pyplot.py in plot(*args, **kwargs)
   3238                       mplDeprecation)
   3239     try:
-> 3240         ret = ax.plot(*args, **kwargs)
   3241     finally:
   3242         ax._hold = washold

~/anaconda/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
   1708                     warnings.warn(msg % (label_namer, func.__name__),
   1709                                   RuntimeWarning, stacklevel=2)
-> 1710             return func(ax, *args, **kwargs)
   1711         pre_doc = inner.__doc__
   1712         if pre_doc is None:

~/anaconda/lib/python3.6/site-packages/matplotlib/axes/_axes.py in plot(self, *args, **kwargs)
   1435         kwargs = cbook.normalize_kwargs(kwargs, _alias_map)
   1436 
-> 1437         for line in self._get_lines(*args, **kwargs):
   1438             self.add_line(line)
   1439             lines.append(line)

~/anaconda/lib/python3.6/site-packages/matplotlib/axes/_base.py in _grab_next_args(self, *args, **kwargs)
    402                 this += args[0],
    403                 args = args[1:]
--> 404             for seg in self._plot_args(this, kwargs):
    405                 yield seg
    406 

~/anaconda/lib/python3.6/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs)
    382             x, y = index_of(tup[-1])
    383 
--> 384         x, y = self._xy_from_xy(x, y)
    385 
    386         if self.command == 'plot':

~/anaconda/lib/python3.6/site-packages/matplotlib/axes/_base.py in _xy_from_xy(self, x, y)
    241         if x.shape[0] != y.shape[0]:
    242             raise ValueError("x and y must have same first dimension, but "
--> 243                              "have shapes {} and {}".format(x.shape, y.shape))
    244         if x.ndim > 2 or y.ndim > 2:
    245             raise ValueError("x and y can be no greater than 2-D, but have "

ValueError: x and y must have same first dimension, but have shapes (4916,) and (2397,)

Ivan86 · Accepted Answer · 2017-12-15 22:29:38Z

1

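Your x = df_list and y = list(director.value_counts()) do not have the same length: x has one entry per movie (4916), while y has one entry per distinct director (2397). You don't need x = df_list, because the result of value_counts() already holds everything you are looking for: the director names are its index and the counts are its values.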

Use this:

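import matplotlib.pyplot as plt

labels = director.value_counts().index.values  # use these for the x-tick labels
y = list(director.value_counts())
maxY = max(y)
x = range(len(y))

...
# plt.plot() returns a list of lines, not an Axes, and does not accept grid=True,
# so create the Axes explicitly and switch the grid on separately
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(x, y, '-', color='blue')
ax.grid(True)
ax.set_xticks(range(len(y)))
ax.set_xticklabels(labels)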
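
As a minimal, self-contained sketch of the same idea, the snippet below uses a small made-up Series in place of movies['director_name'] (the sample names and the axis labels are assumptions for illustration, not data from the question). It shows why the original x and y differ in length, then plots the counts against integer positions and labels the ticks with the director names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample standing in for movies['director_name']
director = pd.Series(
    ["James Cameron", "Sam Mendes", "Sam Mendes",
     "Gore Verbinski", "James Cameron", "Sam Mendes"],
    name="director_name",
)

# One row per distinct director, sorted by count (descending)
counts = director.value_counts()

# The source of the ValueError: one entry per movie vs. one per director
print(len(director), len(counts))

# Plot counts against integer positions, then label the ticks with names
x = range(len(counts))
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(x, counts.values, "-", color="blue")
ax.set_xticks(list(x))
ax.set_xticklabels(counts.index, rotation=90)
ax.set_xlabel("Director")
ax.set_ylabel("Number of movies")
ax.grid(True)
```

With thousands of distinct directors the tick labels will overlap, so it may be worth plotting only the top few, for example counts.head(20).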

edited Dec 15, 2017 at 22:29

answered Dec 15, 2017 at 22:10


andrew_reece · Accepted Answer · 2017-12-16 06:01:15Z

0

IIUC, you can just groupby(), count(), and plot(). You don't need value_counts().

For example, with a sample data frame director:

print(director)
        director_name
0       James Cameron
1      Gore Verbinski
2          Sam Mendes
3   Christopher Nolan
4         Doug Walker
5       James Cameron
6          Sam Mendes
7          Sam Mendes

Use:

director.groupby("director_name").director_name.count().plot()
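
A runnable sketch of this approach, rebuilding the sample frame above by hand. Note that in the question director is a Series rather than a DataFrame; the Series variant shown at the end (grouping the Series by its own values) is an assumption about how one might adapt the call, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the sample data frame printed above
director = pd.DataFrame({
    "director_name": [
        "James Cameron", "Gore Verbinski", "Sam Mendes",
        "Christopher Nolan", "Doug Walker", "James Cameron",
        "Sam Mendes", "Sam Mendes",
    ]
})

# Count rows per director; groupby sorts the group keys alphabetically
counts = director.groupby("director_name").director_name.count()

# Series.plot() draws a line plot by default, with the names on the x axis
ax = counts.plot(figsize=(15, 3))
ax.set_ylabel("Number of movies")

# If director is a Series (as in the question), group it by its own values
series = director["director_name"]
series_counts = series.groupby(series).count()
```

Unlike value_counts(), which orders by count, this orders the x axis by director name.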

answered Dec 16, 2017 at 6:01
