Python - making scatterplot from non-numeric values

Question

I have a CSV file that I imported into a DataFrame with RI_df = pd.read_csv("../Week15/police.csv").

Using .head() my data looks like this:

state   stop_date   stop_time   county_name driver_gender   driver_race violation_raw   violation   search_conducted    search_type stop_outcome    is_arrested stop_duration   drugs_related_stop  district
0   RI  2005-01-04  12:55   NaN M   White   Equipment/Inspection Violation  Equipment   False   NaN Citation    False   0-15 Min    False   Zone X4
1   RI  2005-01-23  23:15   NaN M   White   Speeding    Speeding    False   NaN Citation    False   0-15 Min    False   Zone K3
2   RI  2005-02-17  04:15   NaN M   White   Speeding    Speeding    False   NaN Citation    False   0-15 Min    False   Zone X4
3   RI  2005-02-20  17:15   NaN M   White   Call for Service    Other   False   NaN Arrest Driver

RI_df.head().to_dict()

Out[55]:
{'state': {0: 'RI', 1: 'RI', 2: 'RI', 3: 'RI', 4: 'RI'},
 'stop_date': {0: '2005-01-04',
  1: '2005-01-23',
  2: '2005-02-17',
  3: '2005-02-20',
  4: '2005-02-24'},
 'stop_time': {0: '12:55', 1: '23:15', 2: '04:15', 3: '17:15', 4: '01:20'},
 'county_name': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'driver_gender': {0: 'M', 1: 'M', 2: 'M', 3: 'M', 4: 'F'},
 'driver_race': {0: 'White', 1: 'White', 2: 'White', 3: 'White', 4: 'White'},
 'violation_raw': {0: 'Equipment/Inspection Violation',
  1: 'Speeding',
  2: 'Speeding',
  3: 'Call for Service',
  4: 'Speeding'},
 'violation': {0: 'Equipment',
  1: 'Speeding',
  2: 'Speeding',
  3: 'Other',
  4: 'Speeding'},
 'search_conducted': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'search_type': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'stop_outcome': {0: 'Citation',
  1: 'Citation',
  2: 'Citation',
  3: 'Arrest Driver',
  4: 'Citation'},
 'is_arrested': {0: False, 1: False, 2: False, 3: True, 4: False},
 'stop_duration': {0: '0-15 Min',
  1: '0-15 Min',
  2: '0-15 Min',
  3: '16-30 Min',
  4: '0-15 Min'},
 'drugs_related_stop': {0: False, 1: False, 2: False, 3: False, 4: False},
 'district': {0: 'Zone X4',
  1: 'Zone K3',
  2: 'Zone X4',
  3: 'Zone X1',
  4: 'Zone X3'}}

RI_df['drugs_related_stop'].value_counts()

Out[27]:
False    90879
True       862
Name: drugs_related_stop, dtype: int64

I am trying to count the True values of drugs_related_stop and plot them on a line graph, to see whether drug-related stops have been increasing over time.

ax = RI_df['drugs_related_stop'].value_counts().plot(kind='line',
                                    figsize=(10,8),
                                    title="Drug stops")
ax.set_xlabel("drug stops")
ax.set_ylabel("number of stops")

Nicolas Gervais · Accepted Answer · 2019-12-09 16:39:18Z

1

You should just use groupby().count() to get the number of rows per stop_date:

ax = df.groupby('stop_date', as_index=False).count().plot(kind='line',
                 figsize=(10,8), title="Drug stops", x='stop_date',
                 y='district')

Here is the complete code so you can double-check:

import pandas as pd
import numpy as np

df = pd.DataFrame({'state': {0: 'RI', 1: 'RI', 2: 'RI', 3: 'RI', 4: 'RI'},
 'stop_date': {0: '2005-01-23',
  1: '2005-01-23',
  2: '2005-02-17',
  3: '2005-02-17',
  4: '2005-02-24'},
 'stop_time': {0: '12:55', 1: '23:15', 2: '04:15', 3: '17:15', 4: '01:20'},
 'county_name': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
 'driver_gender': {0: 'M', 1: 'M', 2: 'M', 3: 'M', 4: 'F'},
 'driver_race': {0: 'White', 1: 'White', 2: 'White', 3: 'White', 4: 'White'},
 'violation_raw': {0: 'Equipment/Inspection Violation',
  1: 'Speeding',
  2: 'Speeding',
  3: 'Call for Service',
  4: 'Speeding'},
 'violation': {0: 'Equipment',
  1: 'Speeding',
  2: 'Speeding',
  3: 'Other',
  4: 'Speeding'},
 'search_conducted': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'search_type': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
 'stop_outcome': {0: 'Citation',
  1: 'Citation',
  2: 'Citation',
  3: 'Arrest Driver',
  4: 'Citation'},
 'is_arrested': {0: False, 1: False, 2: False, 3: True, 4: False},
 'stop_duration': {0: '0-15 Min',
  1: '0-15 Min',
  2: '0-15 Min',
  3: '16-30 Min',
  4: '0-15 Min'},
 'drugs_related_stop': {0: False, 1: False, 2: False, 3: False, 4: False},
 'district': {0: 'Zone X4',
  1: 'Zone K3',
  2: 'Zone X4',
  3: 'Zone X1',
  4: 'Zone X3'}})

ax = df.groupby('stop_date', as_index=False).count().plot(kind='line',
                 figsize=(10,8), title="Drug stops", x='stop_date',
                 y='district')
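
Note that count() on the grouped frame counts every stop on a date, whatever the value of drugs_related_stop. If only the drug-related stops are wanted, one option is to sum the boolean column per date instead. This is a minimal sketch, not part of the original answer; the sample rows are made up for illustration and only the two relevant columns are included.

```python
import pandas as pd

# Hypothetical sample rows; only the two columns used here are included.
df = pd.DataFrame({
    'stop_date': ['2005-01-23', '2005-01-23', '2005-02-17',
                  '2005-02-17', '2005-02-24'],
    'drugs_related_stop': [True, False, True, True, False],
})

# True counts as 1 and False as 0, so summing the boolean column
# per date gives the number of drug-related stops on that date.
drug_stops_per_date = df.groupby('stop_date')['drugs_related_stop'].sum()
print(drug_stops_per_date)

# To draw it with the same options as above:
# ax = drug_stops_per_date.plot(kind='line', figsize=(10, 8), title="Drug stops")
```

Dates with no drug-related stop still appear in the result with a value of 0, so the line does not skip them.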

edited Dec 9, 2019 at 16:39

answered Dec 8, 2019 at 4:49


3 Comments

pstaeubs Over a year ago

I'm not getting this graph when I try the code above

Nicolas Gervais Over a year ago

I sent you the entire code (see edit). What error do you get?

pstaeubs Over a year ago

I attached my "answer" below with the graph that I am getting. I'm not getting any error; the code runs correctly. I just think my x values are too small or something.

pstaeubs · Accepted Answer · 2019-12-09 16:40:53Z

0

This is what I'm getting with the code below...

ax = df.groupby('stop_date', as_index=False).count().plot(kind='line',
                 figsize=(10,8), title="Drug stops", x='stop_date',
                 y='district')
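
If the plot is hard to read because it has one point per day, it may help to parse stop_date as a date and aggregate over a longer period. The sketch below groups by calendar year, which is an assumption about the granularity wanted; the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows spanning several years.
df = pd.DataFrame({
    'stop_date': ['2005-01-23', '2005-11-02', '2006-02-17',
                  '2006-03-01', '2006-07-24', '2007-05-05'],
    'drugs_related_stop': [True, False, True, True, False, True],
})

# Parse the strings so pandas treats them as dates rather than labels.
df['stop_date'] = pd.to_datetime(df['stop_date'])

# Group by calendar year and sum the boolean column
# (True counts as 1) to get drug-related stops per year.
drug_stops_per_year = (
    df.groupby(df['stop_date'].dt.year)['drugs_related_stop'].sum()
)
print(drug_stops_per_year)

# To draw it:
# ax = drug_stops_per_year.plot(kind='line', figsize=(10, 8), title="Drug stops")
```

With the full dataset this gives one point per year, so an upward or downward trend is easier to see than in the daily counts.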

