In Python, what is the way to plot different categories of data that have the same x and y axis?

Question

For background: I am using the Consumer Financial Protection Bureau dataset on consumer complaints. I want to make a time-series plot per complaint type, with complaint counts on y and time on x, so that I get one line for each type of complaint and can see how the values have changed over time. I've experimented with Seaborn and Plotly so far. Below is my attempt in Plotly.

trace1 = go.Scatter(x=df.DateReceived,
                    y=df.Sum,
                    name = "Time Series of Types of Complaints",
                    line = dict(color = 'blue'),
                    opacity = 0.4)

layout = dict(title='Time Series of Types of Complaints',)

fig = dict(data=[trace1], layout=layout)
iplot(fig)

Attempted Plot

The dataframe looks like this:

data = {'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'], 'Issue':  ['Loan Modification', 'Loan Servicing', 'Loan Servicing', 'Loan Modification', 'Loan Servicing'], 'Sum': [1, 1, 2, 2, 3]}

df = pd.DataFrame(data)

I know the issue in my plot is that it's connecting all of the different sums together and not separating them.

I know I could split the sums into a separate column for each type of complaint and then add each trace manually, doing something like this (taken from the Plotly website):

fig.add_trace(go.Scatter(
                x=df.Date,
                y=df['AAPL.Low'],
                name="AAPL Low",
                line_color='dimgray',
                opacity=0.8))

But there must be an easier, less brute-force way, where I can keep all of the sums in one column and delineate by type of issue.

Maybe you can check the "parallel coordinates" plots. You can find them with matplotlib and plotly: plot.ly/python/parallel-coordinates-plot
– pedro_galher, Commented Nov 24, 2019 at 19:40
@pedro_galher thanks for that tip, that's a very cool page, but it's not quite what I'm looking for. For example, in Seaborn, with a scatterplot you can do something like sns.lmplot(data=movies, x='CriticRating',y='AudienceRating', fit_reg=False, hue='Genre'), and Seaborn smartly creates different colored points by Genre. Is there a way in Seaborn or Plotly to smartly create different lines by Issue type?
– antcny, Commented Nov 24, 2019 at 19:57

vestland · Accepted Answer · 2020-07-19 21:12:11Z

In order to illustrate development over time for a data set such as yours, there is no real need to introduce a category as a color change like sns.lmplot(data=movies, x='CriticRating',y='AudienceRating', fit_reg=False, hue='Genre') would do. And since this wasn't really the question but only briefly mentioned in the comments, I'd stick to either a grouped (or stacked) bar chart or a line chart. Here's how to do it using plotly.

Grouped and stacked column charts:

Plot for Grouped column chart:

Code for grouped column chart:

# imports
import plotly.graph_objs as go
import pandas as pd

# data
data = {'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'], 'Issue':  ['Loan Modification', 'Loan Servicing', 'Loan Servicing', 'Loan Modification', 'Loan Servicing'], 'Sum': [1, 1, 2, 2, 3]}
df = pd.DataFrame(data)

# build figure
fig=go.Figure(data=[go.Bar(name='Modification',
                          x=df[df['Issue']=='Loan Modification']['Date'],
                          y=df[df['Issue']=='Loan Modification']['Sum']),
                   
                    go.Bar(name='Servicing',
                          x=df[df['Issue']=='Loan Servicing']['Date'],
                          y=df[df['Issue']=='Loan Servicing']['Sum'])])


# Change the bar mode
fig.update_layout(barmode='group')
#fig.update_layout(barmode='stack')

# show figure
fig.show()

Plot for stacked column chart:

Code for stacked column chart:

Use the same snippet as above, but call fig.update_layout(barmode='stack') instead of fig.update_layout(barmode='group') right before fig.show().

Line chart:

Just replace go.Bar() with go.Scatter() to get:

Plot for line chart:

Code for line chart:

# imports
import plotly.graph_objs as go
import pandas as pd

# data
data = {'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'], 'Issue':  ['Loan Modification', 'Loan Servicing', 'Loan Servicing', 'Loan Modification', 'Loan Servicing'], 'Sum': [1, 1, 2, 2, 3]}
df = pd.DataFrame(data)

# build figure
fig=go.Figure(data=[go.Scatter(name='Modification',
                          x=df[df['Issue']=='Loan Modification']['Date'],
                          y=df[df['Issue']=='Loan Modification']['Sum']),
                   
                    go.Scatter(name='Servicing',
                          x=df[df['Issue']=='Loan Servicing']['Date'],
                          y=df[df['Issue']=='Loan Servicing']['Sum'])])

# show figure
fig.show()

Now I completely agree if you think the plot is a little weird. But it's because you've got two observations for Loan Servicing on 2011-12-06 in your data sample. You can sort that out by grouping your dataframe correctly before plotting.
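
A minimal sketch of what that grouping step could look like, using pandas only. It assumes that duplicate rows for the same date and issue should be added together; if `Sum` is already a running total, something like `.max()` or `.last()` may fit better. The name `df_grouped` is just an illustrative choice.

```python
import pandas as pd

# same sample data as in the question
data = {'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'],
        'Issue': ['Loan Modification', 'Loan Servicing', 'Loan Servicing',
                  'Loan Modification', 'Loan Servicing'],
        'Sum': [1, 1, 2, 2, 3]}
df = pd.DataFrame(data)

# parse the date strings so the x axis is treated as real dates
df['Date'] = pd.to_datetime(df['Date'])

# collapse duplicate (Date, Issue) rows into a single row each
# ASSUMPTION: duplicates should be summed; adjust the aggregation if not
df_grouped = df.groupby(['Date', 'Issue'], as_index=False)['Sum'].sum()

print(df_grouped)
```

Passing `df_grouped` instead of `df` to the plotting snippets above leaves at most one point per issue and date, so the Loan Servicing line no longer doubles back on 2011-12-06.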

Hope this helps! Don't hesitate to let me know if this doesn't work out for you!

This is just what I'm looking for and even more! (hence the accepted answer) Thank you. Do you know by chance if there is a way I can plot without using the sum column (say just by count like a countplot would in seaborn)? If not, all good, thanks again.
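
On the follow-up about plotting without a `Sum` column: one possible approach, sketched here with pandas only and not confirmed anywhere in this thread, is to count the rows per date and issue with `groupby(...).size()` and plot that count instead. The column name `Count` is an arbitrary choice for this example.

```python
import pandas as pd

# same sample rows as in the question, but without the 'Sum' column
data = {'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'],
        'Issue': ['Loan Modification', 'Loan Servicing', 'Loan Servicing',
                  'Loan Modification', 'Loan Servicing']}
df = pd.DataFrame(data)

# number of rows (complaints) per date and issue type
counts = (df.groupby(['Date', 'Issue'])
            .size()
            .reset_index(name='Count'))

print(counts)
```

The resulting `counts['Count']` column can then be used as `y` in place of `df['Sum']` in the snippets from the answer.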
