For background: I am using the Consumer Financial Protection Bureau dataset on consumer complaints. I want a time-series plot with complaint counts on the y-axis and time on the x-axis, with one line per type of complaint, so I can see how the values have changed over time. I've experimented with Seaborn and Plotly so far. Below is my attempt in Plotly.

import plotly.graph_objs as go
from plotly.offline import iplot

# df is the sample dataframe shown below (columns: Date, Issue, Sum)
trace1 = go.Scatter(x=df.Date,
                    y=df.Sum,
                    name="Time Series of Types of Complaints",
                    line=dict(color='blue'),
                    opacity=0.4)

layout = dict(title='Time Series of Types of Complaints')

fig = dict(data=[trace1], layout=layout)
iplot(fig)

[Figure: attempted plot]

The dataframe looks like this:

data = {'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'], 'Issue':  ['Loan Modification', 'Loan Servicing', 'Loan Servicing', 'Loan Modification', 'Loan Servicing'], 'Sum': [1, 1, 2, 2, 3]}

df = pd.DataFrame(data)

I know the issue in my plot is that it's connecting all of the different sums together and not separating them.

I know I could separate each sum into different columns for each of the different types of complaints, and then add each trace manually, doing something like this (taken from the Plotly website):

fig.add_trace(go.Scatter(
                x=df.Date,
                y=df['AAPL.Low'],
                name="AAPL Low",
                line_color='dimgray',
                opacity=0.8))

But there must be an easier, less brute-force way, where I can keep all of the sums in one column and delineate by type of issue.

  • Maybe you can check the "parallel coordinates" plots. You can find them with matplotlib and plotly: plot.ly/python/parallel-coordinates-plot Commented Nov 24, 2019 at 19:40
  • @pedro_galher Thanks for that tip, that's a very cool page, but it's not quite what I'm looking for. For example, in Seaborn, with a scatterplot you can do something like sns.lmplot(data=movies, x='CriticRating',y='AudienceRating', fit_reg=False, hue='Genre'), and Seaborn smartly creates different colored points by Genre. Is there a way in Seaborn or Plotly to smartly create different lines by Issue type? Commented Nov 24, 2019 at 19:57

1 Answer

To illustrate development over time for a data set such as yours, there is no real need for a hue-style category argument like the one in sns.lmplot(data=movies, x='CriticRating',y='AudienceRating', fit_reg=False, hue='Genre'). And since this wasn't really the question but only briefly mentioned in the comments, I'd stick to either a grouped (or stacked) bar chart or a line chart, with one trace per issue. Here's how to do it using plotly.

Grouped or stacked column chart:

[Figure: grouped column chart]

Code for grouped column chart:

# imports
import plotly.graph_objs as go
import pandas as pd

# data
data = {'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'], 'Issue':  ['Loan Modification', 'Loan Servicing', 'Loan Servicing', 'Loan Modification', 'Loan Servicing'], 'Sum': [1, 1, 2, 2, 3]}
df = pd.DataFrame(data)

# build figure
fig=go.Figure(data=[go.Bar(name='Modification',
                          x=df[df['Issue']=='Loan Modification']['Date'],
                          y=df[df['Issue']=='Loan Modification']['Sum']),
                   
                    go.Bar(name='Servicing',
                          x=df[df['Issue']=='Loan Servicing']['Date'],
                          y=df[df['Issue']=='Loan Servicing']['Sum'])])


# Change the bar mode
fig.update_layout(barmode='group')
#fig.update_layout(barmode='stack')

# show figure
fig.show()

[Figure: stacked column chart]

Code for stacked column chart:

Use the same snippet as above, but set barmode='stack' instead of barmode='group' in the fig.update_layout() call right before fig.show():

fig.update_layout(barmode='stack')

Line chart:

Just replace go.Bar() with go.Scatter() to get:

[Figure: line chart]

Code for line chart:

# imports
import plotly.graph_objs as go
import pandas as pd

# data
data = {'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'], 'Issue':  ['Loan Modification', 'Loan Servicing', 'Loan Servicing', 'Loan Modification', 'Loan Servicing'], 'Sum': [1, 1, 2, 2, 3]}
df = pd.DataFrame(data)

# build figure
fig=go.Figure(data=[go.Scatter(name='Modification',
                          x=df[df['Issue']=='Loan Modification']['Date'],
                          y=df[df['Issue']=='Loan Modification']['Sum']),
                   
                    go.Scatter(name='Servicing',
                          x=df[df['Issue']=='Loan Servicing']['Date'],
                          y=df[df['Issue']=='Loan Servicing']['Sum'])])

# show figure
fig.show()

Now I completely agree if you think the plot is a little weird. But that's because you've got two observations for Loan Servicing on 2011-12-06 in your data sample, so that trace has two points on the same date. You can sort that out by aggregating your dataframe so that each date and issue appears only once before plotting.

Hope this helps! Don't hesitate to let me know if this doesn't work out for you!

1 Comment

This is just what I'm looking for and even more! (hence the accepted answer) Thank you. Do you know by chance if there is a way I can plot without using the sum column (say just by count like a countplot would in seaborn)? If not, all good, thanks again.
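
A possible way to do what this comment asks, sketched under the assumption that the raw data has one row per complaint and no 'Sum' column. The `raw` dataframe and the 'Count' column name are hypothetical and only serve the illustration:

```python
import pandas as pd

# hypothetical raw data: one row per complaint, no precomputed 'Sum'
raw = pd.DataFrame({
    'Date': ['2011-12-01', '2011-12-06', '2011-12-06', '2011-12-07', '2011-12-07'],
    'Issue': ['Loan Modification', 'Loan Servicing', 'Loan Servicing',
              'Loan Modification', 'Loan Servicing'],
})

# count the rows for every (Date, Issue) pair, similar in spirit to a countplot
counts = raw.groupby(['Date', 'Issue']).size().reset_index(name='Count')
```

The 'Count' column can then stand in for 'Sum' in the go.Bar or go.Scatter snippets from the answer.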
