Scatter-plot of Numeric vs. String data from Pandas dataframe

Question

I have a dataframe of the following form:

import pandas as pd
df = pd.DataFrame({'t': [0, 1, 2, 3, 4, 5, 6],
                   'l': [["c", "d"], ["a", "b"], ["c", "d"], ["a", "b"], ["c", "d"], ["c", "d"], ["c", "d"]]})

The column l consists of lists whose entries are drawn from the set {a,b,c,d}. I want to plot the contents of l for each value of t in the following manner, which basically shows which of the four possible values {a,b,c,d} are activated at a time t:

To create the above plot, I built the following dataframe based on df above (-1 means not activated; any non-negative value means activated and gives the y-position of that label):

df_plot = pd.DataFrame({'t': [0, 1, 2, 3, 4, 5, 6],
                        'a': [-1, 0, -1, 0, -1, -1, -1],
                        'b': [-1, 1, -1, 1, -1, -1, -1],
                        'c': [2, -1, 2, -1, 2, 2, 2],
                        'd': [3, -1, 3, -1, 3, 3, 3]})

import numpy as np
# Plot each label column as dots; the -1 placeholders fall below ylim and stay hidden.
ax = df_plot.plot(x="t", y=["a", "b", "c", "d"], style='.', ylim=[-0.5, 3.5],
                  yticks=np.arange(0, 3.1, 1), legend=False)
labels = ["a", "b", "c", "d"]
ax.set_yticklabels(labels)

This technically gives me what I want; however, I'd like to think that there is an easier and more professional way to plot this. Is there a smarter way using one of Python's libraries?

So you want to know out of all combinations, which have been activated at the same time at some point, is that right? So {a,b},{c,d}. Or you need it for each point t?
– yatu, Commented Jan 30, 2019 at 9:37
@yatu For each point t I just want to mark which one of a,b,c or d have been activated. All possible combinations are possible, it is merely due to my laziness that the example above only has {a,b} and {c,d}
– N08, Commented Jan 30, 2019 at 9:43
Well you will end up having some discrete representation in any case, your current solution seems fine to me.
– yatu, Commented Jan 30, 2019 at 9:51
@yatu Thanks - surprised there is no immediate way to do this automatically in any of Python's plotting libraries
– N08, Commented Jan 30, 2019 at 9:52
@N08 If you are looking for something based just on Pandas check out my answer.
– Federico Gentile, Commented Jan 30, 2019 at 10:26

Federico Gentile · Accepted Answer · 2019-01-30 10:22:42Z

How about something like this:

# Reshape dataframe: spread each list into columns, re-attach t, then melt to one row per (t, item)
dff = (df.l.apply(pd.Series)
         .merge(df, right_index=True, left_index=True)
         .drop(["l"], axis=1)
         .melt(id_vars=['t'], value_name="l")
         .drop("variable", axis=1))

# Plot dataframe
import matplotlib.pyplot as plt
plt.scatter(dff['t'], dff['l'])
# plt.grid(True)

More details about what is going on in the code I wrote can be found at this link: https://mikulskibartosz.name/how-to-split-a-list-inside-a-dataframe-cell-into-rows-in-pandas-9849d8ff2401
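
For anyone who finds the chained reshape hard to follow, here is a sketch that splits the same chain into separate steps. The intermediate name `wide` is only illustrative and is not part of the original code; the result `dff` should match what the one-liner produces.

```python
import pandas as pd

df = pd.DataFrame({
    't': [0, 1, 2, 3, 4, 5, 6],
    'l': [["c", "d"], ["a", "b"], ["c", "d"], ["a", "b"],
          ["c", "d"], ["c", "d"], ["c", "d"]],
})

# Step 1: expand each list into its own numbered columns (0, 1, ...).
wide = df['l'].apply(pd.Series)

# Step 2: re-attach 't' by row index and drop the original list column.
wide = wide.merge(df, left_index=True, right_index=True).drop(columns='l')

# Step 3: melt the numbered columns into a single value column named 'l',
# then discard the helper column that records which slot the item came from.
dff = wide.melt(id_vars=['t'], value_name='l').drop(columns='variable')

print(dff.sort_values('t').head())
```

After step 3 there is one row per (t, item) pair, which is the shape `plt.scatter` needs.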

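Note: the reshape should work no matter how many items you have in the lists in column l. If the lists differ in length, the shorter ones are padded with NaN, and those rows may need to be dropped (for example with dropna) before plotting.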
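
A possible alternative, assuming pandas 0.25 or later is available: `DataFrame.explode` produces one row per list item directly, so the merge and melt steps are not needed. The sketch below also maps each label to a fixed integer position so that the y-axis order is explicit, similar to the yticks approach in the question. The names `long_df`, `labels` and `pos` are assumptions made for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    't': [0, 1, 2, 3, 4, 5, 6],
    'l': [["c", "d"], ["a", "b"], ["c", "d"], ["a", "b"],
          ["c", "d"], ["c", "d"], ["c", "d"]],
})

# One row per (t, label) pair; empty lists become NaN and are dropped.
long_df = df.explode('l').dropna(subset=['l'])

# Fix the vertical order of the labels explicitly (assumed label set).
labels = ['a', 'b', 'c', 'd']
pos = {label: i for i, label in enumerate(labels)}
y = long_df['l'].map(pos)

fig, ax = plt.subplots()
ax.scatter(long_df['t'], y)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_ylim(-0.5, len(labels) - 0.5)
ax.set_xlabel('t')
# plt.show()  # uncomment when running interactively
```

Mapping labels to integers keeps the rows in a chosen order even if some label never appears in the data, which passing the strings straight to `scatter` would not guarantee.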

edited Jan 30, 2019 at 10:22

answered Jan 30, 2019 at 10:14

Federico Gentile
