How to make a loop for multiple scatterplots in python?

Question

I am trying to automate plotting for a large dataframe. The goal is to plot each column against every other column. Each column represents a variable. See also the image below.

For example: sex vs. age, sex vs. BMI, sex vs. smoke, sex vs. type, and so on.

For the sake of clarity, I have simplified the problem to the image below: [image]

Initially, I tried to plot each combination by hand, but this is a rather time-consuming exercise and not what I want.

I also tried this (not working):

variables = ["Sex", "Age", "BMI"]
for variable in variables:
    plt.scatter(df.variable, df.variable)
    plt.xlabel('variable')
    plt.ylabel('variable')
    plt.title('variable vs. variable')
    plt.show()

Any help is welcome!

PS: If it would be a simple exercise to incorporate a linear regression on each combination of variables as well, that would also be appreciated.

Greetings,

Nadia

Prune · Accepted Answer · 2019-07-10 20:47:34Z

3

What you coded plots each column against itself. What you described is a nested loop. A simple upgrade is

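import matplotlib.pyplot as plt

col_choice = ["Sex", "Age", "BMI"]

for pos, axis1 in enumerate(col_choice):   # Pick a first col
    for axis2 in col_choice[pos+1:]:       # Pick a later col (no enumerate here: we want the name, not an (index, name) tuple)
        plt.scatter(df.loc[:, axis1], df.loc[:, axis2])
        plt.xlabel(axis1)
        plt.ylabel(axis2)
        plt.show()                         # One figure per pair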

df.loc[:, axis1] returns a Series, which scatter accepts.

Does that help? If you want to be more "Pythonic", then look into itertools.combinations, which generates each pair of columns once (itertools.product would also pair each column with itself).
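To make the difference between the two itertools helpers concrete, here is a small sketch using the column names from the question. `itertools.product` yields every ordered pair, including a column paired with itself, whereas `itertools.combinations` yields each unordered pair once, which matches what the nested loop above is meant to do.

```python
from itertools import combinations, product

col_choice = ["Sex", "Age", "BMI"]

# combinations: each unordered pair exactly once, no self-pairs
unique_pairs = list(combinations(col_choice, 2))

# product: every ordered pair, including ("Sex", "Sex") and both orders
all_pairs = list(product(col_choice, repeat=2))

print(unique_pairs)    # [('Sex', 'Age'), ('Sex', 'BMI'), ('Age', 'BMI')]
print(len(all_pairs))  # 9
```

For scatterplots the three unique pairs are usually enough, since swapping the axes only mirrors the plot.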

edited Jul 10, 2019 at 20:47

answered Jul 10, 2019 at 15:52

Prune


3 Comments

Prune Over a year ago

Whoops! Sorry; you need to access the column named by the variable value axis1 ... you know how to do that?

Prune Over a year ago

That's because you specified a particular (constant) column, rather than taking the value from the axis variables, as my solution specifies.

Nadia Merquez Over a year ago

Thanks, Prune. It worked. For the reader, one has to adjust for axis2, choosing the second column.

neko · Accepted Answer · 2019-07-10 16:03:07Z

3

You could do something like this:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create dummy dataframe, or load your own with pd.read_csv()

columns = ["sex", "age", "BMI", "smoke", "type"]
data = pd.DataFrame(np.array([[1,0,0,1,0], [23,16,94,18,24], [32, 26, 28, 23, 19], [0,1,1,1,0], [1,2,2,2,1]]).T, columns=columns)


x_col = "sex"
y_columns = ["age", "BMI", "smoke"]


for y_col in y_columns:

    # Create a fresh figure and axes for each pair
    fig, ax = plt.subplots()
    ax.scatter(data[x_col], data[y_col], label=y_col)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title("{} vs {}".format(x_col, y_col))

    ax.legend()
    plt.show()

Basically, if you have your dataset saved as a .csv file, you can load it with pandas using pd.read_csv(), use the column names as keys to access the corresponding columns, and iterate over them (here I created a dummy dataframe just for the sake of the example).

Regarding the linear regression part, you should check out the scikit-learn library. It has a lot of models for many different tasks like regression, classification and clustering.
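For a single straight-line fit per pair, a full library may not be strictly needed. The following is a minimal sketch of ordinary least squares written by hand; it is not scikit-learn's API. `fit_line` and `add_fit_line` are hypothetical helper names, and `ax` is assumed to be a matplotlib Axes like the one used in the loop above.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of squared deviations of x, and of the x/y cross deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def add_fit_line(ax, xs, ys):
    """Draw the fitted line across the x range on an existing Axes (hypothetical helper)."""
    xs = list(xs)
    slope, intercept = fit_line(xs, ys)
    x_lo, x_hi = min(xs), max(xs)
    ax.plot(
        [x_lo, x_hi],
        [slope * x_lo + intercept, slope * x_hi + intercept],
        label="y = {:.2f}x + {:.2f}".format(slope, intercept),
    )
    return slope, intercept
```

Inside the loop, calling `add_fit_line(ax, data[x_col], data[y_col])` right after `ax.scatter(...)` would overlay the line on each scatterplot, and the legend call would then show the fitted equation. For anything beyond a simple line (multiple predictors, diagnostics), a dedicated library is the better route.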

answered Jul 10, 2019 at 16:03

neko


1 Comment

Nadia Merquez Over a year ago

Thanks for your help neko. As for now, pycharm is not providing the plots for some reason...

Erick Oziel · Accepted Answer · 2023-03-29 17:58:38Z

0

You could use combinations from itertools. This way you will get an iterator with tuples of the combinations.

from itertools import combinations


print(list(combinations(df.columns, 2)))

The code you need would look like this:

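from itertools import combinations

import matplotlib.pyplot as plt

# Each unordered pair of columns is plotted once
for col1, col2 in combinations(df.columns, 2):
    plt.scatter(df[col1], df[col2])
    plt.xlabel(col1)
    plt.ylabel(col2)
    plt.show()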
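If one window per pair becomes unwieldy, the same loop can draw onto a grid of subplots instead. This is a sketch under assumptions: `scatter_pairs` is a hypothetical helper, `table` is assumed to be a DataFrame (or any mapping from column name to values), and `axes` is assumed to be an iterable of matplotlib Axes.

```python
from itertools import combinations


def scatter_pairs(table, axes):
    """Scatter every unordered pair of columns, one pair per Axes.

    Returns the list of (x_col, y_col) pairs that were drawn. If there are
    fewer Axes than pairs, the remaining pairs are skipped.
    """
    # list(table) yields the column names for a DataFrame, or the keys of a dict
    pairs = combinations(list(table), 2)
    drawn = []
    for (x_col, y_col), ax in zip(pairs, axes):
        ax.scatter(table[x_col], table[y_col])
        ax.set_xlabel(x_col)
        ax.set_ylabel(y_col)
        ax.set_title("{} vs. {}".format(x_col, y_col))
        drawn.append((x_col, y_col))
    return drawn
```

With three columns there are three pairs, so `fig, axes = plt.subplots(1, 3)` followed by `scatter_pairs(df[["Sex", "Age", "BMI"]], axes)` and a single `plt.show()` would fill one row. For a two-dimensional grid, pass `axes.flat` so the Axes are visited one by one.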

answered Mar 29, 2023 at 17:58

Erick Oziel

