Graphically show correlation between columns of a pandas dataframe

Question

I have the following pandas dataframe covering more than 10k answers for 150 questions.

I am struggling to find a way to see the correlation between fields.

In particular I would like to understand how I can graphically show the correlation between Q015 and Q008, knowing that each respondent might have selected multiple answers (1,2,3).

So I am trying to figure out how to graphically display whether there is any correlation between Q015 and Q008 for each selected option of the survey.

Any ideas?

Samir Hinojosa · Accepted Answer · 2021-07-24 10:09:28Z


You can fit a simple linear regression of Q008 on Q015 and plot the fitted line over the data.

Necessary libraries

import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

Code

# Lists collecting the metrics reported for each pair of variables
list_variables, list_COEF, list_MSE, list_RMSE, list_R2SCORE = ([] for _ in range(5))

# Hold out 30% of the rows for testing, then fit an ordinary least squares
# regression of Q008 on Q015
xtrain, xtest, ytrain, ytest = train_test_split(df[["Q015"]], df[["Q008"]], test_size=0.3)
lr = LinearRegression()
lr_baseline = lr.fit(xtrain, ytrain)
pred_baseline = lr_baseline.predict(xtest)

list_variables.append("Q015 & Q008")
list_COEF.append(round(lr_baseline.coef_[0,0], 4))
list_MSE.append(round(mean_squared_error(ytest, pred_baseline), 2))
list_RMSE.append(round(math.sqrt(mean_squared_error(ytest, pred_baseline)), 2))
list_R2SCORE.append(round(r2_score(ytest, pred_baseline), 2))

# Plotting the graph
plt.figure(figsize=(12,8))
ax = plt.gca()

plt.suptitle("Q015 & Q008", fontsize=24, y=0.96)
plt.plot(xtest, ytest, 'bo', markersize = 5)
plt.plot(xtest, pred_baseline, color="red", linewidth = 2)
plt.xlabel("Q015", size=14)
plt.ylabel("Q008", size=14)
plt.tight_layout()
plt.show()

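This gives a scatter plot with the fitted line. The Coef. value is the slope of that line: its sign shows the direction of the linear relationship, but its size depends on the scale of the variables, so it is not itself the Pearson correlation coefficient.

Another way is to look at the correlation matrix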
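To make the link between the regression slope and the Pearson coefficient concrete, here is a minimal sketch. It uses a made-up dataframe called demo with random integers (an assumption, not the asker's data) and np.polyfit in place of scikit-learn, just to keep it short.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: random integer answers, not the real survey
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 15, size=(100, 2)), columns=["Q015", "Q008"])

# Pearson correlation coefficient (the default method of Series.corr)
r = demo["Q015"].corr(demo["Q008"])

# Slope of the least-squares line Q008 ~ Q015 (first coefficient of a degree-1 fit)
slope = np.polyfit(demo["Q015"], demo["Q008"], deg=1)[0]

# The slope equals r rescaled by the ratio of standard deviations
rescaled = r * demo["Q008"].std() / demo["Q015"].std()

print(f"Pearson r = {r:.3f}, slope = {slope:.3f}, r * sy/sx = {rescaled:.3f}")
```

The slope and r only coincide when both columns have the same standard deviation, which is why r (always between -1 and 1) is the safer number to report as "how correlated" two columns are.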

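# Pearson correlation matrix of the two columns, rounded for display
df_corr = df[["Q015", "Q008"]].corr().round(2)
# Hide the diagonal and the upper triangle, which only repeat information
mask = np.zeros_like(df_corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

plt.figure(figsize=(10,8))
plt.title("Pearson correlation between features", size=20)

# Fix the colour scale to the full [-1, 1] range and print the value in each cell
ax = sns.heatmap(df_corr, mask=mask, vmin=-1, vmax=1, annot=True, cmap="mako_r")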

plt.xticks(rotation=25, size=14, horizontalalignment="right")
plt.yticks(rotation=0, size=14)
plt.tight_layout()
plt.show()

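An example dataframe with numeric columns (the column names are passed as a flat list; a nested list would create MultiIndex columns and break selections such as df[["Q015"]])

df = pd.DataFrame(np.random.randint(0, 15, size=(100, 6)), columns=["Q01", "Q02", "Q03", "Q07", "Q015", "Q008"])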
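Since the question mentions that a respondent may select several options, the numeric approach above does not apply directly. One possible way to handle that, sketched here under the assumption that each cell stores the selected options as a comma-separated string such as "1,2,3" (the real storage format is not shown in the question), is to expand each question into one 0/1 indicator column per option and correlate the indicators. The dataframe demo and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical multi-select answers stored as comma-separated strings
demo = pd.DataFrame({
    "Q015": ["1,2", "2", "1,3", "3", "1,2,3", "2,3"],
    "Q008": ["1", "2,3", "1", "3", "1,2", "2"],
})

# One 0/1 indicator column per selectable option
q015 = demo["Q015"].str.get_dummies(sep=",").add_prefix("Q015_")
q008 = demo["Q008"].str.get_dummies(sep=",").add_prefix("Q008_")

# Pearson correlation between every Q015 option (rows) and every Q008 option (columns)
corr = pd.concat([q015, q008], axis=1).corr().loc[q015.columns, q008.columns]
print(corr.round(2))
```

For 0/1 indicators the Pearson coefficient is the phi coefficient, so each cell says how strongly picking one option goes together with picking the other. The resulting table can be drawn with sns.heatmap(corr, annot=True, vmin=-1, vmax=1), without the triangular mask because this table is not symmetric.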

edited Jul 24, 2021 at 10:09

answered Jul 24, 2021 at 9:00

Samir Hinojosa


8 Comments

Stefano Over a year ago

Many thanks Samir! I tried the first method, but I am stuck at the following point: the line "for a, b in itertools.combinations(VARIABLES, 2):" fails with NameError: name 'VARIABLES' is not defined. Maybe I should declare VARIABLES as the options of column 15 (1,2,3,4,5)?

Samir Hinojosa Over a year ago

Sorry, I copied and pasted an example based on my own code, where I run it over a dict of features. I have deleted the loop "for a, b in itertools.combinations(VARIABLES, 2):". Try again and let me know.

Samir Hinojosa Over a year ago

Wait a minute, I just noticed that one of the columns allows multiple choices, so this is not the right approach for this situation. Sorry!!

Stefano Over a year ago

No problem Samir!!! Thank you so much!

Stefano Over a year ago

Indeed, if I run the line xtrain, xtest, ytrain, ytest = train_test_split(df[["Q015"]], df[["Q008"]], test_size=0.3), I get the error: KeyError: "None of [Index(['Q015'], dtype='object', name=0)] are in the [columns]"
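A KeyError of this kind means the requested labels are not in df.columns as pandas sees them. One possible cause (an assumption, since the asker's dataframe is not shown) is that the column labels were passed as a nested list, which makes pandas build MultiIndex columns. A minimal sketch for checking and flattening the columns, using a made-up frame called nested:

```python
import numpy as np
import pandas as pd

names = ["Q015", "Q008"]

# Passing the names wrapped in another list produces MultiIndex columns
nested = pd.DataFrame(np.zeros((3, 2)), columns=[names])
is_multi = isinstance(nested.columns, pd.MultiIndex)
print(type(nested.columns).__name__, list(nested.columns))

# Flatten to an ordinary Index so that list-based selection works as expected
flat = nested.copy()
flat.columns = nested.columns.get_level_values(0)
subset = flat[["Q015"]]
print(list(flat.columns), subset.shape)
```

Printing df.columns on the real dataframe is the quickest way to see what the labels actually are. Stray whitespace or labels read from a header row can produce the same error.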
