I have a pandas DataFrame like so:

id    cat1    cat2    cat3    num1    num2
1     0       WN      29      2003    98
2     1       TX      12      755     76
3     0       WY      11      845     32
4     1       IL      19      935     46

I want to find the correlation between cat1 and the columns cat3, num1 and num2; or between cat1 and num1 and num2; or between cat2 and cat1, cat3, num1 and num2.

When I use df.corr(), it gives the correlation between all the columns in the DataFrame, but I want to see the correlation between only the selected columns listed above.

How do I do that in Python pandas?

Many thanks in advance for your answers.

df[['cat1','cat3']].corr(), etc. Commented Feb 9, 2017 at 5:11

1 Answer

I tried the following and it worked:

features1=['cat1','cat2','cat3']
features2=['cat1','cat2','num1','num2']

df[features1].corr()
df[features2].corr()

This is a good way to select only the columns you need when your dataset has a very large number of variables.
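For reference, here is a minimal runnable sketch of the same idea using the sample data from the question. It also shows one way to correlate a single column against several others with corrwith, which is closer to what the question asks for. The variable names (subset, others, cat1_vs_others) are my own, not from the original post. cat2 is left out here because it holds strings: depending on the pandas version, corr() either silently drops such a column or raises an error, so it would need some numeric encoding first.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

# Square correlation matrix restricted to a subset of numeric columns.
# cat2 is excluded because it contains strings.
subset = ["cat1", "cat3", "num1", "num2"]
corr_subset = df[subset].corr()

# One column against several others: returns a Series indexed by column name.
others = ["cat3", "num1", "num2"]
cat1_vs_others = df[others].corrwith(df["cat1"])

# The same values can be read off the square matrix with .loc.
cat1_row = corr_subset.loc["cat1", others]

print(corr_subset)
print(cat1_vs_others)
```

corrwith avoids computing the full matrix when only one row of it is needed; with many columns, selecting from df.corr() via .loc is simpler but does more work.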

1 Comment

You don't need to call list(); your argument is already a list.
