
Ideally it should return values between -1 and 1 for every cell, except for the cells where the row name and the column name are the same; those need to have a value of 1.

I tried replacing the NaNs with 0 before calling corr(). That returns numeric values everywhere, but those values are inaccurate for the purpose of the program.

# df
            MovieA    MovieB    MovieC    MovieD  MovieE
Angee     0.000000       NaN -0.500000  0.500000     NaN
Anirvesh  1.166667 -0.333333 -0.833333       NaN     NaN
Jay       1.166667 -0.333333       NaN -0.833333     NaN
Karthik   0.000000 -1.500000       NaN       NaN     1.5
Naman          NaN  0.250000       NaN -0.250000     NaN

# df.T.corr()
          Angee  Anirvesh  Jay  Karthik  Naman
Angee       1.0       1.0 -1.0      NaN    NaN
Anirvesh    1.0       1.0  1.0      1.0    NaN
Jay        -1.0       1.0  1.0      1.0    1.0
Karthik     NaN       1.0  1.0      1.0    NaN
Naman       NaN       NaN  1.0      NaN    1.0


The NaNs are correct: corr() returns NaN when a correlation cannot be computed because of missing values. This happens when two users don't have at least two rated movies in common.
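A quick way to check this is to count, for every pair of users, how many movies both of them rated. The sketch below rebuilds the question's data from the printed values (so the numbers are rounded to six decimals) and gets the counts from a matrix product of the "rated / not rated" indicator table; `rated` and `overlap` are just illustrative names. In this data, the cells with fewer than 2 common ratings should line up with the NaN cells of `df.T.corr()`.

```python
import numpy as np
import pandas as pd

# Ratings copied from the question's printout (rows = users, columns = movies)
df = pd.DataFrame(
    {
        "MovieA": [0, 1.166667, 1.166667, 0, np.nan],
        "MovieB": [np.nan, -0.333333, -0.333333, -1.5, 0.25],
        "MovieC": [-0.5, -0.833333, np.nan, np.nan, np.nan],
        "MovieD": [0.5, np.nan, -0.833333, np.nan, -0.25],
        "MovieE": [np.nan, np.nan, np.nan, 1.5, np.nan],
    },
    index=["Angee", "Anirvesh", "Jay", "Karthik", "Naman"],
)

# 1 where a user rated a movie, 0 where the rating is missing
rated = df.notna().astype(int)

# (users x movies) @ (movies x users) -> number of movies rated by both users
overlap = rated @ rated.T
print(overlap)

# Pairs with fewer than 2 common ratings are exactly the NaN cells of the correlation
corr = df.T.corr()
print(((overlap < 2) == corr.isna()).all().all())
```

For this data the last line prints `True`: Angee/Karthik, Angee/Naman, Anirvesh/Naman and Karthik/Naman each share only one movie.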

Filling the NaNs before the computation indeed doesn't make sense, as it adds fake data points that are then used to compute the correlation.
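To see the effect, you can compare the pairwise correlation with the one computed after zero-filling. The data is again rebuilt from the question's printout; `honest` and `zero_filled` are illustrative names. Angee and Karthik share only MovieA, so their correlation is undefined, yet zero-filling produces a value of essentially 0, which looks like a real measurement. Angee and Anirvesh agree perfectly on their two shared movies, but the invented zeros pull their correlation down to roughly 0.4.

```python
import numpy as np
import pandas as pd

# Ratings copied from the question's printout (rows = users, columns = movies)
df = pd.DataFrame(
    {
        "MovieA": [0, 1.166667, 1.166667, 0, np.nan],
        "MovieB": [np.nan, -0.333333, -0.333333, -1.5, 0.25],
        "MovieC": [-0.5, -0.833333, np.nan, np.nan, np.nan],
        "MovieD": [0.5, np.nan, -0.833333, np.nan, -0.25],
        "MovieE": [np.nan, np.nan, np.nan, 1.5, np.nan],
    },
    index=["Angee", "Anirvesh", "Jay", "Karthik", "Naman"],
)

# Pairwise-complete correlation: each pair uses only movies both users rated
honest = df.T.corr()

# Correlation after inventing a 0 rating for every unrated movie
zero_filled = df.fillna(0).T.corr()

for a, b in [("Angee", "Karthik"), ("Angee", "Anirvesh")]:
    print(f"{a} vs {b}: pairwise={honest.loc[a, b]:.3f}, "
          f"zero-filled={zero_filled.loc[a, b]:.3f}")
```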

If you really don't want NaNs, you could instead fill them with 0 after the computation:

out = df.T.corr().fillna(0)

Output:

          Angee  Anirvesh  Jay  Karthik  Naman
Angee       1.0       1.0 -1.0      0.0    0.0
Anirvesh    1.0       1.0  1.0      1.0    0.0
Jay        -1.0       1.0  1.0      1.0    1.0
Karthik     0.0       1.0  1.0      1.0    0.0
Naman       0.0       0.0  1.0      0.0    1.0

Comments:

Also, no pair of distinct users has more than 2 values in common. With exactly 2 common values, the correlation must be +1 or -1 (a straight line always fits two points exactly), so every element of the output is +1, -1, or NaN.
@slothrop yes, that's a very valid point
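The point about two common values is easy to check with NumPy: two points with distinct x values and distinct y values always lie on one straight line, so their Pearson correlation is ±1. The pairs below are shared ratings taken from the question's table (Angee/Anirvesh share MovieA and MovieC, Angee/Jay share MovieA and MovieD). If a user gave the same rating to both shared movies, the variance would be zero and the correlation would be undefined (NaN) instead.

```python
import numpy as np

# Shared ratings (first user, second user) for two pairs from the question
pairs = {
    "Angee vs Anirvesh": ([0.0, -0.5], [1.166667, -0.833333]),  # MovieA, MovieC
    "Angee vs Jay": ([0.0, 0.5], [1.166667, -0.833333]),        # MovieA, MovieD
}

results = {}
for name, (x, y) in pairs.items():
    # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is the x-y entry
    results[name] = np.corrcoef(x, y)[0, 1]
    print(f"{name}: {results[name]:+.1f}")
```

This prints `+1.0` for the first pair and `-1.0` for the second, matching the corresponding cells of `df.T.corr()`.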

Your requirement is somewhat vague, but the following may still help.

import pandas as pd
import numpy as np


df = pd.DataFrame({'MovieA': [0, 1.166667, 1.166667, 0, np.nan],
                   'MovieB': [np.nan, -0.333333, -0.333333, -1.5, 0.25],
                   'MovieC': [-0.5, -0.833333, np.nan, np.nan, np.nan],
                   'MovieD': [0.5, np.nan, -0.833333, np.nan, -0.25],
                   'MovieE': [np.nan, np.nan, np.nan, 1.5, np.nan]},
                  index=['Angee', 'Anirvesh', 'Jay', 'Karthik', 'Naman'])
"""
print(df)
            MovieA    MovieB    MovieC    MovieD  MovieE
Angee     0.000000       NaN -0.500000  0.500000     NaN
Anirvesh  1.166667 -0.333333 -0.833333       NaN     NaN
Jay       1.166667 -0.333333       NaN -0.833333     NaN
Karthik   0.000000 -1.500000       NaN       NaN     1.5
Naman          NaN  0.250000       NaN -0.250000     NaN
"""

# Calculate correlations with Pearson's method
corr_matrix = df.T.corr(method='pearson')
"""
print(corr_matrix)
          Angee  Anirvesh  Jay  Karthik  Naman
Angee       1.0       1.0 -1.0      NaN    NaN
Anirvesh    1.0       1.0  1.0      1.0    NaN
Jay        -1.0       1.0  1.0      1.0    1.0
Karthik     NaN       1.0  1.0      1.0    NaN
Naman       NaN       NaN  1.0      NaN    1.0
"""
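# Fill the diagonal with a placeholder value (999), just to make it visible.
# Work on a copy: with pandas Copy-on-Write, .values can return a read-only array.
arr = corr_matrix.to_numpy(copy=True)
np.fill_diagonal(arr, 999)
corr_matrix = pd.DataFrame(arr, index=corr_matrix.index, columns=corr_matrix.columns)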

"""
print(corr_matrix)
          Angee  Anirvesh    Jay  Karthik  Naman
Angee     999.0       1.0   -1.0      NaN    NaN
Anirvesh    1.0     999.0    1.0      1.0    NaN
Jay        -1.0       1.0  999.0      1.0    1.0
Karthik     NaN       1.0    1.0    999.0    NaN
Naman       NaN       NaN    1.0      NaN  999.0"""


# Concise version: set the diagonal to a placeholder (888) with where(), then replace the remaining NaNs with 0.
corr_matrix = df.T.corr(method='pearson').where(~np.eye(len(df), dtype=bool), 888).fillna(0)
"""
print(corr_matrix)
          Angee  Anirvesh    Jay  Karthik  Naman
Angee     888.0       1.0   -1.0      0.0    0.0
Anirvesh    1.0     888.0    1.0      1.0    0.0
Jay        -1.0       1.0  888.0      1.0    1.0
Karthik     0.0       1.0    1.0    888.0    0.0
Naman       0.0       0.0    1.0      0.0  888.0
"""
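Applied to the actual requirement from the question, the same where() idea can put 1.0 on the diagonal instead of a placeholder. With this data the diagonal is already 1.0, because every user has at least two distinct ratings; forcing it only changes something when a user has fewer than two ratings or gives every movie the same rating, since pandas then returns NaN on the diagonal too. This is a sketch following the approach above; `is_diag` and `out` are illustrative names, and a 0 produced by fillna(0) still means "not enough data", not "uncorrelated".

```python
import numpy as np
import pandas as pd

# Ratings copied from the question's printout (rows = users, columns = movies)
df = pd.DataFrame(
    {
        "MovieA": [0, 1.166667, 1.166667, 0, np.nan],
        "MovieB": [np.nan, -0.333333, -0.333333, -1.5, 0.25],
        "MovieC": [-0.5, -0.833333, np.nan, np.nan, np.nan],
        "MovieD": [0.5, np.nan, -0.833333, np.nan, -0.25],
        "MovieE": [np.nan, np.nan, np.nan, 1.5, np.nan],
    },
    index=["Angee", "Anirvesh", "Jay", "Karthik", "Naman"],
)

corr = df.T.corr(method="pearson")

# True on the diagonal (a user compared with themselves), False elsewhere
is_diag = np.eye(len(corr), dtype=bool)

# where() keeps values where the condition is True and uses 1.0 elsewhere,
# so only the diagonal is overwritten; fillna(0) then replaces undefined pairs
out = corr.where(~is_diag, 1.0).fillna(0)
print(out)
```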
