2

I have two data frames with 200 columns each. For illustration I am using only 3 columns here.

Dataframe df1 as:

            A   B   C
1/4/2017    5   6   6
1/5/2017    5   2   1
1/6/2017    6   2   10
1/9/2017    1   9   10
1/10/2017   6   6   4
1/11/2017   6   1   1
1/12/2017   1   7   10
1/13/2017   8   9   6

Dataframe df2:

            A   D   B
1/4/2017    8   10  2
1/5/2017    2   1   8
1/6/2017    6   6   6
1/9/2017    1   8   1
1/10/2017   10  6   2
1/11/2017   10  2   4
1/12/2017   5   4   10
1/13/2017   5   2   8

I want to calculate the following correlation matrix for corresponding columns of df1 and df2:

            A       B
1/4/2017        
1/5/2017        
1/6/2017    0.19    -0.94
1/9/2017    0.79    -0.96
1/10/2017   0.90    -0.97
1/11/2017   1.00    -1.00
1/12/2017   1.00    0.42
1/13/2017   0.24    0.84

i.e. using trailing 3 day historical data for same columns of df1 and df2, I need to find the correlation matrix.

so, I calculated corr([5, 5, 6], [8, 2, 6]) = 0.19 where [5,5,6] is from df1['A'] and [8,2,6] is from df2['A']

Since, I have 200 columns each I am finding it extremely cumbersome to run a for loop two times. First loop through columns and second using trailing 3 day lag data.

2 Answers 2

4

Is this what you need ?

l=[]
id=df1.columns.intersection(df2.columns)
for x in id:
    l.append(pd.rolling_corr(df1[x],df2[x],window=3))# notice you should change it to `l.append(df1[x].rolling(3).corr(df2[x]))`

pd.concat(l,axis=1)


Out[13]: 
                  A         B
1/4/2017        NaN       NaN
1/5/2017        NaN       NaN
1/6/2017   0.188982 -0.944911
1/9/2017   0.785714 -0.960769
1/10/2017  0.896258 -0.968620
1/11/2017  1.000000 -0.998906
1/12/2017  1.000000  0.423415
1/13/2017  0.240192  0.838628
Sign up to request clarification or add additional context in comments.

2 Comments

You might enjoy the second option I posted.
@piRSquared look into that , Always like to learn numpy method !!1
3

Option 1
I built a generator and wrapped it in pd.concat

def rolling_corrwith(d1, d2, window):
    d1, d2 = d1.align(d2, 'inner')
    for i in range(len(d1) - window + 1):
        j = i + window
        yield d1.iloc[i:j].corrwith(d2.iloc[i:j]).rename(d1.index[j-1])

pd.concat(list(rolling_corrwith(df1, df2, 3)), axis=1).T

                  A         B
1/6/2017   0.188982 -0.944911
1/9/2017   0.785714 -0.960769
1/10/2017  0.896258 -0.968620
1/11/2017  1.000000 -0.998906
1/12/2017  1.000000  0.423415
1/13/2017  0.240192  0.838628

Option 2
Using numpy strides. I don't recommend this approach. But it's worth mentioning for those who are interested.

from numpy.lib.stride_tricks import as_strided as strided

def sprp(v, w):
    s0, s1 = v.strides
    n, m = v.shape
    return strided(v, (n + 1 - w, w, m), (s0, s0, s1))

def rolling_corrwith2(d1, d2, window):
    d1, d2 = d1.align(d2, 'inner')

    s1 = sprp(d1.values, window)
    s2 = sprp(d2.values, window)

    m1 = s1.mean(1, keepdims=1)
    m2 = s2.mean(1, keepdims=1)
    z1 = s1.std(1)
    z2 = s2.std(1)

    c  = ((s1 - m1) * (s2 - m2)).sum(1) / z1 / z2 / window

    return pd.DataFrame(c, d1.index[window - 1:], d1.columns)

rolling_corrwith2(df1, df2, 3)

                  A         B
1/6/2017   0.188982 -0.944911
1/9/2017   0.785714 -0.960769
1/10/2017  0.896258 -0.968620
1/11/2017  1.000000 -0.998906
1/12/2017  1.000000  0.423415
1/13/2017  0.240192  0.838628

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.