I have a pandas DataFrame like the following. It shows which of the pages p1 to p4 users accessed in each session.
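import pandas as pd
df = pd.DataFrame([[1,1,1,0,1],[2,1,1,0,1],[3,1,1,1,1],[4,0,1,0,1]])
df.columns = ['session','p1','p2','p3','p4']
df = df.set_index('session')  # session must be the index for the dot product shown next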
The following matrix shows the number of pages accessed in common by each pair of sessions.
In [20]: df.dot(df.T)
Out[20]:
session  1  2  3  4
session
1        3  3  3  2
2        3  3  3  2
3        3  3  4  2
4        2  2  2  2
I need to calculate a value for each pair of sessions si and sj according to the following formula.
s(si, sj) = (no. of pages accessed in common) / (total no. of pages in si * total no. of pages in sj)^(1/2)
For example, for sessions 1 and 2:
No. of pages accessed in common = 3
total no. of pages in s1 * total no. of pages in s2 = 3*3 = 9
s(s1, s2) = 3/9^(1/2) = 1
For sessions 2 and 4:
No. of pages accessed in common = 2
total no. of pages in s2 * total no. of pages in s4 = 3*2 = 6
s(s2, s4) = 2/6^(1/2) ≈ 0.8165
I have not been able to work out how to compute this for every pair of sessions.
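For reference, here is a minimal sketch of one way the formula might be applied to all pairs at once. It assumes session is the DataFrame index and that every session has accessed at least one page (otherwise the denominator is zero). The names common, totals, denom and s are only illustrative. The quantity itself appears to be the cosine similarity of the binary page vectors.

```python
import numpy as np
import pandas as pd

# Same data as above, with session as the index (assumption).
df = pd.DataFrame(
    [[1, 1, 1, 0, 1], [2, 1, 1, 0, 1], [3, 1, 1, 1, 1], [4, 0, 1, 0, 1]],
    columns=['session', 'p1', 'p2', 'p3', 'p4'],
).set_index('session')

# Pages accessed in common for every pair of sessions.
common = df.dot(df.T)

# Total number of pages accessed in each session.
totals = df.sum(axis=1)

# sqrt(total_i * total_j) for every pair, built with an outer product.
denom = np.sqrt(np.outer(totals, totals))

# Element-wise division gives the value for every pair (i, j).
s = common / denom
print(s.round(4))
```

With this layout, s.loc[1, 2] and s.loc[2, 4] should correspond to the two hand-worked cases above.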