3

I have this dataframe:

   source target
0     ape    dog
1     ape   hous
2     dog   hous
3    hors    dog
4    hors    ape
5     dog    ape
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10   bird    ape
11   fist    ape

I am trying to generate a frequency count with this code:

df_count = df.groupby(['source', 'target']).size().reset_index().sort_values(0, ascending=False)
df_count.columns = ['source', 'target', 'weight']

I get the result below.

   source target  weight
2     ape   hous       2
0     ape   bird       1
1     ape    dog       1
3    bird    ape       1
4    bird   fist       1
5    bird   hous       1
6     dog    ape       1
7     dog   hous       1
8    fist    ape       1
9    hors    ape       1
10   hors    dog       1

How can I modify the code so that direction does not matter, i.e. so that instead of ape bird 1 and bird ape 1, I get ape bird 2?

The easiest way would be to sort your rows so that only one order ever occurs. Commented Dec 11, 2016 at 8:02

2 Answers

5

First sort the values row-wise.

In [31]: df
Out[31]:
   source target
0     ape    dog
1     ape   hous
2     dog   hous
3    hors    dog
4    hors    ape
5     dog    ape
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10   bird    ape
11   fist    ape

In [32]: df.values.sort()

In [33]: df
Out[33]:
   source target
0     ape    dog
1     ape   hous
2     dog   hous
3     dog   hors
4     ape   hors
5     ape    dog
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10    ape   bird
11    ape   fist

Then group by source and target, aggregate by size, and sort the result.

In [34]: df.groupby(['source', 'target']).size().sort_values(ascending=False)
    ...:   .reset_index(name='weight')
Out[34]:
  source target  weight
0    ape   hous       2
1    ape    dog       2
2    ape   bird       2
3    dog   hous       1
4    dog   hors       1
5   bird   hous       1
6   bird   fist       1
7    ape   hors       1
8    ape   fist       1
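
Note that df.values.sort() depends on .values handing back a writable view of the frame's data, which pandas does not guarantee in every version or for every dtype layout (copy-on-write is one case where I would not count on it). As a minimal sketch of the same idea that does not mutate df, assuming both columns hold plain strings, the pairs can be sorted into a new frame with np.sort. The names pairs and undirected are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'source': ['ape', 'ape', 'dog', 'hors', 'hors', 'dog',
               'ape', 'ape', 'bird', 'bird', 'bird', 'fist'],
    'target': ['dog', 'hous', 'hous', 'dog', 'ape', 'ape',
               'bird', 'hous', 'hous', 'fist', 'ape', 'ape'],
})

# Sort each (source, target) pair alphabetically into a new array;
# the original df is left untouched.
pairs = np.sort(df[['source', 'target']].to_numpy(dtype=object), axis=1)
undirected = pd.DataFrame(pairs, columns=['source', 'target'], index=df.index)

# Count each undirected pair and put the most frequent ones first.
df_count = (undirected.groupby(['source', 'target'])
                      .size()
                      .reset_index(name='weight')
                      .sort_values('weight', ascending=False))
print(df_count)
```

The counts should match the output above; only the order of rows with equal weight may differ.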

4

You can first sort each row with apply, and then pass the name parameter to reset_index:

df_count = df.apply(sorted, axis=1) \
             .groupby(['source', 'target']) \
             .size() \
             .reset_index(name='weight') \
             .sort_values('weight', ascending=False)
print (df_count)
  source target  weight
0    ape   bird       2
1    ape    dog       2
4    ape   hous       2
2    ape   fist       1
3    ape   hors       1
5   bird   fist       1
6   bird   hous       1
7    dog   hors       1
8    dog   hous       1
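
As far as I am aware, from pandas 0.23 onward df.apply(sorted, axis=1) returns a Series of lists rather than a DataFrame, so the groupby on 'source' and 'target' no longer finds those columns. A sketch of the same approach that should work on both older and newer versions is to return a Series with the original column labels from the applied function. The helper name sort_pair is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['ape', 'ape', 'dog', 'hors', 'hors', 'dog',
               'ape', 'ape', 'bird', 'bird', 'bird', 'fist'],
    'target': ['dog', 'hous', 'hous', 'dog', 'ape', 'ape',
               'bird', 'hous', 'hous', 'fist', 'ape', 'ape'],
})

def sort_pair(row):
    # Return the row's two values in alphabetical order, keeping the
    # 'source' and 'target' labels so apply builds a DataFrame.
    return pd.Series(sorted(row), index=row.index)

df_count = (df.apply(sort_pair, axis=1)
              .groupby(['source', 'target'])
              .size()
              .reset_index(name='weight')
              .sort_values('weight', ascending=False))
print(df_count)
```

Row-wise apply is slow on large frames, so this variant is mainly worth using when the frame is small or readability matters more than speed.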
