python pandas dataframe groupby put same group together

Question

I have a pandas data frame df:

import pandas as pd

df = pd.DataFrame(data = {'v1': ['b', 'b', 'c', 'a', 'd', 'c', 'd', 'c', 'f', 'e'],
                          'v2': ['a', 'a', 'd', 'b', 'c', 'e', 'c', 'd', 'g', 'c'],
                          'v3': [3.3, 2.9, 3.5, 4.7, 5.1, 1.1, 2.3, 3.4, 4.7, 6.1]})
    v1  v2  v3
0   b   a   3.3
1   b   a   2.9
2   c   d   3.5 
3   a   b   4.7
4   d   c   5.1
5   c   e   1.1
6   d   c   2.3
7   c   d   3.4
8   f   g   4.7
9   e   c   6.1

I want to generate a result that looks like:

    v1  v2  v3
0   b   a   3.3
1   b   a   2.9
3   a   b   4.7
2   c   d   3.5 
4   d   c   5.1
6   d   c   2.3
7   c   d   3.4
5   c   e   1.1
9   e   c   6.1
8   f   g   4.7

i.e., put rows with the same pair of v1 and v2 values next to each other.

Note that v1 = a, v2 = b and v1 = b, v2 = a are not equal (we cannot swap v1 and v2), but they should be regarded as the same group. How can I do this?

Any help appreciated!

cs95 · Accepted Answer · 2018-06-05 04:32:34Z

4

Sort the values within each (v1, v2) row so that ('b', 'a') and ('a', 'b') produce the same key, then use argsort:

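import numpy as np
# Sort each (v1, v2) row so ('b', 'a') and ('a', 'b') give the same tuple key;
# a stable argsort keeps the original row order within each group
df.iloc[pd.Series(
    [tuple(x) for x in np.sort(df.iloc[:, :2], axis=1)]).argsort(kind='mergesort')
]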

  v1 v2   v3
0  b  a  3.3
1  b  a  2.9
3  a  b  4.7
2  c  d  3.5
4  d  c  5.1
6  d  c  2.3
7  c  d  3.4
5  c  e  1.1
9  e  c  6.1
8  f  g  4.7
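Since the title asks about groupby, here is a minimal sketch of that route, assuming only v1 and v2 define a group. It builds two order-independent key Series with np.sort (the names key_lo and key_hi are illustrative), groups on them, and concatenates the groups. groupby sorts the group keys by default and preserves the original row order within each group, so the concatenated result should come out in the order shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'v1': ['b', 'b', 'c', 'a', 'd', 'c', 'd', 'c', 'f', 'e'],
                        'v2': ['a', 'a', 'd', 'b', 'c', 'e', 'c', 'd', 'g', 'c'],
                        'v3': [3.3, 2.9, 3.5, 4.7, 5.1, 1.1, 2.3, 3.4, 4.7, 6.1]})

# Sort each (v1, v2) row so that ('b', 'a') and ('a', 'b') become identical keys
pairs = np.sort(df[['v1', 'v2']].to_numpy(), axis=1)
key_lo = pd.Series(pairs[:, 0], index=df.index)
key_hi = pd.Series(pairs[:, 1], index=df.index)

# groupby sorts the keys by default and keeps the original row order inside
# each group, so concatenating the groups puts rows of the same pair together
result = pd.concat([group for _, group in df.groupby([key_lo, key_hi])])
print(result)
```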

answered Jun 5, 2018 at 4:32

cs95


7 Comments

cs95 Over a year ago

@jezrael Oh is it? Okay, give me a few minutes. I will edit my answer to include that information. It is an important case so I'll add it to my answer.

cs95 Over a year ago

@jezrael yeah your answer is fine, no need to delete in this case. I will add the link.

cs95 Over a year ago

@jezrael That's not how this site works. If you are using external context to answer a question, that means there is a problem: the question is incomplete, lacks a minimal reproducible example, and should be closed instead of mindlessly pandering for points. I don't get what is happening here, but I really dislike it.

cs95 Over a year ago

@jezrael Yeah, but the problem was that the OP did not even think we were worth the time to at least tell us that it was wrong. It was easier to send you an email instead and ask you to answer and get points for it. I won't say it's wrong or unfair, just not sportsmanlike.

cs95 Over a year ago

Okay sure, whatever the case may be. I hope you understand where I'm coming from. The right thing to do would have been to ping me and tell me about the error; I would have understood the problem and corrected it. Instead, you were pinged in chat. This is because of the perception you have built over the years that no one else's answer is worth anything except yours. Again, I hope you understand where I'm coming from.


piRSquared · Accepted Answer · 2018-06-05 04:39:24Z

3

Option 1

# Sort row labels by each row's (v1, v2) values in sorted order; sorted() is stable
df.loc[sorted(df.index, key=lambda i: tuple(sorted(df.loc[i, ['v1', 'v2']])))]

  v1 v2   v3
0  b  a  3.3
1  b  a  2.9
3  a  b  4.7
2  c  d  3.5
4  d  c  5.1
6  d  c  2.3
7  c  d  3.4
5  c  e  1.1
9  e  c  6.1
8  f  g  4.7

Option 2

Probably more intuitive, but likely slower as well because of the row-wise apply.

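# One sorted (v1, v2) tuple per row; frozenset would include v3 and has no reliable element order
df.reindex(df[['v1', 'v2']].apply(lambda r: tuple(sorted(r)), axis=1).sort_values(kind='mergesort').index)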

  v1 v2   v3
0  b  a  3.3
1  b  a  2.9
3  a  b  4.7
2  c  d  3.5
4  d  c  5.1
6  d  c  2.3
7  c  d  3.4
5  c  e  1.1
9  e  c  6.1
8  f  g  4.7
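Both options above run Python code once per row. An apply-free sketch of the same idea, assuming the data fits comfortably in memory: sort each (v1, v2) pair with np.sort, attach the results as temporary columns (_lo, _hi and _pos are hypothetical helper names), sort on them, and drop them again. The _pos column holds the original row position and acts as a tiebreaker, so rows within a group keep their original order regardless of which sort algorithm pandas uses internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'v1': ['b', 'b', 'c', 'a', 'd', 'c', 'd', 'c', 'f', 'e'],
                        'v2': ['a', 'a', 'd', 'b', 'c', 'e', 'c', 'd', 'g', 'c'],
                        'v3': [3.3, 2.9, 3.5, 4.7, 5.1, 1.1, 2.3, 3.4, 4.7, 6.1]})

# Row-wise sort of the two key columns without apply: pairs[:, 0] <= pairs[:, 1]
pairs = np.sort(df[['v1', 'v2']].to_numpy(), axis=1)

result = (df.assign(_lo=pairs[:, 0], _hi=pairs[:, 1], _pos=np.arange(len(df)))
            .sort_values(['_lo', '_hi', '_pos'])  # _pos breaks ties by original position
            .drop(columns=['_lo', '_hi', '_pos']))
print(result)
```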

Option 3

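import numpy as np

a = np.empty(len(df), object)                # 1-D object array, one slot per row
a[:] = list(map(sorted, zip(df.v1, df.v2)))  # each slot holds a sorted [v1, v2] list
df.iloc[np.argsort(a, kind='mergesort')]     # stable sort keeps original order within a group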

  v1 v2   v3
0  b  a  3.3
1  b  a  2.9
3  a  b  4.7
2  c  d  3.5
4  d  c  5.1
6  d  c  2.3
7  c  d  3.4
5  c  e  1.1
9  e  c  6.1
8  f  g  4.7

edited Jun 5, 2018 at 4:39

answered Jun 5, 2018 at 4:29

piRSquared


3 Comments

cs95 Over a year ago

I prefer the first, because of the lack of apply.

Hanbo Over a year ago

Awesome!! Thanks.

Hanbo Over a year ago

All correct! Option 3 is very fast. I tested it 10000 times on some real data (option 3: 2 s, option 1: ~20 s, the other solution: ~8 s; I did not test option 2). Thanks much!
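Timings like these depend heavily on data size and hardware. A rough harness for repeating the comparison on your own data might look like the following sketch. The wrapper names option1, option3 and argsort_answer are hypothetical, and the NumPy/pandas variants pass kind='mergesort' so that rows keep their original order within each group. It records the row order each variant returns, times each one with timeit, and then checks that all variants agree.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'v1': ['b', 'b', 'c', 'a', 'd', 'c', 'd', 'c', 'f', 'e'],
                        'v2': ['a', 'a', 'd', 'b', 'c', 'e', 'c', 'd', 'g', 'c'],
                        'v3': [3.3, 2.9, 3.5, 4.7, 5.1, 1.1, 2.3, 3.4, 4.7, 6.1]})


def option1(frame):
    # Python-level sort over row labels, one .loc lookup per row
    return frame.loc[sorted(frame.index,
                            key=lambda i: tuple(sorted(frame.loc[i, ['v1', 'v2']])))]


def option3(frame):
    # Object array of sorted [v1, v2] lists, argsorted by NumPy
    a = np.empty(len(frame), object)
    a[:] = list(map(sorted, zip(frame.v1, frame.v2)))
    return frame.iloc[np.argsort(a, kind='mergesort')]


def argsort_answer(frame):
    # Series of sorted (v1, v2) tuples, argsorted by pandas
    keys = pd.Series([tuple(x) for x in np.sort(frame.iloc[:, :2], axis=1)])
    return frame.iloc[keys.argsort(kind='mergesort')]


orders = {}
for func in (option1, option3, argsort_answer):
    orders[func.__name__] = list(func(df).index)
    seconds = timeit.timeit(lambda: func(df), number=100)
    print(f'{func.__name__}: {seconds:.4f} s for 100 runs')

# All variants should agree on the row order
assert len({tuple(order) for order in orders.values()}) == 1
```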
