I have a pandas DataFrame df:

import pandas as pd

df = pd.DataFrame(data={'v1': ['b', 'b', 'c', 'a', 'd', 'c', 'd', 'c', 'f', 'e'],
                        'v2': ['a', 'a', 'd', 'b', 'c', 'e', 'c', 'd', 'g', 'c'],
                        'v3': [3.3, 2.9, 3.5, 4.7, 5.1, 1.1, 2.3, 3.4, 4.7, 6.1]})
    v1  v2  v3
0   b   a   3.3
1   b   a   2.9
2   c   d   3.5 
3   a   b   4.7
4   d   c   5.1
5   c   e   1.1
6   d   c   2.3
7   c   d   3.4
8   f   g   4.7
9   e   c   6.1

I want to generate a result that looks like this:

    v1  v2  v3
0   b   a   3.3
1   b   a   2.9
3   a   b   4.7
2   c   d   3.5 
4   d   c   5.1
6   d   c   2.3
7   c   d   3.4
5   c   e   1.1
9   e   c   6.1
8   f   g   4.7

i.e., rows with the same pair of v1 and v2 values should be stacked together.

A row with v1 = a, v2 = b and a row with v1 = b, v2 = a are not equal (we cannot swap v1 and v2), but they should be regarded as belonging to the same group. How can I do this?

Any help appreciated!

2 Answers

In a similar fashion, sort the two values within each row so that (a, b) and (b, a) produce the same key, then use argsort:

import numpy as np

# Sort v1/v2 within each row so (b, a) and (a, b) give the same tuple key.
df.iloc[pd.Series(
    [tuple(x) for x in np.sort(df.iloc[:, :2], axis=1)]).argsort()
]

  v1 v2   v3
0  b  a  3.3
1  b  a  2.9
3  a  b  4.7
2  c  d  3.5
4  d  c  5.1
6  d  c  2.3
7  c  d  3.4
5  c  e  1.1
9  e  c  6.1
8  f  g  4.7
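The one-liner above packs three steps together. The sketch below spells them out on the question's data. It also passes kind='stable' to argsort, which the original does not: NumPy's default quicksort is not stable, so on larger inputs rows sharing a pair are not guaranteed to keep their original relative order without it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'v1': ['b', 'b', 'c', 'a', 'd', 'c', 'd', 'c', 'f', 'e'],
                   'v2': ['a', 'a', 'd', 'b', 'c', 'e', 'c', 'd', 'g', 'c'],
                   'v3': [3.3, 2.9, 3.5, 4.7, 5.1, 1.1, 2.3, 3.4, 4.7, 6.1]})

# Step 1: sort the two labels inside each row, so ('b', 'a') and ('a', 'b')
# both become ['a', 'b']. The result is an object array of shape (n, 2).
pairs = np.sort(df[['v1', 'v2']].to_numpy(), axis=1)

# Step 2: turn each sorted row into a tuple: one comparable key per row.
keys = pd.Series([tuple(p) for p in pairs])

# Step 3: argsort the keys. kind='stable' keeps rows that share a key in
# their original order (e.g. row 4 stays ahead of row 6).
order = keys.argsort(kind='stable')

result = df.iloc[order]
print(result.index.tolist())  # [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
```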

7 Comments

@jezrael Oh is it? Okay, give me a few minutes. I will edit my answer to include that information. It is an important case so I'll add it to my answer.
@jezrael yeah your answer is fine, no need to delete in this case. I will add the link.
@jezrael That's not how this site works. If you are using external context to answer a question, that means there is a problem and the question is incomplete lacking a minimal reproducible example and should be closed instead of mindlessly pandering for points. I don't get what is happening here but I really dislike it.
@jezrael yeah but the problem was that OP did not even think we were worth his time to at least tell us that it was wrong. It was easier to send you a mail instead and ask you to answer and get points for it. I won't say it's wrong or unfair, just not sportsmanlike.
Okay, sure, whatever the case may be. I hope you understand where I'm coming from. The right thing to do would have been to ping me and tell me about the error instead of pinging you in chat; I would have understood what the problem was and corrected it. This is because of the perception you have built over the years that no one else's answer is worth anything except yours. Again, I hope you understand where I'm coming from.

Option 1

df.loc[sorted(df.index, key=lambda i: tuple(sorted(df.loc[i, ['v1', 'v2']])))]

  v1 v2   v3
0  b  a  3.3
1  b  a  2.9
3  a  b  4.7
2  c  d  3.5
4  d  c  5.1
6  d  c  2.3
7  c  d  3.4
5  c  e  1.1
9  e  c  6.1
8  f  g  4.7
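Option 1 can be wrapped as a small reusable function. The name sort_by_unordered_pair and its parameters are hypothetical, chosen for illustration; the logic is the same key as Option 1. It relies on Python's built-in sorted() being stable, and it assumes the DataFrame has a unique index, since rows are looked up and reselected by label.

```python
import pandas as pd

df = pd.DataFrame({'v1': ['b', 'b', 'c', 'a', 'd', 'c', 'd', 'c', 'f', 'e'],
                   'v2': ['a', 'a', 'd', 'b', 'c', 'e', 'c', 'd', 'g', 'c'],
                   'v3': [3.3, 2.9, 3.5, 4.7, 5.1, 1.1, 2.3, 3.4, 4.7, 6.1]})


def sort_by_unordered_pair(frame, col_a, col_b):
    """Hypothetical helper: put (x, y) and (y, x) rows next to each other.

    Assumes frame has a unique index, because rows are looked up by label.
    """
    def pair_key(label):
        # Order-independent key: ('b', 'a') and ('a', 'b') both become ('a', 'b').
        return tuple(sorted((frame.at[label, col_a], frame.at[label, col_b])))

    # sorted() is stable, so rows with equal keys keep their original order.
    return frame.loc[sorted(frame.index, key=pair_key)]


out = sort_by_unordered_pair(df, 'v1', 'v2')
print(out.index.tolist())  # [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
```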

Option 2

Probably more intuitive, but likely slower as well because of the repeated apply calls.

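# Only v1/v2 go into the key; sorted() fixes the frozenset order, a stable sort keeps ties in place.
df.reindex(df[['v1', 'v2']].apply(frozenset, axis=1).apply(sorted).apply(tuple).sort_values(kind='stable').index)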

  v1 v2   v3
0  b  a  3.3
1  b  a  2.9
3  a  b  4.7
2  c  d  3.5
4  d  c  5.1
6  d  c  2.3
7  c  d  3.4
5  c  e  1.1
9  e  c  6.1
8  f  g  4.7
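Option 2 builds on frozenset being order-independent. This plain-Python sketch (not part of the original answer) shows what that gives you and why the members still have to be sorted before they can serve as a sort key: for sets, < means "proper subset", which is not a total order, and the iteration order of a set of strings is not fixed.

```python
# frozenset ignores element order, so both orientations of a pair compare equal.
forward = frozenset(('a', 'b'))
backward = frozenset(('b', 'a'))
assert forward == backward

# For sets, '<' means "proper subset", which is not a total order:
# neither of these disjoint sets is "less than" the other.
print(frozenset(('a', 'b')) < frozenset(('c', 'd')))  # False
print(frozenset(('c', 'd')) < frozenset(('a', 'b')))  # False

# tuple(some_frozenset) follows the set's iteration order, which for strings
# depends on hash randomization (PYTHONHASHSEED). Sorting the members first
# gives a deterministic, comparable key.
key = tuple(sorted(backward))
print(key)  # ('a', 'b')

# A set also collapses repeats, so a row with v1 == v2 yields a one-element key.
print(tuple(sorted(frozenset(('a', 'a')))))  # ('a',)
```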

Option 3

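import numpy as np
# One sorted [v1, v2] list per row; filling an empty 1-D object array via
# a[:] keeps each list as a single element instead of building a 2-D array.
a = np.empty(len(df), object)
a[:] = list(map(sorted, zip(df.v1, df.v2)))
df.iloc[np.argsort(a)]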

  v1 v2   v3
0  b  a  3.3
1  b  a  2.9
3  a  b  4.7
2  c  d  3.5
4  d  c  5.1
6  d  c  2.3
7  c  d  3.4
5  c  e  1.1
9  e  c  6.1
8  f  g  4.7
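A different technique, not taken from either answer: rather than building tuple or list keys in Python, compute the smaller and larger label of each row with vectorized comparisons and sort on those two columns. The helper column names _lo and _hi are arbitrary, and this sketch has not been benchmarked against the options above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'v1': ['b', 'b', 'c', 'a', 'd', 'c', 'd', 'c', 'f', 'e'],
                   'v2': ['a', 'a', 'd', 'b', 'c', 'e', 'c', 'd', 'g', 'c'],
                   'v3': [3.3, 2.9, 3.5, 4.7, 5.1, 1.1, 2.3, 3.4, 4.7, 6.1]})

# Order-independent key as two columns: the smaller and the larger of v1/v2
# in each row (strings compare lexicographically).
first_is_smaller = df['v1'] <= df['v2']
lo = np.where(first_is_smaller, df['v1'], df['v2'])
hi = np.where(first_is_smaller, df['v2'], df['v1'])

# In current pandas, sorting on several columns keeps tied rows in their
# original order, so rows with the same pair stay in input order.
result = (df.assign(_lo=lo, _hi=hi)
            .sort_values(['_lo', '_hi'])
            .drop(columns=['_lo', '_hi']))
print(result.index.tolist())  # [0, 1, 3, 2, 4, 6, 7, 5, 9, 8]
```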

3 Comments

I prefer the first, because of the lack of apply.
Awesome!! Thanks.
All correct! Option 3 is very fast. I tested each approach 10,000 times on some real data: option 3 took about 2 s, option 1 about 20 s, and the other answer about 8 s (I did not test option 2). Thanks so much!
