3

I have a Pandas dataframe that looks as follows:

name1   country1    name2   country2
A       GER         B       USA
C       GER         E       GER
D       GER         Y       AUS
E       GER         A       USA

I want to get a new dataframe with two columns name and country that contains the unique pairs of (name1, country1) and (name2,country2).

The expected result should look like this:

name    country 
A       GER     
C       GER     
D       GER     
E       GER     
B       USA
A       USA
Y       AUS

I have found something similar for single columns here. However, I do not know how to transform this solution to my problem, that is, pairs of columns.

1 Answer 1

5

First filter columns by filter, transpose, flatten values and create new DataFrame by constructor:

a = df.filter(like='name').values.T.ravel()
b = df.filter(like='country').values.T.ravel()
df = pd.DataFrame({'name':a, 'country':b}, columns=['name','country'])
print (df)
  name country
0    A     GER
1    C     GER
2    D     GER
3    E     GER
4    B     USA
5    E     GER
6    Y     AUS
7    A     USA

Another solution with undocumented function lreshape:

df = pd.lreshape(df, {'name':['name1','name2'],
                      'country':['country1','country2']})
print (df)
  name country
0    A     GER
1    C     GER
2    D     GER
3    E     GER
4    B     USA
5    E     GER
6    Y     AUS
7    A     USA

And last for unique pairs use drop_duplicates:

df = df.drop_duplicates()
print (df)
  name country
0    A     GER
1    C     GER
2    D     GER
3    E     GER
4    B     USA
6    Y     AUS
7    A     USA
Sign up to request clarification or add additional context in comments.

2 Comments

which solution is better?
First is faster I think

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.