
I have a DataFrame with two columns, "Start_location" and "End_location". I want to create a new column called "Location" from those two columns under the following conditions.

If "Start_location" == "End_location", then "Location" should take that shared value (either column works, since they are equal). Otherwise, if "Start_location" and "End_location" differ, "Location" should be "Start_location"-"End_location", i.e. the two names joined by a hyphen.

Here is an example of my input:

+---+--------------------+-----------------------+
|   |  Start_location    |      End_location     |
+---+--------------------+-----------------------+
| 1 | Stratford          |      Stratford        |
| 2 | Bromley            |      Stratford        |
| 3 | Brighton           |      Manchester       |
| 4 | Delaware           |      Delaware         |
+---+--------------------+-----------------------+
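To reproduce the input table in pandas, a minimal sketch (the 1-based index is assumed from the table; the column names are taken from its header):

```python
import pandas as pd

# Sample input from the table above; the index starts at 1 to match it
df = pd.DataFrame(
    {
        "Start_location": ["Stratford", "Bromley", "Brighton", "Delaware"],
        "End_location": ["Stratford", "Stratford", "Manchester", "Delaware"],
    },
    index=[1, 2, 3, 4],
)
print(df)
```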
   

This is the result I want:

+---+--------------------+-----------------------+--------------------+
|   |  Start_location    |      End_location     |   Location         |
+---+--------------------+-----------------------+--------------------+
| 1 | Stratford          |      Stratford        |   Stratford        |
| 2 | Bromley            |      Stratford        | Bromley-Stratford  |
| 3 | Brighton           |      Manchester       | Brighton-Manchester|
| 4 | Delaware           |      Delaware         |    Delaware        |
+---+--------------------+-----------------------+--------------------+
   

Any help would be appreciated.

PS: Forgive me if this is a very basic question. I have gone through some similar questions on this topic but couldn't make any headway.

4 Answers


You can write your own function that does this and then call apply with a lambda function (axis=1 passes one row at a time):

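def get_location(start, end):
    # Same place: return it once; otherwise join both names with a hyphen
    if start == end:
        return start
    else:
        return start + '-' + end

# axis=1 calls the lambda once per row
df['Location'] = df.apply(lambda x: get_location(x.Start_location, x.End_location), axis=1)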
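A row-wise apply builds a Series for every row, which is typically slow on large frames. A minimal sketch of the same logic with a list comprehension over zip instead (an alternative, not part of the answer above; the sample data is rebuilt from the question's table):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Start_location": ["Stratford", "Bromley", "Brighton", "Delaware"],
        "End_location": ["Stratford", "Stratford", "Manchester", "Delaware"],
    },
    index=[1, 2, 3, 4],
)

def get_location(start, end):
    # Same place: keep one name; different places: join with a hyphen
    return start if start == end else f"{start}-{end}"

# zip walks both columns in parallel, so no per-row Series is created
df["Location"] = [
    get_location(start, end)
    for start, end in zip(df["Start_location"], df["End_location"])
]
print(df)
```

The list has one entry per row, so pandas assigns it positionally regardless of the 1-based index.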
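# Apply a lambda to the two columns; x.iloc[0] is Start_location and x.iloc[1] is End_location
df['Location'] = df[['Start_location', 'End_location']].apply(lambda x: x.iloc[0] if x.iloc[0] == x.iloc[1] else x.iloc[0] + '-' + x.iloc[1], axis=1)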


You can use NumPy's np.where to compare both columns element-wise and pick a value for each row:


import numpy as np

# Where the two columns match, keep Start_location; otherwise join them with "-"
df["Location"] = np.where(df['Start_location'] == df['End_location'],
                          df['Start_location'],
                          df['Start_location'] + "-" + df['End_location'])

df

Output:

  Start_location End_location             Location
0      Stratford    Stratford            Stratford
1        Bromley    Stratford    Bromley-Stratford
2       Brighton   Manchester  Brighton-Manchester
3       Delaware     Delaware             Delaware
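The same vectorized selection can also be done without NumPy using pandas' own Series.where, which keeps values where the condition is True and substitutes `other` elsewhere. A minimal sketch, offered as an alternative rather than part of the answer, with the sample data rebuilt from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Start_location": ["Stratford", "Bromley", "Brighton", "Delaware"],
        "End_location": ["Stratford", "Stratford", "Manchester", "Delaware"],
    },
    index=[1, 2, 3, 4],
)

same = df["Start_location"] == df["End_location"]             # boolean mask per row
joined = df["Start_location"] + "-" + df["End_location"]      # hyphenated pair per row

# Keep Start_location where the mask is True, use the joined value elsewhere
df["Location"] = df["Start_location"].where(same, other=joined)
print(df)
```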


Alternatively, use np.select(condition, choice), and join the start and end names with the .str.cat() method:

import numpy as np

# One condition per case, with the matching choice at the same position
condition = [df['Start_location'] == df['End_location'],
             df['Start_location'] != df['End_location']]
choice = [df['Start_location'],
          df['Start_location'].str.cat(df['End_location'], sep='-')]
df['Location'] = np.select(condition, choice)

df
  Start_location End_location             Location
1      Stratford    Stratford            Stratford
2        Bromley    Stratford    Bromley-Stratford
3       Brighton   Manchester  Brighton-Manchester
4       Delaware     Delaware             Delaware
