I have a pandas DataFrame with a column called "Location". Example contents: "London Arndale Centre", "Manchester Arndale", "Birmingham Central Station", "Newcastle Metro Centre".

I also have two NumPy arrays:

originalLocation = np.array(["London Arndale Centre","Manchester Arndale","Birmingham Central Station","Newcastle Metro Centre")

newLocation = np.array(["London","Manchester","Birmingham","Newcastle"]

I want to create a new column in the DataFrame called newLocation.

For each row, the result needs to be the entry in newLocation at the position where the Location field matches an entry in originalLocation.

Example: "London Arndale Centre" needs to become "London", and "Manchester Arndale" needs to become "Manchester".

I have tried this, but it throws an error:

df['newLocation'] = newLocation[int(np.where(originalLocation == df['Location'])[0])]

Error: ValueError: ('Lengths must match to compare', (159,), (12,))

What am I doing wrong here?

1 Answer

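The array definitions in your question are missing their closing brackets (the ] in originalLocation and the ) in newLocation), and the int() is not necessary. Note also that originalLocation == df['Location'] compares element by element, so it raises 'Lengths must match to compare' whenever the column and the array differ in length; the (159,) and (12,) in your error are the two shapes. The code below works when the column and the array have the same length and order. Updated code: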

import numpy as np
import pandas as pd

df_data = ["London Arndale Centre", "Manchester Arndale", "Birmingham Central Station", "Newcastle Metro Centre"]
df = pd.DataFrame(df_data, columns=['Location'])

originalLocation = np.array(["London Arndale Centre", "Manchester Arndale", "Birmingham Central Station", "Newcastle Metro Centre"])

newLocation = np.array(["London", "Manchester", "Birmingham", "Newcastle"])

# Element-wise comparison: the column and originalLocation must line up row by row
df['newLocation'] = newLocation[np.where(originalLocation == df['Location'])[0]]

df

Output:

                     Location  newLocation
0       London Arndale Centre       London
1          Manchester Arndale   Manchester
2  Birmingham Central Station   Birmingham
3      Newcastle Metro Centre    Newcastle
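
To illustrate why the comparison from the question fails once the lengths differ, here is a minimal sketch. The three-row frame and two-entry array are made-up sizes for illustration (the question has 159 rows and 12 entries), and the operands are written with the column first. The isin call at the end is one way to test membership without requiring equal lengths; it is shown as an alternative, not as part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical small data: 3 rows in the frame, 2 entries in the lookup array
df = pd.DataFrame({"Location": ["London Arndale Centre",
                                "Manchester Arndale",
                                "Birmingham Central Station"]})
originalLocation = np.array(["London Arndale Centre", "Manchester Arndale"])

# == between a Series and an array is element-wise, so the lengths must agree
try:
    df["Location"] == originalLocation
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# isin asks "is this value anywhere in the array?" and works for any lengths
mask = df["Location"].isin(originalLocation)
print(mask.tolist())
```

The boolean mask only says whether a location is known; it does not pick the replacement value, which is why a lookup-style approach is needed when the lists do not line up.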

EDIT: As you mentioned, merge works even if not all locations are included in the lookup. Here is a small example using merge:

import pandas as pd

df_data = ["London Arndale Centre", "Manchester Arndale", "Birmingham Central Station", "Newcastle Metro Centre"]
df = pd.DataFrame(df_data, columns=['Location'])

# Lookup lists; "Manchester Arndale" is deliberately left out
originalLocation = ["London Arndale Centre", "Birmingham Central Station", "Newcastle Metro Centre"]
newLocation = ["London", "Birmingham", "Newcastle"]

df_new = pd.DataFrame({'Location': originalLocation,
                       'newLocation': newLocation})

# A left merge keeps every row of df; locations without a match get NaN
df.merge(df_new, on='Location', how='left')

Output with Manchester entry missing:

                     Location  newLocation
0       London Arndale Centre       London
1          Manchester Arndale          NaN
2  Birmingham Central Station   Birmingham
3      Newcastle Metro Centre    Newcastle
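
As an alternative to the merge example above, the same lookup can probably be expressed with Series.map and a plain dictionary built from the two arrays. This is a sketch under the assumption that every entry in originalLocation is unique; the name lookup is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Location": ["London Arndale Centre",
                                "Manchester Arndale",
                                "Birmingham Central Station",
                                "Newcastle Metro Centre"]})

# Lookup arrays; "Manchester Arndale" is deliberately missing
originalLocation = np.array(["London Arndale Centre",
                             "Birmingham Central Station",
                             "Newcastle Metro Centre"])
newLocation = np.array(["London", "Birmingham", "Newcastle"])

# Hypothetical helper dict: original name -> short name
lookup = dict(zip(originalLocation.tolist(), newLocation.tolist()))

# map looks up each row's value in the dict; unknown locations become NaN
df["newLocation"] = df["Location"].map(lookup)
print(df)
```

Like the left merge, this leaves NaN where a location is not in the lookup, and it does not depend on the arrays having the same length or order as the DataFrame.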

4 Comments

Thanks for your response. This does work. However, if I take out one of the items in df_data (e.g. "London Arndale Centre") and try the code, I get the error message again: 'Lengths must match to compare'. How can I get the code to match even if one of the items in the match list does not exist in the source?
merge works better for me with this.
You are right! merge also works with missing values. I added an example to the answer for completeness.
Thank you for your help.
