I want to create a third column in my pandas DataFrame based on columns 1 and 2. When both are filled in they match, but sometimes one of them is blank. If I build the third column from id1 alone, it ends up blank whenever id1 is blank. I want the third column to use whichever of the two isn't blank to create the college name.

Original:

    id1     id2            
0   ID01   ID01             
1          ID03            
2   ID07                   
3   ID08   ID08            

Desired:

    id1     id2            college_name
0   ID01   ID01             College1
1          ID03             College3
2   ID07                    College7
3   ID08   ID08             College8

Also, one thing about this data frame is that I am pretty sure the first 2 columns either are an exact match or one of them is empty. I would like to double-check if there is an instance where id1 and id2 are completely different numbers in the same row. How should I do that?

1 Answer

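Backfill along the rows so that id1 takes the value from id2 wherever id1 is missing. Extract the digits, convert them to int to drop the leading zeros, then convert back to str.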

Given:

    id1   id2
0  ID01  ID01
1   NaN  ID03
2  ID07   NaN
3  ID08  ID08

Doing:

df['college_name'] = 'College' + (df.bfill(axis=1)['id1']
                                    .str.extract(r'(\d+)', expand=False)
                                    .astype(int)
                                    .astype(str))

Output:

    id1   id2 college_name
0  ID01  ID01     College1
1   NaN  ID03     College3
2  ID07   NaN     College7
3  ID08  ID08     College8
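
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. Note that it swaps `bfill(axis=1)` for `fillna` on the single column, which should give the same result here; the intermediate names `combined` and `number` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id1": ["ID01", np.nan, "ID07", "ID08"],
    "id2": ["ID01", "ID03", np.nan, "ID08"],
})

# Take id1 where present, otherwise fall back to id2.
combined = df["id1"].fillna(df["id2"])

# Extract the digits as a Series (expand=False); int drops leading zeros.
number = combined.str.extract(r"(\d+)", expand=False).astype(int)

df["college_name"] = "College" + number.astype(str)
print(df)
```

This assumes every row has at least one non-missing id; if both are missing, `astype(int)` will raise.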

To check for rows where the ids are different:

Given:

    id1   id2
0  ID01  ID01
1   NaN  ID03
2  ID07   NaN
3  ID08  ID98

Doing:

print(df[df.id1.ne(df.id2) & df.id1.notna() & df.id2.notna()])

Output:

    id1   id2
3  ID08  ID98
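
Since the question asks about different numbers rather than different strings, a possible variant is to compare the numeric parts instead. This is only a sketch: the fifth row (`ID9` / `ID09`) is a made-up example added to show the difference, and `num1`, `num2`, and `conflicts` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id1": ["ID01", np.nan, "ID07", "ID08", "ID9"],   # last row is hypothetical
    "id2": ["ID01", "ID03", np.nan, "ID98", "ID09"],
})

# Numeric part of each id; missing ids stay NaN.
num1 = pd.to_numeric(df["id1"].str.extract(r"(\d+)", expand=False))
num2 = pd.to_numeric(df["id2"].str.extract(r"(\d+)", expand=False))

# Rows where both ids are present and their numbers disagree.
conflicts = df[num1.notna() & num2.notna() & num1.ne(num2)]
print(conflicts)
```

With this data only row 3 is flagged. The string comparison above would also flag the hypothetical row 4, because `ID9` and `ID09` differ as text even though the number is the same.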

3 Comments

Hi, I just edited my question to be more clear since you didn't really answer my question.
So I tried this, but it isn't working for me. I start off with no NA values, just empty values. Could this be the issue? When I try to add the NA values, it doesn't work.
@youtube df = df.replace('', np.nan)
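
The `replace` suggestion above can be sketched as a runnable example. The regex form is an assumption on my part: it also catches cells that contain only whitespace, which a plain `replace('', np.nan)` would miss. The sample data is made up to match the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id1": ["ID01", "", "ID07", "ID08"],
    "id2": ["ID01", "ID03", "  ", "ID08"],
})

# Treat empty or whitespace-only cells as missing values.
df = df.replace(r"^\s*$", np.nan, regex=True)
print(df.isna())
```

After this step the blanks are real NaN values, so the backfill and the `notna()` checks in the answer should behave as shown.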