I have a DataFrame (df) in which I am trying to merge two columns based on a condition.

Create df

# .copy() avoids SettingWithCopyWarning when columns are added below
df = em_df[['Redcap_Case_num', 'EV_EM', 'COMP_EM', 'EV_RND', 'COMP_EM_RND']].head(3).copy()
df.to_clipboard(excel=False, sep=', ')

# Add EMFREAS - RND columns to df
cols_to_fill = [x for x in ln.columns if x.startswith("EMFREAS")]

for column in cols_to_fill:
    df[column] = ln[column].copy()
df.to_clipboard(excel=False, sep=', ')

Output (I tried to follow the instructions for formatting the table, but they did not work for me, so this is the best I was able to do):

ID     EV_EM  COMP_EM       EV_RND       COMP_EM_RND   EMFREAS1  EMFREAS2
YA007  1      Not Done                   Insufficient
YA006  1
YA005  0      Outside grid  EM Not done

I need to merge the EV_RND and COMP_EM_RND columns to populate all the columns that start with EMFREAS (you are only seeing a subset of those columns).

Here is the code I am trying to use to do this:

# Apply ND filter to df and merge to ln df

EV_ND = df["EV_EM"] == 0
EM_ND = df['COMP_EM'] == 'Not Done'

df.loc[EV_ND | EM_ND, cols_to_fill] = df["EV_RND"] + '|' + df["COMP_EM_RND"]

The expected outcome should look like this:

ID     EV_EM  COMP_EM       EV_RND       COMP_EM_RND   EMFREAS1            EMFREAS2
YA007  1      Not Done      EV ND        Insufficient  Insufficient|EV ND  Insufficient|EV ND
YA006  1
YA005  0      Outside grid  EM Not done                EM Not done         EM Not done

1 Answer

If the empty values in your df are actually empty strings, you can build a separator series that is | when both EV_RND and COMP_EM_RND are non-empty, and an empty string otherwise. Then concatenate EV_RND, the separator series and COMP_EM_RND:

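import numpy as np

# '|' only when both reason columns are non-empty, otherwise ''
sep_series = df.apply(lambda x: '|'
                        if (x['EV_RND'] and x['COMP_EM_RND'])
                        else '', axis=1)

# EV_RND + separator + COMP_EM_RND, row by row
fill_series = df['EV_RND'].str.cat(sep_series).str.cat(df['COMP_EM_RND'])

# Treat '' as missing so that fillna only touches empty cells
for col in df.columns:
    if col.startswith('EMFREAS'):
        df[col] = df[col].replace('', np.nan).fillna(fill_series)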

Output:

      ID  EV_EM       COMP_EM       EV_RND   COMP_EM_RND            EMFREAS1            EMFREAS2
0  YA007      1      Not Done        EV ND  Insufficient  EV ND|Insufficient  EV ND|Insufficient
1  YA006      1                                                                                 
2  YA005      0  Outside grid  EM Not done                       EM Not done         EM Not done
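
If the fill should also be restricted to the rows matching the condition from the question (EV_EM equal to 0, or COMP_EM equal to 'Not Done'), one possible variant is sketched below. It is only a sketch under assumptions: the toy frame is rebuilt from the output table above, blank cells are assumed to be empty strings, and the 'existing' value in EMFREAS2 is hypothetical data added to show that rows outside the condition are left alone.

```python
import pandas as pd

# Toy frame rebuilt from the output table above; blanks assumed to be ''.
# 'existing' is hypothetical data that must not be overwritten.
df = pd.DataFrame({
    "ID": ["YA007", "YA006", "YA005"],
    "EV_EM": [1, 1, 0],
    "COMP_EM": ["Not Done", "", "Outside grid"],
    "EV_RND": ["EV ND", "", "EM Not done"],
    "COMP_EM_RND": ["Insufficient", "", ""],
    "EMFREAS1": ["", "", ""],
    "EMFREAS2": ["", "existing", ""],
})

cols_to_fill = [c for c in df.columns if c.startswith("EMFREAS")]

# Row condition from the question: EV_EM is 0 OR COMP_EM is 'Not Done'.
condition = (df["EV_EM"] == 0) | (df["COMP_EM"] == "Not Done")

# Use '|' only when both reason columns are non-empty.
both = (df["EV_RND"] != "") & (df["COMP_EM_RND"] != "")
sep = both.map({True: "|", False: ""})
fill_series = df["EV_RND"] + sep + df["COMP_EM_RND"]

for col in cols_to_fill:
    # Only touch cells that are empty AND sit in a row meeting the condition.
    target = condition & (df[col] == "")
    df.loc[target, col] = fill_series[target]

print(df)
```

Assigning one column at a time keeps the right-hand side a Series aligned on the index, which sidesteps the shape mismatch of assigning a single Series to several columns at once through .loc.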

Comments

I tried this code and it worked, but it applied an empty string to all fields, which is not what I need. I have other data in these columns that can't be overwritten which is why I need to base it on a condition.
Oh now it makes more sense! In that case you'll want to use fillna. I edited my answer!
This doesn't work. It won't populate the data correctly, because it's not an AND condition, it's an OR. I also don't need it to overwrite data with an empty string. I need it to add (concatenate) all data in EV_RND or COMP_EM_RND to the EMFREAS columns, but only if the condition is met. I don't want it to overwrite data. I hope this makes sense.
Changed to an OR and it worked perfect! Thank you!!
I'm not sure I'm following you... the AND condition only affects the separator | (no need for separator if one of the values is an empty string). In my code I replace empty string with np.nan in order to apply fillna... Glad it's working though!