Pulling column values based on conditions

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End', 'Position'], 
    'Original_1': ['Open', 'Barn', 'Grass', 'Bubble'], 
    'Latest_1': ['Shut', 'Horn', 'Date', 'Dinner'], 
    'Column_2': ['Start', 'Position', 'End', 'During'], 
    'Original_2': ['Sky', 'Hold', 'Car', 'House'], 
    'Latest_2': ['Pedal', 'Lap', 'Two', 'Force'], 
    'Column_3': ['Start', 'End', 'Position', 'During'], 
    'Original_3': ['Leave', 'Dog', 'Block', 'Hope'], 
    'Latest_3': ['Sear', 'Crawl', 'Enter', 'Night']
})

For every row where the word 'Position' appears in 'Column_1', 'Column_2', or 'Column_3', I want to capture the corresponding value from 'Original_1', 'Original_2', or 'Original_3' and assign it to a new column named 'Original_Values'.

The following code does that, but only one column at a time, and it produces three separate columns:

df['Original_Value1'] = df.loc[df['Column_1'] == 'Position', 'Original_1']
df['Original_Value2'] = df.loc[df['Column_2'] == 'Position', 'Original_2']
df['Original_Value3'] = df.loc[df['Column_3'] == 'Position', 'Original_3']
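The three lines above differ only in their numeric suffix, so the same idea can be written as a loop that fills a single column. This is a minimal sketch, assuming the columns always follow the Column_N / Original_N naming pattern with N from 1 to 3; the first matching pair wins, and rows with no 'Position' get an empty string.

```python
import pandas as pd

df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End', 'Position'],
    'Original_1': ['Open', 'Barn', 'Grass', 'Bubble'],
    'Latest_1': ['Shut', 'Horn', 'Date', 'Dinner'],
    'Column_2': ['Start', 'Position', 'End', 'During'],
    'Original_2': ['Sky', 'Hold', 'Car', 'House'],
    'Latest_2': ['Pedal', 'Lap', 'Two', 'Force'],
    'Column_3': ['Start', 'End', 'Position', 'During'],
    'Original_3': ['Leave', 'Dog', 'Block', 'Hope'],
    'Latest_3': ['Sear', 'Crawl', 'Enter', 'Night'],
})

# Start with an empty string for every row (the default when nothing matches)
result = pd.Series('', index=df.index, dtype=object)

for n in range(1, 4):  # assumes the suffixes are exactly 1..3
    # Rows where this Column_N says 'Position' and no earlier pair has matched yet
    hit = (df[f'Column_{n}'] == 'Position') & (result == '')
    # Take Original_N for those rows, leave every other row unchanged
    result = result.mask(hit, df[f'Original_{n}'])

df['Original_Values'] = result
print(df['Original_Values'])
```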

Is there a way to rewrite the above code so that it works across the entire DataFrame instead of one hard-coded column at a time?

I'm hoping to create one column ('Original_Values') with the following result:

0      Open
1      Hold
2     Block
3    Bubble
Name: Original_Values, dtype: object

I think this is the kind of question where you need to explain why you are doing it. I think this question lacks the context needed to understand where you want to apply the code. – Anton vBR, Commented Jun 20, 2018 at 19:07

Greg · Accepted Answer · 2018-06-20 18:20:42Z

2

One way to do it, with df.apply():

def choose_orig(row):
    if row['Column_1'] == 'Position':
        return row['Original_1']
    elif row['Column_2'] == 'Position':
        return row['Original_2']
    elif row['Column_3'] == 'Position':
        return row['Original_3']
    return ''

df['Original_Values'] = df.apply(choose_orig, axis=1)

The axis=1 argument to df.apply() causes the choose_orig() function to be called once for each row of the dataframe.

Note that this uses a default value of the empty string, '', when none of the columns match the word 'Position'.
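If the number of column pairs grows, the if/elif chain can be replaced by a loop over (condition column, value column) pairs. This is a hedged sketch; the helper name choose_orig_generic and the pairs list are illustrative and not part of the original answer. DataFrame.apply forwards extra keyword arguments (here pairs) to the function.

```python
import pandas as pd

df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End', 'Position'],
    'Original_1': ['Open', 'Barn', 'Grass', 'Bubble'],
    'Column_2': ['Start', 'Position', 'End', 'During'],
    'Original_2': ['Sky', 'Hold', 'Car', 'House'],
    'Column_3': ['Start', 'End', 'Position', 'During'],
    'Original_3': ['Leave', 'Dog', 'Block', 'Hope'],
})

# Hypothetical list of (condition column, value column) pairs, checked in order
pairs = [(f'Column_{n}', f'Original_{n}') for n in range(1, 4)]

def choose_orig_generic(row, pairs, target='Position', default=''):
    """Return the value paired with the first condition column equal to target."""
    for cond_col, val_col in pairs:
        if row[cond_col] == target:
            return row[val_col]
    return default

# axis=1 passes each row as a Series; pairs is forwarded as a keyword argument
df['Original_Values'] = df.apply(choose_orig_generic, axis=1, pairs=pairs)
print(df['Original_Values'])
```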

answered Jun 20, 2018 at 18:20

Greg


Anton vBR · Accepted Answer · 2018-06-20 19:13:49Z

2

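How about creating a boolean mask from the three 'Column_' columns and multiplying it by the values in the three 'Original_' columns? In Python, True * 'Open' is 'Open' and False * 'Open' is the empty string '', so after the multiplication every non-matching cell is '' and the row-wise max() picks out the matching value. Select the columns by name: positional slices such as df.iloc[:, :3] and df.iloc[:, 6:9] only line up with these columns when they happen to be sorted alphabetically (older pandas versions sorted dict keys when building a DataFrame; newer ones keep insertion order).

mask_cols = ['Column_1', 'Column_2', 'Column_3']
val_cols = ['Original_1', 'Original_2', 'Original_3']
df['Original_Values'] = ((df[mask_cols] == 'Position') * df[val_cols].values).max(axis=1)

print(df['Original_Values'])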

Returns:

0      Open
1      Hold
2     Block
3    Bubble
Name: Original_Values, dtype: object
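A related vectorized alternative, not part of the original answer, is numpy.select: it takes a list of boolean conditions and a matching list of choices and, element by element, returns the choice for the first condition that is true, falling back to default. A minimal sketch, assuming the same column names as the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End', 'Position'],
    'Original_1': ['Open', 'Barn', 'Grass', 'Bubble'],
    'Column_2': ['Start', 'Position', 'End', 'During'],
    'Original_2': ['Sky', 'Hold', 'Car', 'House'],
    'Column_3': ['Start', 'End', 'Position', 'During'],
    'Original_3': ['Leave', 'Dog', 'Block', 'Hope'],
})

cond_cols = ['Column_1', 'Column_2', 'Column_3']
val_cols = ['Original_1', 'Original_2', 'Original_3']

# One boolean Series per condition column
conditions = [df[c] == 'Position' for c in cond_cols]
# The value column to use when the matching condition is true
choices = [df[v] for v in val_cols]

# The first true condition wins; rows with no match get the default ''
df['Original_Values'] = np.select(conditions, choices, default='')
print(df['Original_Values'])
```

Unlike the max() trick, the result does not depend on string ordering when more than one column in a row says 'Position'; the leftmost match is used.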

answered Jun 20, 2018 at 19:13

Anton vBR


ALollz · Accepted Answer · 2018-06-20 19:09:52Z

Here's a kinda silly way to do it with some stacking, which might perform better if you have a very large df and need to avoid apply() with axis=1.

1. Stack the three 'Column_' columns and keep the (row index, column name) pairs where the value is 'Position', renaming each column to the 'Original_' column it corresponds to.
2. Stack the columns you want to take values from, and reindex that stacked Series with the pairs from step 1, so only the matching values are returned.
3. Bring those values back into the original df, aligned on the original row index.

Here's the code:

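import re

mask_list = ['Column_1', 'Column_2', 'Column_3']
val_list = ['Original_1', 'Original_2', 'Original_3']

# (row, column) pairs where a mask column holds 'Position'
stacked = df[mask_list].stack()
idx = stacked[stacked == 'Position'].index.tolist()
# Point each pair at its value column, e.g. 'Column_2' -> 'Original_2'
idx = [(x, re.sub('(.*_)', 'Original_', y)) for x, y in idx]

# Look up those pairs in the stacked values, then drop the column level so the index matches df
df['Original_Values'] = df[val_list].stack().reindex(idx).droplevel(1)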

df is now:

   Column_1  Column_2  Column_3 ... Original_Values
0  Position     Start     Start ...            Open
1     Start  Position       End ...            Hold
2       End       End  Position ...           Block
3  Position    During    During ...          Bubble

If 'Position' is not found in any of the columns in mask_list, Original_Values becomes NaN for that row. If you need to scale it to more columns, simply add them to mask_list and val_list.
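To avoid listing the columns by hand, both lists can be derived from the column names. This sketch assumes every condition column matches the pattern Column_<digits> and has an Original_<digits> partner with the same suffix; DataFrame.filter(regex=...) selects the columns whose names match the pattern, in their current order.

```python
import pandas as pd

df = pd.DataFrame({
    'Column_1': ['Position', 'Start', 'End', 'Position'],
    'Original_1': ['Open', 'Barn', 'Grass', 'Bubble'],
    'Latest_1': ['Shut', 'Horn', 'Date', 'Dinner'],
    'Column_2': ['Start', 'Position', 'End', 'During'],
    'Original_2': ['Sky', 'Hold', 'Car', 'House'],
    'Latest_2': ['Pedal', 'Lap', 'Two', 'Force'],
    'Column_3': ['Start', 'End', 'Position', 'During'],
    'Original_3': ['Leave', 'Dog', 'Block', 'Hope'],
    'Latest_3': ['Sear', 'Crawl', 'Enter', 'Night'],
})

# All columns named Column_<n>, in the order they appear in df
mask_list = df.filter(regex=r'^Column_\d+$').columns.tolist()
# The Original_<n> partner of each one (same numeric suffix, assumed to exist)
val_list = [c.replace('Column_', 'Original_', 1) for c in mask_list]

print(mask_list)
print(val_list)
```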
