How to populate pandas DataFrame based on multiple columns and conditions?

Question

I am currently working on a hobby project, but I am stuck on populating a DataFrame in pandas. I have three DataFrames. My problems:

For each id in DataFrame1, add column n to column x in DataFrame3 if column m is equal to 1.
For each id in DataFrame1 and DataFrame2, set column y to 1 if column c in DataFrame1 is equal to 1 or if column d in DataFrame2 is equal to 1. A value of 99 in column c has the highest priority and sets y to 99.

Can anyone please help me?

DataFrame1

    | id     | n    | m | c |
    |--------|------|---|---|
    | 577140 | bla1 | 0 | 0 |
    | 577140 | bla2 | 0 | 0 |
    | 577140 | bla3 | 0 | 0 |
    | 577140 | bla4 | 1 | 0 |
    | 577140 | bla5 | 0 | 1 |
    | 577141 | bla6 | 0 | 0 |
    | 577141 | bla7 | 0 | 0 |
    | 577141 | bla8 | 1 | 0 |

DataFrame2

    | id     | d |
    |--------|---|
    | 577140 | 1 |
    | 577141 | 0 |

DataFrame3 (currently)

    | id     |
    |--------|
    | 577140 |
    | 577141 |

DataFrame3 (needed)

    | id     | x    | y |
    |--------|------|---|
    | 577140 | bla4 | 1 |
    | 577141 | bla8 | 0 |

I tried some stuff with `apply`, but it didn't work at all.
– konichiwa, Commented Apr 12, 2019 at 14:39

Erfan · Accepted Answer · 2019-04-12 15:05:20Z

1

If I understand you correctly, you want to chain DataFrame.merge twice to join all three dataframes, and then build the y column conditionally with np.select, which accepts multiple conditions:

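import numpy as np
import pandas as pd

# Keep only the rows of df1 with m == 1, then join all three frames on id
df_temp = pd.merge(df3, df1[df1.m == 1], on='id').merge(df2, on='id')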

# Create column y with multiple conditions
conditions = [
    df_temp['c'] == 99,
    (df_temp['c'] == 1) | (df_temp['d'] == 1)
]

choices = [99, 1]

df_temp['y'] = np.select(conditions, choices, default=0)

# Select only the columns we need for output, renaming n to x
df_final = df_temp[['id', 'n', 'y']].rename(columns={'n': 'x'})

print(df_final)
       id     x  y
0  577140  bla4  1
1  577141  bla8  0
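One caveat worth illustrating: in the sample data the c == 1 flag for id 577140 sits on the bla5 row, not on the m == 1 row, so filtering on m before checking c only gives y = 1 here because d is also 1. The following is a minimal, self-contained sketch under the assumption that a flag on any row of an id should count. The names `flags`, `has_99`, `has_1`, and `x_rows` are made up for illustration, and the sketch assumes every id in df3 also appears in df1 and df2 and has at most one row with m == 1.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id": [577140] * 5 + [577141] * 3,
    "n": ["bla1", "bla2", "bla3", "bla4", "bla5", "bla6", "bla7", "bla8"],
    "m": [0, 0, 0, 1, 0, 0, 0, 1],
    "c": [0, 0, 0, 0, 1, 0, 0, 0],
})
df2 = pd.DataFrame({"id": [577140, 577141], "d": [1, 0]})
df3 = pd.DataFrame({"id": [577140, 577141]})

# Assumption: a flag on ANY row of an id counts, so reduce c to per-id booleans.
flags = (
    df1.assign(has_99=df1["c"].eq(99), has_1=df1["c"].eq(1))
       .groupby("id", as_index=False)[["has_99", "has_1"]]
       .any()
)

# x comes from the row where m == 1 (assumes at most one such row per id).
x_rows = df1.loc[df1["m"].eq(1), ["id", "n"]].rename(columns={"n": "x"})

# Assumes every id in df3 is present in flags and df2, so no NaN flags appear.
result = (
    df3.merge(x_rows, on="id", how="left")
       .merge(flags, on="id", how="left")
       .merge(df2, on="id", how="left")
)

# np.select uses the first matching condition, so the 99 rule is listed first.
conditions = [result["has_99"], result["has_1"] | result["d"].eq(1)]
result["y"] = np.select(conditions, [99, 1], default=0)
result = result[["id", "x", "y"]]
print(result)
```

For the sample data this gives the same two rows as the needed DataFrame3; the difference only shows up when c is set on a row where m is 0 and d is 0 for that id.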

edited Apr 12, 2019 at 15:05

answered Apr 12, 2019 at 14:44


3 Comments

konichiwa Over a year ago

I just realized I can have a 99 in column c as well. So column y should be set to 99 when c is equal to 99. How does this affect this code?

konichiwa Over a year ago

c=99 -> y=99; c=1 or d=1 -> y=1; else y=0

Erfan Over a year ago

@dunkubok I edited the answer based on your new addition

C. Braun · Accepted Answer · 2019-04-12 14:45:54Z

0

Try using merge:

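# Both merges join on the shared 'id' column
DataFrame3 = DataFrame3.merge(DataFrame1, how='left')
DataFrame3 = DataFrame3.merge(DataFrame2, how='left')
DataFrame3 = DataFrame3.rename(columns={'n': 'x'})
# Keep only the rows with m == 1; copy to avoid SettingWithCopyWarning
DataFrame3 = DataFrame3[DataFrame3['m'] == 1].copy()
DataFrame3['y'] = (DataFrame3['c'] | DataFrame3['d'])
# drop returns a new frame, so assign the result back
DataFrame3 = DataFrame3.drop(columns=['c', 'd', 'm'])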
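A side effect of merging first and filtering on m afterwards is that any id without an m == 1 row drops out of the result, even though the merge itself used how='left'. The sketch below contrasts the two orderings on small hypothetical data (the ids 1 to 3 and the name `x_rows` are invented for illustration, not taken from the question).

```python
import pandas as pd

# Hypothetical data: id 3 has no row with m == 1.
df1 = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "n": ["a", "b", "c", "d"],
    "m": [0, 1, 1, 0],
})
df3 = pd.DataFrame({"id": [1, 2, 3]})

# Merge first, filter afterwards: id 3 disappears from the result.
merged_then_filtered = df3.merge(df1, how="left", on="id")
merged_then_filtered = merged_then_filtered[merged_then_filtered["m"] == 1]

# Filter first, merge afterwards: id 3 is kept, with x missing (NaN).
x_rows = df1.loc[df1["m"] == 1, ["id", "n"]].rename(columns={"n": "x"})
filtered_then_merged = df3.merge(x_rows, how="left", on="id")

print(merged_then_filtered["id"].tolist())
print(filtered_then_merged["id"].tolist())
```

Which ordering is appropriate depends on whether ids without a matching row should still appear in DataFrame3; the question does not say, so this is only a design consideration.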

answered Apr 12, 2019 at 14:45


1 Comment

konichiwa Over a year ago

Please take a look at my comment under @Erfan's post

Sam · Accepted Answer · 2019-04-12 14:48:32Z

0

I used a set for the "if column c in DataFrame1 is equal to 1 or if column d is 1" logic:

import pandas as pd

columns = ['id', 'n', 'm', 'c']

df1=pd.DataFrame(
    [[577140, 'bla1', 0, 0],
    [577140, 'bla2', 0, 0],
    [577140, 'bla3', 0, 0],
    [577140, 'bla4', 1, 0],
    [577140, 'bla5', 0, 1],
    [577141, 'bla6', 0, 0],
    [577141, 'bla7', 0, 0],
    [577141, 'bla8', 1, 0]], columns=columns)

# Copy the selection so that adding column y later does not warn about a view
df3 = df1.loc[df1.m == 1, ['id', 'n']].copy()
df3.columns = ['id', 'x']

df2 = pd.DataFrame([[577140, 1], [577141, 0]], columns=['id', 'd'])

# Collect every flagged id, not only the first match from each frame
id_set = set(df1.loc[df1.c == 1, 'id']) | set(df2.loc[df2.d == 1, 'id'])

df3['y'] = 0

df3.loc[df3.id.isin(id_set), 'y'] = 1
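The same set-based idea can be stretched to the later requirement that c == 99 takes priority. This is only a sketch on hypothetical data (ids 10 to 12 and the names `ids_with_1` and `ids_with_99` are invented here); it relies on assignment order, writing the 99 rule last so that it overrides the 1 rule.

```python
import pandas as pd

# Hypothetical data: id 10 has both c == 1 and c == 99, id 11 is flagged only via d.
df1 = pd.DataFrame({"id": [10, 10, 11, 12], "c": [1, 99, 0, 0]})
df2 = pd.DataFrame({"id": [10, 11, 12], "d": [0, 1, 0]})
df3 = pd.DataFrame({"id": [10, 11, 12]})

# Ids flagged with 1 in either frame, and ids flagged with 99 in df1.
ids_with_1 = set(df1.loc[df1["c"] == 1, "id"]) | set(df2.loc[df2["d"] == 1, "id"])
ids_with_99 = set(df1.loc[df1["c"] == 99, "id"])

df3["y"] = 0
df3.loc[df3["id"].isin(ids_with_1), "y"] = 1
# Written last so that 99 overrides 1 for ids present in both sets.
df3.loc[df3["id"].isin(ids_with_99), "y"] = 99
print(df3)
```

With more than a couple of special values this gets unwieldy, at which point a merge followed by np.select is probably easier to read.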

answered Apr 12, 2019 at 14:48


2 Comments

konichiwa Over a year ago

Please take a look at my comment under @Erfan's post

Sam Over a year ago

If "c" can take on multiple values, it's better to use the merge options since that will retain the value of c to give to y.
