0

I am currently working on a hobby project, but I am stuck on populating a DataFrame in pandas. I have three DataFrames. My problems:

  • For each id in DataFrame1, copy the value of column n into column x of DataFrame3 for the row where column m is equal to 1.
  • For each id in DataFrame1 and DataFrame2, set column y to 1 if column c in DataFrame1 is equal to 1 or column d in DataFrame2 is equal to 1. A value of 99 in column c has the highest priority and sets y to 99.

Can anyone please help me?

DataFrame1

    | id     | n    | m | c |
    |--------|------|---|---|
    | 577140 | bla1 | 0 | 0 |
    | 577140 | bla2 | 0 | 0 |
    | 577140 | bla3 | 0 | 0 |
    | 577140 | bla4 | 1 | 0 |
    | 577140 | bla5 | 0 | 1 |
    | 577141 | bla6 | 0 | 0 |
    | 577141 | bla7 | 0 | 0 |
    | 577141 | bla8 | 1 | 0 |

DataFrame2

    | id     | d |
    |--------|---|
    | 577140 | 1 |
    | 577141 | 0 |

DataFrame3 (currently)

    | id     |
    |--------|
    | 577140 |
    | 577141 |

DataFrame3 (needed)

    | id     | x    | y |
    |--------|------|---|
    | 577140 | bla4 | 1 |
    | 577141 | bla8 | 0 |
2
  • Do you have an attempt? Commented Apr 12, 2019 at 14:39
  • I tried some stuff with `apply` but it didn't work at all. Commented Apr 12, 2019 at 14:39

3 Answers

1

If I understand you correctly, you can chain DataFrame.merge twice to join all three DataFrames and then build the y column conditionally with np.select, which accepts multiple conditions:

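import numpy as np
import pandas as pd

# Keep only the rows of df1 where m == 1, then join all three frames on id
df_temp = pd.merge(df3, df1[df1.m == 1], on='id').merge(df2, on='id')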

# Create column y with multiple conditions
conditions = [
    df_temp['c'] == 99,
    (df_temp['c'] == 1) | (df_temp['d'] == 1)
]

choices = [99, 1]

df_temp['y'] = np.select(conditions, choices, default=0)

# Select only the columns needed for the output and rename n to x
df_final = df_temp[['id', 'n', 'y']].rename(columns={'n': 'x'})

print(df_final)
       id     x  y
0  577140  bla4  1
1  577141  bla8  0
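
To see how np.select resolves rows that match more than one condition, here is a minimal sketch on made-up values (the `demo` frame is hypothetical and not part of the question's data). The conditions are evaluated in list order and the first match wins, which is why the `c == 99` check has to come first:

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering each case: c == 99, c == 1, d == 1, neither,
# and c == 99 together with d == 1
demo = pd.DataFrame({
    'c': [99, 1, 0, 0, 99],
    'd': [0, 0, 1, 0, 1],
})

conditions = [
    demo['c'] == 99,                      # listed first, so it takes priority
    (demo['c'] == 1) | (demo['d'] == 1),  # only applies when the first is False
]
choices = [99, 1]

# Rows matching no condition fall back to the default
demo['y'] = np.select(conditions, choices, default=0)
print(demo)
```

The last row matches both conditions and still gets 99, because np.select takes the choice of the first condition that is True.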
3 Comments

I just realized I can have a 99 in column c as well. So column y should be set to 99 when c is equal to 99. How does this affect this code?
c=99 -> y=99; c=1 or d=1 -> y=1; else y=0
@dunkubok I edited the answer based on your new addition
0

Try using merge:

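# Left-join the other two frames onto DataFrame3 using the shared id column
DataFrame3 = DataFrame3.merge(DataFrame1, how='left', on='id')
DataFrame3 = DataFrame3.merge(DataFrame2, how='left', on='id')
DataFrame3 = DataFrame3.rename(columns={'n': 'x'})
# Keep only the rows where m == 1; copy() avoids a SettingWithCopyWarning below
DataFrame3 = DataFrame3[DataFrame3['m'] == 1].copy()
DataFrame3['y'] = (DataFrame3['c'] | DataFrame3['d'])
# drop() returns a new DataFrame, so the result has to be assigned back
DataFrame3 = DataFrame3.drop(columns=['c', 'd', 'm'])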
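
One thing to be aware of: `|` on two integer columns is a bitwise OR, so it only yields a clean 0/1 flag while both columns contain nothing but 0 and 1. The sketch below uses made-up values (the `demo` frame and the column names `y_bitwise` and `y_flag` are hypothetical) to compare it with an explicit comparison:

```python
import pandas as pd

# Hypothetical values, including a c that is neither 0 nor 1
demo = pd.DataFrame({'c': [0, 1, 0, 2], 'd': [0, 0, 1, 1]})

# Bitwise OR combines the integer bits, so 2 | 1 gives 3
demo['y_bitwise'] = demo['c'] | demo['d']

# Comparing first yields booleans, which are then cast to 0/1
demo['y_flag'] = ((demo['c'] == 1) | (demo['d'] == 1)).astype(int)
print(demo)
```

If c can hold values other than 0 and 1, the comparison form is probably the safer choice.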

1 Comment

Please take a look at my comment under @Erfan's post
0

I used a set for the "if column c in DataFrame1 is equal to 1 or if column d is 1" logic:

import pandas as pd

columns = ['id', 'n', 'm', 'c']

df1=pd.DataFrame(
    [[577140, 'bla1', 0, 0],
    [577140, 'bla2', 0, 0],
    [577140, 'bla3', 0, 0],
    [577140, 'bla4', 1, 0],
    [577140, 'bla5', 0, 1],
    [577141, 'bla6', 0, 0],
    [577141, 'bla7', 0, 0],
    [577141, 'bla8', 1, 0]], columns=columns)

# copy() makes df3 independent of df1, so adding column y later is safe
df3 = df1.loc[df1.m == 1, ['id', 'n']].copy()
df3.columns = ['id', 'x']

df2 = pd.DataFrame([[577140, 1], [577141, 0]], columns=['id', 'd'])

# Collect every id that has c == 1 in df1 or d == 1 in df2
id_set = set(df1.loc[df1.c == 1, 'id']) | set(df2.loc[df2.d == 1, 'id'])

df3['y'] = 0

df3.loc[df3.id.isin(id_set), 'y'] = 1
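
If c can also be 99, the same set idea could be extended with a second set that is applied last. This is only a sketch under the assumption that a 99 on any row of an id should win for that id; the id 577142, its rows, and the names `ids_one` and `ids_99` are made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [577140, 577140, 577141, 577141, 577142],
    'm':  [1, 0, 1, 0, 1],
    'c':  [0, 1, 0, 0, 99],   # the 99 row is hypothetical
})
df2 = pd.DataFrame({'id': [577140, 577141, 577142], 'd': [1, 0, 1]})
df3 = pd.DataFrame({'id': [577140, 577141, 577142]})

# ids that qualify for y == 1 and ids that qualify for y == 99
ids_one = set(df1.loc[df1.c == 1, 'id']) | set(df2.loc[df2.d == 1, 'id'])
ids_99 = set(df1.loc[df1.c == 99, 'id'])

df3['y'] = 0
df3.loc[df3.id.isin(ids_one), 'y'] = 1
df3.loc[df3.id.isin(ids_99), 'y'] = 99   # assigned last, so it overrides 1
print(df3)
```

Here 577142 is in both sets (d is 1 and c is 99) and ends up with 99 because that assignment runs last.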

2 Comments

Please take a look at my comment under @Erfan's post
If "c" can take on multiple values, it's better to use the merge options since that will retain the value of c to give to y.
