
I have two dataframes, A and B, as follows:

A
col1 col2 col3
A     B    V1
A     B    V2
A     C    V1
A     E    V2

B
Col1 Col2 Value1 Value2
A     B   nan     nan
A     D   nan     nan
A     C   nan     nan
A     G   nan     nan
A     E   nan     nan
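To reproduce this, here is a sketch that builds the two frames as shown above. It assumes the empty cells in B are NaN and that the column names are exactly as displayed (lowercase in A, capitalized in B):

```python
import numpy as np
import pandas as pd

# Dataframe A: one row per (col1, col2, col3) observation
A = pd.DataFrame({
    "col1": ["A", "A", "A", "A"],
    "col2": ["B", "B", "C", "E"],
    "col3": ["V1", "V2", "V1", "V2"],
})

# Dataframe B: the keys to fill in; Value1/Value2 start out empty (NaN)
B = pd.DataFrame({
    "Col1": ["A", "A", "A", "A", "A"],
    "Col2": ["B", "D", "C", "G", "E"],
    "Value1": [np.nan] * 5,
    "Value2": [np.nan] * 5,
})

print(A)
print(B)
```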

I want to update the columns Value1 and Value2 in dataframe B based on dataframe A: if a combination of col1 and col2 in A also exists as Col1 and Col2 in B, the matching row of B should get the col3 values from A, with V1 going into Value1 and V2 into Value2.

The expected output is:

Col1 Col2 Value1 Value2
A     B   V1      V2
A     D   nan     nan
A     C   V1      nan
A     G   nan     nan
A     E   nan     V2

I tried the following code in Python:

def update_b():
    for x in b.index:
        for y in a.index:
            if ((a["col1"][y] == b["col1"][x]) & (a["col2"][y] == b["col2"][x])):
                if (a["col3"][y] == "V1"):
                    b["value1"][x] = "V1"
                else:
                    b["value2"][x] = "V2"

update_b()

but it raises this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
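The post doesn't show enough to be sure where this error comes from. One way it can arise is duplicate index labels: then a["col1"][y] returns a Series instead of a scalar, the comparison yields a Series, and the if statement cannot reduce that to a single True/False. Below is a hedged sketch of the same nested loop that reads single cells with .at (always a scalar for a unique index), combines the conditions with plain and, and writes through b itself instead of chained indexing like b["value1"][x]. It assumes B's columns are Col1/Col2/Value1/Value2 as displayed (the posted code uses lowercase names, which would raise a KeyError against the frames shown), and rebuilds the sample data inline with object-dtype Value columns so strings can be stored without dtype warnings:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"col1": ["A"] * 4,
                  "col2": ["B", "B", "C", "E"],
                  "col3": ["V1", "V2", "V1", "V2"]})
# object dtype so the NaN columns can hold strings
b = pd.DataFrame({"Col1": ["A"] * 5,
                  "Col2": ["B", "D", "C", "G", "E"],
                  "Value1": [np.nan] * 5,
                  "Value2": [np.nan] * 5}).astype({"Value1": object, "Value2": object})

def update_b(a, b):
    for x in b.index:
        for y in a.index:
            # .at returns one scalar, so `and` combines plain booleans
            if a.at[y, "col1"] == b.at[x, "Col1"] and a.at[y, "col2"] == b.at[x, "Col2"]:
                # write directly into b (no chained indexing)
                if a.at[y, "col3"] == "V1":
                    b.at[x, "Value1"] = "V1"
                else:
                    b.at[x, "Value2"] = "V2"
    return b

print(update_b(a, b))
```

This is O(len(a) × len(b)), which is fine for small frames; the answers below avoid the explicit double loop.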

  • Can you first fix your indentation? Can't tell if I'm fixing the right thing, if it's unclear what you're actually running. Commented Jan 6, 2020 at 14:19
  • Can you also make your capitalisation consistent? Is it V1, or v1? Commented Jan 6, 2020 at 14:21
  • Why does the last row have a value for Value2 and not Value1? Commented Jan 6, 2020 at 14:49
  • @ignoring_gravity corrected Commented Jan 6, 2020 at 15:02

3 Answers


Your dataframes have a poor design and you should revisit it, but to answer your question as asked, we can leverage a MultiIndex and map:

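# df is dataframe A, df2 is dataframe B
# map each (col1, col2) pair to its comma-joined col3 values, e.g. ('A', 'B') -> 'V1,V2'
s = df.groupby(['col1','col2'])['col3'].agg(','.join).to_dict()
# look up B's (Col1, Col2) pairs; pairs missing from A become NaN
df2['Val'] = df2.set_index(['Col1','Col2']).index.map(s)
# == True drops the NaN results that str.contains returns for unmatched rows
df2.loc[df2['Val'].str.contains('v1', case=False) == True, 'Value1'] = 'V1'
df2.loc[df2['Val'].str.contains('v2', case=False) == True, 'Value2'] = 'V2'
df2.pop('Val')  # remove the helper column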

Result:

print(df2)
   Col1 Col2 Value1 Value2
0    A    B     V1     V2
1    A    D    NaN    NaN
2    A    C     V1    NaN
3    A    G    NaN    NaN
4    A    E    NaN     V2
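A small variation on this answer, sketched with the sample data rebuilt inline: Series.str.contains accepts na=False, which treats the unmatched (NaN) rows as non-matches directly, so the == True comparison isn't needed. The Value columns are created with object dtype here; that is an assumption made because recent pandas versions may warn when strings are written into a float (all-NaN) column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["A"] * 4,
                   "col2": ["B", "B", "C", "E"],
                   "col3": ["V1", "V2", "V1", "V2"]})
df2 = pd.DataFrame({"Col1": ["A"] * 5,
                    "Col2": ["B", "D", "C", "G", "E"],
                    "Value1": [np.nan] * 5,
                    "Value2": [np.nan] * 5}).astype({"Value1": object, "Value2": object})

# (col1, col2) -> comma-joined col3 values, e.g. ('A', 'B') -> 'V1,V2'
s = df.groupby(["col1", "col2"])["col3"].agg(",".join).to_dict()
# one looked-up string (or NaN) per row of df2, aligned to df2's index
val = pd.Series(df2.set_index(["Col1", "Col2"]).index.map(s), index=df2.index)

# na=False turns NaN (no match in df) into False, so no helper column or == True
df2.loc[val.str.contains("v1", case=False, na=False), "Value1"] = "V1"
df2.loc[val.str.contains("v2", case=False, na=False), "Value2"] = "V2"
print(df2)
```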

As far as I understand the logic, you could try the function below. Note that the last row differs from your expected output (I couldn't figure out how that is possible).

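import pandas as pd

def return_updated_B(A, B):
    # capitalize A's column names so they match B's (col1 -> Col1, ...)
    m = A.rename(columns=lambda x: x.capitalize())
    # number repeated (Col1, Col2) pairs 1, 2, ... and spread them into Value1, Value2, ...
    n = (m.set_index(['Col1', 'Col2', m.groupby(['Col1', 'Col2']).cumcount().add(1)])
          .unstack().droplevel(0, axis=1).add_prefix('Value'))
    # align with B's (Col1, Col2) rows; pairs absent from A become NaN
    return n.reindex(pd.MultiIndex.from_arrays((B['Col1'], B['Col2']))).reset_index()

print(return_updated_B(A, B))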

  Col1 Col2 Value1 Value2
0    A    B     V1     V2
1    A    D    NaN    NaN
2    A    C     V1    NaN
3    A    G    NaN    NaN
4    A    E     V2    NaN
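The difference in the last row comes from cumcount: it numbers the values of each (Col1, Col2) pair by position, so ('A', 'E'), whose only value is V2, lands in Value1. If the target column should instead be chosen by the value itself (V1 goes to Value1, V2 to Value2, as the expected output implies), pivoting on col3 does that. A sketch under that assumption; the labels dict and the function name update_by_value are hypothetical, and pivot raises if the same (col1, col2, col3) row appears twice in A:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({"col1": ["A"] * 4,
                  "col2": ["B", "B", "C", "E"],
                  "col3": ["V1", "V2", "V1", "V2"]})
B = pd.DataFrame({"Col1": ["A"] * 5,
                  "Col2": ["B", "D", "C", "G", "E"],
                  "Value1": [np.nan] * 5,
                  "Value2": [np.nan] * 5})

def update_by_value(A, B):
    labels = {"V1": "Value1", "V2": "Value2"}  # hypothetical: col3 value -> target column
    wide = (A.rename(columns={"col1": "Col1", "col2": "Col2"})
              .assign(target=lambda d: d["col3"].map(labels))
              # one row per (Col1, Col2), one column per target label
              .pivot(index=["Col1", "Col2"], columns="target", values="col3")
              .reset_index())
    # left merge keeps B's rows in order; pairs absent from A come back as NaN
    return B[["Col1", "Col2"]].merge(wide, on=["Col1", "Col2"], how="left")

out = update_by_value(A, B)
print(out)
```

Merging only B's key columns avoids Value1_x/Value1_y suffix clashes with the empty columns already in B.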

1 Comment

Other than V2, which I'm unsure of tbh, this is a better general solution, @OP.

You could try iterating over the rows of the first dataframe and using loc to select the matching rows of the second one and assign values.

Something like the following:

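for index, row in dfA.iterrows():
    # parentheses are required: & binds more tightly than ==
    mask = (dfB.Col1 == row['col1']) & (dfB.Col2 == row['col2'])
    dfB.loc[mask, 'Value1' if row['col3'] == 'V1' else 'Value2'] = row['col3']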

Writing answer from memory so there may be some syntax errors.
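One syntax detail worth checking in boolean masks: each comparison has to be parenthesized before combining with &, because & binds more tightly than ==. A small demonstration with hypothetical integer data (the unparenthesized form fails with the same "truth value of a Series is ambiguous" message as in the question):

```python
import pandas as pd

# hypothetical toy data, only for illustrating operator precedence
df = pd.DataFrame({"x": [1, 1, 2], "y": [3, 4, 3]})

# Correct: parenthesize each comparison, then combine with &
mask = (df["x"] == 1) & (df["y"] == 3)
print(mask.tolist())  # [True, False, False]

# Incorrect: & binds more tightly than ==, so Python reads this as the chained
# comparison df["x"] == (1 & df["y"]) == 3, whose hidden `and` calls bool() on a Series
raised = False
try:
    df["x"] == 1 & df["y"] == 3
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```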
