I am trying to add multiple columns to a dataframe at once with numpy.where() as part of my ETL logic.
This is my df:
I am trying to get my df as:
And the code is:
import numpy as np
import pandas as pd

current_time = pd.Timestamp.utcnow().strftime('%Y-%m-%d %H:%M:%S')
df = pd.concat(
    [
        df,
        pd.DataFrame(
            [
                np.where(
                    # When old hash code is available and new hash code is not available. 0 -- N
                    (
                        df['new_hash'].isna()
                        &
                        ~df['old_hash'].isna()
                    ) |
                    # When hash codes are available and matched. 3.1 -- 'N'
                    (
                        ~df['new_hash'].isna()
                        &
                        ~df['old_hash'].isna()
                        &
                        ~(df['new_hash'].ne(df['old_hash']))
                    ),
                    ['N', df['cr_date'], df['up_date']],
                    np.where(
                        # When new hash code is available and old hash code is not available. 1 -- Y
                        (
                            ~df['new_hash'].isna()
                            &
                            df['old_hash'].isna()
                        ),
                        ['Y', current_time, current_time],
                        np.where(
                            # When hash codes are available and do not match. 3.2 -- 'Y'
                            (
                                ~df['new_hash'].isna()
                                &
                                ~df['old_hash'].isna()
                                &
                                df['new_hash'].ne(df['old_hash'])
                            ),
                            ['Y', df['cr_date'], current_time],
                            ['N', df['cr_date'], df['up_date']]
                        )
                    )
                )
            ],
            index=df.index,
            columns=['is_changed', 'cr_date_new', 'up_date_new']
        )
    ],
    axis=1
)
I also tried the above code with df.join() instead of pd.concat(), but it still raises the ValueError shown below.
I am able to add one column at a time, for example:
df['is_changed'] = (
    np.where(
        # When old hash code is available and new hash code is not available. 0 -- N
        (
            df['new_hash'].isna()
            &
            ~df['old_hash'].isna()
        ) |
        # When hash codes are available and matched. 3.1 -- 'N'
        (
            ~df['new_hash'].isna()
            &
            ~df['old_hash'].isna()
            &
            ~(df['new_hash'].ne(df['old_hash']))
        ),
        'N',
        np.where(
            # When new hash code is available and old hash code is not available. 1 -- Y
            (
                ~df['new_hash'].isna()
                &
                df['old_hash'].isna()
            ),
            'Y',
            np.where(
                # When hash codes are available and do not match. 3.2 -- 'Y'
                (
                    ~df['new_hash'].isna()
                    &
                    ~df['old_hash'].isna()
                    &
                    df['new_hash'].ne(df['old_hash'])
                ),
                'Y',
                'N'
            )
        )
    )
)
But with multiple columns I get this error: ValueError: operands could not be broadcast together with shapes (66,) (3,) (3,).
What is wrong with adding multiple columns this way? Can someone help me with this?
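
For reference, here is a minimal sketch of what seems to be going on and one possible way around it. The shapes in the error suggest that numpy.where() is comparing a 66-row condition against 3-element lists, so it cannot line them up row by row. The sample data below is hypothetical (the real df is not shown), and the mask names has_new, has_old, is_new and is_modified are made up for illustration. The sketch builds the boolean masks once and then calls numpy.where() once per output column, so every argument is either a scalar or has one value per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df from the question is not shown.
df = pd.DataFrame({
    'new_hash': ['a', None, 'c', 'd'],
    'old_hash': ['a', 'b', None, 'x'],
    'cr_date': ['2020-01-01 00:00:00'] * 4,
    'up_date': ['2020-01-02 00:00:00'] * 4,
})
current_time = pd.Timestamp.now(tz='UTC').strftime('%Y-%m-%d %H:%M:%S')

# A row-wise condition cannot be broadcast against 3-element lists:
# np.where needs x and y to be scalars or to match the condition's shape.
try:
    np.where(np.ones(66, dtype=bool), ['N', 'a', 'b'], ['Y', 'c', 'd'])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Build each mask once (names are illustrative, not from the question).
has_new = df['new_hash'].notna()
has_old = df['old_hash'].notna()
is_new = has_new & ~has_old                                          # case 1
is_modified = has_new & has_old & df['new_hash'].ne(df['old_hash'])  # case 3.2

# One np.where call per output column; all other cases keep the old values.
df['is_changed'] = np.where(is_new | is_modified, 'Y', 'N')
df['cr_date_new'] = np.where(is_new, current_time, df['cr_date'])
df['up_date_new'] = np.where(is_new | is_modified, current_time, df['up_date'])
```

This keeps the same case logic as the nested version (cases 0 and 3.1 and the fallback all leave the dates untouched), but it is only a sketch under the assumptions above and has not been checked against the real data.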


df, so I don't know what concat is working on. This is not a minimal reproducible example, and I strongly suggest that you change the formatting style.