I have two large data frames; the two below are just examples of what they look like.

import pandas as pd

df1 = pd.DataFrame(columns=['node', 'st1', 'st2'], data=[['a', 1, -1], ['b', 2, 2], ['c', 3, 4]])

node  st1  st2 
 a    1   -1
 b    2    2
 c    3    4

df2 = pd.DataFrame(columns=['node', 'st1', 'st2'], data=[['a', 8, 5], ['b', 4, 6]])

node  st1  st2
 a    8    5
 b    4    6

I want to update the st1 and st2 column values in df1 with the st1 and st2 values from df2, but only for rows where the node name appears in both data frames. Also, if an st1 or st2 value in df1 equals -1, it should not be updated for that row and column, i.e. it stays -1. The result would look something like this:

node  st1  st2
 a     8   -1
 b     4    6
 c     3    4

I've tried merging the two data frames with a basic pandas left-join merge, which gives me a df with duplicate columns, and then looping through each row of the result to check st1 and st2 and replace them only if they are not -1. But this takes a lot of time on larger data frames, which is why I would like to find the most efficient way to do this.

3 Answers

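You could set node as the index in both dataframes, set all values except the -1s to NaN, and use DataFrame.combine_first to fill the NaNs in df1 with the values from df2 that share the same index. The final fillna restores the original values for rows that have no match in df2: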

df = df1.set_index('node')
df.where(df.eq(-1)).combine_first(df2.set_index('node')).fillna(df)

      st1  st2
node          
a     8.0 -1.0
b     4.0  6.0
c     3.0  4.0
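
If the integer dtype matters, the float result can be cast back once no NaNs remain. The following is a minimal, self-contained sketch of the same chain with that cast added; the `astype` step and the name `result` are additions for illustration and not part of the approach as originally posted. It assumes every NaN has been filled by the time the cast runs.

```python
import pandas as pd

df1 = pd.DataFrame(columns=['node', 'st1', 'st2'],
                   data=[['a', 1, -1], ['b', 2, 2], ['c', 3, 4]])
df2 = pd.DataFrame(columns=['node', 'st1', 'st2'],
                   data=[['a', 8, 5], ['b', 4, 6]])

df = df1.set_index('node')

result = (
    df.where(df.eq(-1))                      # keep only the -1s, NaN elsewhere
      .combine_first(df2.set_index('node'))  # fill NaNs from df2 on shared index
      .fillna(df)                            # rows missing from df2 keep df1 values
      .astype(df.dtypes.to_dict())           # assumption: no NaN left, so restore dtypes
      .reset_index()
)
print(result)
```

The cast will raise if any NaN survives, which is arguably a useful signal that a value was never filled.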

2 Comments

This unnecessarily converts to float.
Yes, because the values are converted to NaN at some point, and NaN is treated as a float. The result can easily be converted back to int.

One way is to record where -1 appears and then merge all the data from df2 into df1. Then restore your -1 values (here I'm actually replacing the non -1 values with the new values). You'll need to set node as the index for this to work:

df1 = df1.set_index('node')
df2 = df2.set_index('node')

no_repl = df1 == -1
new_df = df2.combine_first(df1)
new_df = df1.where(no_repl, new_df).reset_index()

Same idea as @yatu's post really. Just slightly different syntax.

2 Comments

This is not merging on node. combine_first will use the indices to do so, but for that, node has to be set as the index.
Good catch. Yes, I forgot to set the index. Actually, looking at your answer, it's pretty much the same idea. +1.

import numpy as np

df3 = df1.set_index('node')
df4 = df2.set_index('node')
# Keep df1's value where it is -1, or where the node has no match in df2
keep_loc = (df3 == -1) | ~df3.index.isin(df4.index)[:, np.newaxis]
df3.where(keep_loc, df4)

      st1  st2
node          
a       8   -1
b       4    6
c       3    4
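
The masking idea above can be wrapped in a reusable function. This is only a sketch under stated assumptions: the function name `update_except_sentinel` and its parameters `key` and `sentinel` are hypothetical, and it swaps the `isin` broadcast for a `reindex` followed by `isna`, so a genuine NaN in the second frame is also treated as "no replacement".

```python
import pandas as pd

def update_except_sentinel(left, right, key='node', sentinel=-1):
    """Hypothetical helper: overwrite values in `left` with values from
    `right` for matching keys, except where `left` holds `sentinel`."""
    l = left.set_index(key)
    r = right.set_index(key)
    # Align right to left's rows and columns; unmatched keys become NaN
    r = r.reindex(index=l.index, columns=l.columns)
    # Keep the original value where it is the sentinel or nothing replaces it
    keep = l.eq(sentinel) | r.isna()
    # Restore the original dtypes, since the NaN alignment may upcast to float
    return l.where(keep, r).astype(l.dtypes.to_dict()).reset_index()

df1 = pd.DataFrame(columns=['node', 'st1', 'st2'],
                   data=[['a', 1, -1], ['b', 2, 2], ['c', 3, 4]])
df2 = pd.DataFrame(columns=['node', 'st1', 'st2'],
                   data=[['a', 8, 5], ['b', 4, 6]])

out = update_except_sentinel(df1, df2)
print(out)
```

Neither input frame is modified; the function returns a new frame with `node` restored as a regular column.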
