
I have a data frame which has the structure as follows

code    value
1       red
2       blue
3       yellow
1
4
4       pink
2       blue

Basically, I want to update the value column so that the blank rows are filled in with values from other rows. For example, since code 4 refers to the value pink, I want pink filled in for every row with code 4 where the value is missing.
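For reference, the frame above can be rebuilt like this (a minimal sketch, assuming the blank cells are NaN; if they are empty strings instead, the fills below will not treat them as missing):

import numpy as np
import pandas as pd

# example frame from the question; blanks assumed to be NaN
df = pd.DataFrame({'code':  [1, 2, 3, 1, 4, 4, 2],
                   'value': ['red', 'blue', 'yellow', np.nan, np.nan, 'pink', 'blue']})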


5 Answers


Using groupby with ffill and bfill:

df.groupby('code').value.ffill().bfill()

0       red
1      blue
2    yellow
3       red
4      pink
5      pink
6      blue
Name: value, dtype: object
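To write the filled column back to the frame (a small sketch using the same chain):

df['value'] = df.groupby('code').value.ffill().bfill()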

2 Comments

One thing: df.groupby('code').value.apply(lambda x: x.ffill().bfill()) would keep the bfill within each group as well.
Never mind, ignore that last comment; I think you may be right, I need to test this.

You could use the first non-null value of the given code group; transform('first') broadcasts that value to every row of the group:

In [379]: df.groupby('code')['value'].transform('first')
Out[379]:
0       red
1      blue
2    yellow
3       red
4      pink
5      pink
6      blue
Name: value, dtype: object

To assign back

In [380]: df.assign(value=df.groupby('code')['value'].transform('first'))
Out[380]:
   code   value
0     1     red
1     2    blue
2     3  yellow
3     1     red
4     4    pink
5     4    pink
6     2    blue

Or

df['value'] = df.groupby('code')['value'].transform('first')
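For context, 'first' here means the first non-null value in each group, which is why the rows with a missing value still pick up the right colour (a quick check against the sample frame above):

df.groupby('code')['value'].first()
# code
# 1       red
# 2      blue
# 3    yellow
# 4      pink
# Name: value, dtype: object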



You can create a series of your code-value pairs, and use that to map:

my_map = df[df['value'].notnull()].set_index('code')['value'].drop_duplicates()

df['value'] = df['code'].map(my_map)

>>> df
   code   value
0     1     red
1     2    blue
2     3  yellow
3     1     red
4     4    pink
5     4    pink
6     2    blue

Just to see what is happening, you are passing the following series to map:

>>> my_map
code
1       red
2      blue
3    yellow
4      pink
Name: value, dtype: object

So it says: "Where you find 1, give the value red, where you find 2, give blue..."

5 Comments

df.dropna().set_index('code')['value'] would do too.
df.dropna().set_index('code')['value'].drop_duplicates(), because you still have to make sure there are no duplicate indices when you pass to map
@sacul you could use to_dict to remove the duplicates and map using the dictionary
I am still getting 2 rows for one of the code-value pairs, where one value is blank and the other has the correct value. Is there a way to check whether that field contains anything other than null, maybe a blank space? This is in my_map.
You can try starting with df.replace({'': np.nan, ' ': np.nan}, inplace=True) (with import numpy as np) to get rid of those cases from the start.
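Following on from those last two comments, a sketch that normalizes empty or whitespace-only strings to NaN before building the map (assumes import numpy as np and the sample frame above):

import numpy as np

# turn blank / whitespace-only cells into real missing values first
df['value'] = df['value'].replace(r'^\s*$', np.nan, regex=True)

# keep one non-null value per code, then map it onto every row
my_map = df.dropna(subset=['value']).drop_duplicates('code').set_index('code')['value']
df['value'] = df['code'].map(my_map)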

You can sort_values, ffill and then sort_index. The last step may not be necessary if order is not important. If it is, then the double sort may be unreasonably expensive.

df = df.sort_values(['code', 'value']).ffill().sort_index()

print(df)

   code   value
0     1     red
1     2    blue
2     3  yellow
3     1     red
4     4    pink
5     4    pink
6     2    blue
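To see why the single global ffill is safe here, the intermediate sort places each missing value directly after a filled row with the same code (a sketch using the sample frame above):

df.sort_values(['code', 'value'])
#    code   value
# 0     1     red
# 3     1     NaN
# 1     2    blue
# 6     2    blue
# 2     3  yellow
# 5     4    pink
# 4     4     NaN

Note that if a code had no non-null value at all, the global ffill would pull a value from the previous code.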



Using reindex:

df.dropna().drop_duplicates('code').set_index('code').reindex(df.code).reset_index()
Out[410]: 
   code   value
0     1     red
1     2    blue
2     3  yellow
3     1     red
4     4    pink
5     4    pink
6     2    blue
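Broken down, the chain first builds a one-row-per-code lookup table and then reindexes it by the full code column (a sketch over the sample frame above):

lookup = df.dropna().drop_duplicates('code').set_index('code')
lookup
#       value
# code
# 1       red
# 2      blue
# 3    yellow
# 4      pink

lookup.reindex(df.code).reset_index() then reproduces the frame shown above.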

1 Comment

This works, but with one problem: the first value for code 4 is null, so the result ends up with the value column set to null for every row with code 4.
