
I have a data frame which has the structure as follows

code    value
1       red
2       blue
3       yellow
1
4
4       pink
2       blue

Basically, I want to update the value column so that the blank rows are filled in with values from other rows. For example, since code 4 refers to the value pink, I want pink filled in for every row with code 4 where the value is missing.
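For reference, the frame above can be rebuilt like this (a minimal sketch, assuming the blank cells are NaN; if they are empty strings instead, the fills below will not treat them as missing):

import numpy as np
import pandas as pd

# example frame from the question; blanks assumed to be NaN
df = pd.DataFrame({'code':  [1, 2, 3, 1, 4, 4, 2],
                   'value': ['red', 'blue', 'yellow', np.nan, np.nan, 'pink', 'blue']})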


5 Answers


Using groupby with ffill and bfill:

df.groupby('code').value.ffill().bfill()

0       red
1      blue
2    yellow
3       red
4      pink
5      pink
6      blue
Name: value, dtype: object
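To write the filled column back to the frame (a small sketch using the same chain):

df['value'] = df.groupby('code').value.ffill().bfill()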

2 Comments

One thing: df.groupby('code').value.apply(lambda x: x.ffill().bfill()) would keep the bfill within each group as well.
Never mind, ignore that last comment; I think you may be right, I need to test this.

You could use the first non-null value of the given code group; transform('first') broadcasts that value to every row of the group:

In [379]: df.groupby('code')['value'].transform('first')
Out[379]:
0       red
1      blue
2    yellow
3       red
4      pink
5      pink
6      blue
Name: value, dtype: object

To assign back

In [380]: df.assign(value=df.groupby('code')['value'].transform('first'))
Out[380]:
   code   value
0     1     red
1     2    blue
2     3  yellow
3     1     red
4     4    pink
5     4    pink
6     2    blue

Or

df['value'] = df.groupby('code')['value'].transform('first')
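For context, 'first' here means the first non-null value in each group, which is why the rows with a missing value still pick up the right colour (a quick check against the sample frame above):

df.groupby('code')['value'].first()
# code
# 1       red
# 2      blue
# 3    yellow
# 4      pink
# Name: value, dtype: object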



You can create a series of your code-value pairs, and use that to map:

my_map = df[df['value'].notnull()].set_index('code')['value'].drop_duplicates()

df['value'] = df['code'].map(my_map)

>>> df
   code   value
0     1     red
1     2    blue
2     3  yellow
3     1     red
4     4    pink
5     4    pink
6     2    blue

Just to see what is happening, you are passing the following series to map:

>>> my_map
code
1       red
2      blue
3    yellow
4      pink
Name: value, dtype: object

So it says: "Where you find 1, give the value red, where you find 2, give blue..."

5 Comments

df.dropna().set_index('code')['value'] would do too.
df.dropna().set_index('code')['value'].drop_duplicates(), because you still have to make sure there are no duplicate indices when you pass to map
@sacul you could use to_dict to remove the duplicates and map using the dictionary
I am still getting 2 rows for one of the code-value pairs, where one value is blank and the other has the correct value. Is there a way to check whether that field contains anything other than null, maybe a blank space? This is in my_map.
You can try starting with df.replace({'': np.nan, ' ': np.nan}, inplace=True) (with import numpy as np) to get rid of those cases from the start.
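Following on from those last two comments, a sketch that normalizes empty or whitespace-only strings to NaN before building the map (assumes import numpy as np and the sample frame above):

import numpy as np

# turn blank / whitespace-only cells into real missing values first
df['value'] = df['value'].replace(r'^\s*$', np.nan, regex=True)

# keep one non-null value per code, then map it onto every row
my_map = df.dropna(subset=['value']).drop_duplicates('code').set_index('code')['value']
df['value'] = df['code'].map(my_map)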

You can sort_values, ffill and then sort_index. The last step may not be necessary if order is not important. If it is, then the double sort may be unreasonably expensive.

df = df.sort_values(['code', 'value']).ffill().sort_index()

print(df)

   code   value
0     1     red
1     2    blue
2     3  yellow
3     1     red
4     4    pink
5     4    pink
6     2    blue
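To see why the single global ffill is safe here, the intermediate sort places each missing value directly after a filled row with the same code (a sketch using the sample frame above):

df.sort_values(['code', 'value'])
#    code   value
# 0     1     red
# 3     1     NaN
# 1     2    blue
# 6     2    blue
# 2     3  yellow
# 5     4    pink
# 4     4     NaN

Note that if a code had no non-null value at all, the global ffill would pull a value from the previous code.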



Using reindex:

df.dropna().drop_duplicates('code').set_index('code').reindex(df.code).reset_index()
Out[410]: 
   code   value
0     1     red
1     2    blue
2     3  yellow
3     1     red
4     4    pink
5     4    pink
6     2    blue
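Broken down, the chain first builds a one-row-per-code lookup table and then reindexes it by the full code column (a sketch over the sample frame above):

lookup = df.dropna().drop_duplicates('code').set_index('code')
lookup
#       value
# code
# 1       red
# 2      blue
# 3    yellow
# 4      pink

lookup.reindex(df.code).reset_index() then reproduces the frame shown above.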

1 Comment

This works, but with one problem: the first value for code 4 is null, so the result ends up with the value column set to null for every row with code 4.
