I have a dataframe with two rows:

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

With 8 nulls, it looks like this:

df = df.append(pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8))

  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN

What I want

Replace NaN values in the sequence columns (seq_col, seq_col_2, seq_col_3, etc.) with a list of my own.

Note:

  • This data has only two sequence columns, but there could be many more.
  • Lists already in the columns must not be replaced, ONLY NaNs.

I could not find a solution that replaces NaN with a user-provided list value, for example one taken from a dictionary.

Pseudo Code:

for each key, value in dict,
   for each column in df
       if column matches key in dict
         # here matches means the 'seq_col_n' key of dict matched the df 
         # column named 'seq_col_n'
         replace NaN with value in seq_col_n (which is a list of numbers)

I tried the code below. It works for the first column you pass, but not for the second, which is weird.

df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])

The above works for seq_col, but running the same thing on seq_col_2 gives weird results.

Expected output, given this input:

my_dict = {'seq_col': [1,2,3], 'seq_col_2': [6,7,8]}

# after running code that implements the pseudo code above, df should look like
  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
  • Can you show the expected output? Also, what results do you get with your code? Commented Jul 13, 2018 at 15:26
  • Nice, finally someone who posted an executable code example! Unfortunately I can't help you, but I'll upvote your question for that. As Harv mentioned, an expected output would help a lot. Commented Jul 13, 2018 at 15:28
  • Do you basically want to convert the 10 values in those 2 lists into 10 individual values for each row in those columns? If so, what would you want to do for the columns without lists? Commented Jul 13, 2018 at 15:38
  • This link may help: stackoverflow.com/questions/48197234/… Commented Jul 13, 2018 at 15:41
  • Is this what you're looking for? pandas.pydata.org/pandas-docs/version/0.22/generated/… Commented Jul 13, 2018 at 15:41

1 Answer

With input arrays, you can use pd.DataFrame.loc with pd.Series.isnull:

import pandas as pd, numpy as np

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

df = df.append(pd.DataFrame({'group': ['c']*8}, index=[0] * 8))

L1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
L2 = np.array([10, 11, 12, 13, 14, 15, 16, 17])

df.loc[df['seq_col'].isnull(), 'seq_col'] = L1
df.loc[df['seq_col_2'].isnull(), 'seq_col_2'] = L2

print(df[['seq_col', 'seq_col_2']])

           seq_col        seq_col_2
0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0                0               10
0                1               11
0                2               12
0                3               13
0                4               14
0                5               15
0                6               16
0                7               17

If you need list values in your series, then you can convert to a series explicitly before assignment:

df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))
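
For the dictionary-driven case in the question, one possible approach is to loop over the dictionary and rebuild each matching column as a plain Python list. This is only a sketch, not the one right way to do it. The names fill_values and fill_nan_with_list are made up for illustration, and pd.concat stands in for df.append because DataFrame.append was removed in pandas 2.0. Building the whole column at once avoids assigning through .loc on an index that has duplicate labels.

```python
import pandas as pd

# Rebuild the example frame; pd.concat replaces DataFrame.append,
# which no longer exists in pandas 2.0 and later.
base = pd.DataFrame({'group': ['c'] * 2,
                     'num_column': range(2),
                     'num_col_2': range(2),
                     'seq_col': [[1, 2, 3, 4, 5]] * 2,
                     'seq_col_2': [[1, 2, 3, 4, 5]] * 2,
                     'grp_count': [2] * 2})
padding = pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8)
df = pd.concat([base, padding])

# Hypothetical input: column name -> list to use wherever that column is NaN.
fill_values = {'seq_col': [1, 2, 3], 'seq_col_2': [6, 7, 8]}


def fill_nan_with_list(frame, fills):
    """Return a copy of frame with NaN cells in the named columns replaced by lists."""
    out = frame.copy()
    for col, fill in fills.items():
        if col not in out.columns:
            continue  # ignore dictionary keys that match no column
        # Positional boolean mask, so duplicated index labels do not matter.
        is_missing = out[col].isna().to_numpy()
        # Rebuild the column as a Python list so each list is stored as one
        # cell; list(fill) gives every filled row its own copy.
        out[col] = [list(fill) if missing else cell
                    for cell, missing in zip(out[col], is_missing)]
    return out


result = fill_nan_with_list(df, fill_values)
print(result[['seq_col', 'seq_col_2']])
```

Existing lists are left untouched because only the cells flagged by isna() are swapped, and columns not named in the dictionary (grp_count, num_column, and so on) keep their NaNs.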