
I need to copy some of a column's values into variables, based on a specific condition, and then delete them from the dataframe.

This dataframe contains data about some kids, each with their favourite toy and colour:

import numpy as np
import pandas as pd

data = {'Kid': ['Richard', 'Daphne', 'Andy', 'May', 'Claire', 'Mozart', 'Jane'],
        'Toy':  ['Ball', 'Doll', 'Car', 'Barbie', 'Frog', 'Bear', 'Doll'],
        'Colour': ['white', np.nan, 'red', 'pink', 'green', np.nan, np.nan]
        }

df = pd.DataFrame (data, columns = ['Kid', 'Toy','Colour'])

print (df)

The dataframe looks like this:

       Kid       Toy Colour
0  Richard      Ball  white
1   Daphne      Doll    NaN
2     Andy       Car    red
3      May    Barbie   pink
4   Claire      Frog  green
5   Mozart      Bear    NaN
6     Jane      Doll    NaN

The condition is: if a kid has a toy but no colour, save both the kid and the toy in separate arrays, as follows, keeping their order and pairing intact:

toy_array = ["Doll", "Bear", "Doll"]
kid_array = ["Daphne", "Mozart", "Jane"]

And then delete the toy from the dataframe. So the final dataframe should look like this:

       Kid     Toy Colour
0  Richard    Ball  white
1   Daphne     NaN    NaN
2     Andy     Car    red
3      May  Barbie   pink
4   Claire    Frog  green
5   Mozart     NaN    NaN
6     Jane     NaN    NaN

I was inspired by several sources, including this one, and I tried this:

kid_array.append(df.loc[(df['Toy'] != np.nan) & (df['Colour'] == np.nan)])
print(kid_array)
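A quick check of why this attempt finds nothing: NaN compares unequal to every value, including itself, so an elementwise == np.nan never matches and != np.nan always matches. The sketch below uses a small hypothetical Series (not the question's dataframe) to contrast those comparisons with Series.isna():

```python
import numpy as np
import pandas as pd

colours = pd.Series(['white', np.nan, 'red'])  # hypothetical sample column

print(np.nan == np.nan)              # False: NaN is never equal to itself
print((colours == np.nan).tolist())  # [False, False, False]: nothing matches
print((colours != np.nan).tolist())  # [True, True, True]: everything "differs"
print(colours.isna().tolist())       # [False, True, False]: proper missing-value test
```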

I am a complete beginner, so I would really appreciate any help!

  • Wow, such a nice question: input data sample, output data sample, and the code you tried. Super! Happy coding! Commented Mar 30, 2021 at 11:47
  • Thank you very much, @jezrael ! Happy coding! Commented Mar 30, 2021 at 12:05

2 Answers


Test for missing and non-missing values with Series.isna and Series.notna, then set the matching Toy values to missing with DataFrame.loc:

mask = df['Toy'].notna() & df['Colour'].isna()

df.loc[mask, 'Toy'] = np.nan

Or with Series.mask:

df['Toy'] = df['Toy'].mask(mask)

Or with numpy.where:

df['Toy'] = np.where(mask, np.nan, df['Toy'])

print (df)
       Kid     Toy Colour
0  Richard    Ball  white
1   Daphne     NaN    NaN
2     Andy     Car    red
3      May  Barbie   pink
4   Claire    Frog  green
5   Mozart     NaN    NaN
6     Jane     NaN    NaN
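The three alternatives above should all produce the same Toy column. A self-contained sketch that rebuilds the question's data, applies each approach to a fresh copy, and compares the results; the helper same_toys is hypothetical and only fills NaN with a placeholder, because NaN != NaN would otherwise make any list comparison fail:

```python
import numpy as np
import pandas as pd

data = {'Kid': ['Richard', 'Daphne', 'Andy', 'May', 'Claire', 'Mozart', 'Jane'],
        'Toy': ['Ball', 'Doll', 'Car', 'Barbie', 'Frog', 'Bear', 'Doll'],
        'Colour': ['white', np.nan, 'red', 'pink', 'green', np.nan, np.nan]}
df = pd.DataFrame(data)
mask = df['Toy'].notna() & df['Colour'].isna()

# 1) DataFrame.loc: assign NaN to the selected rows of Toy
df_loc = df.copy()
df_loc.loc[mask, 'Toy'] = np.nan

# 2) Series.mask: replace values with NaN where mask is True
df_mask = df.copy()
df_mask['Toy'] = df_mask['Toy'].mask(mask)

# 3) numpy.where: pick NaN where mask is True, otherwise keep the toy
df_where = df.copy()
df_where['Toy'] = np.where(mask, np.nan, df_where['Toy'])

def same_toys(a, b):
    # Hypothetical helper: fill NaN with a placeholder so equal rows compare equal
    return a['Toy'].fillna('<none>').tolist() == b['Toy'].fillna('<none>').tolist()

print(same_toys(df_loc, df_mask), same_toys(df_loc, df_where))  # True True
```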

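If you need lists, build them from the same mask before setting the toys to NaN (afterwards the Toy values are already missing):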

toy_array = df.loc[mask, 'Toy'].tolist()
kid_array = df.loc[mask, 'Kid'].tolist()

print (toy_array)
['Doll', 'Bear', 'Doll']

print (kid_array)
['Daphne', 'Mozart', 'Jane']
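Putting the steps in the order that matters (extract first, then clear) as one function; the name extract_and_clear is hypothetical, not a pandas API, and it works on a copy so the original dataframe is left unchanged:

```python
import numpy as np
import pandas as pd

def extract_and_clear(df):
    """Hypothetical helper: return (kids, toys, cleaned) where kids/toys come from
    rows that have a toy but no colour, and cleaned is a copy with those toys set to NaN."""
    mask = df['Toy'].notna() & df['Colour'].isna()
    # Extract first: using one mask keeps both lists in row order and aligned
    kids = df.loc[mask, 'Kid'].tolist()
    toys = df.loc[mask, 'Toy'].tolist()
    # Only then "delete" the toys, on a copy
    cleaned = df.copy()
    cleaned.loc[mask, 'Toy'] = np.nan
    return kids, toys, cleaned

data = {'Kid': ['Richard', 'Daphne', 'Andy', 'May', 'Claire', 'Mozart', 'Jane'],
        'Toy': ['Ball', 'Doll', 'Car', 'Barbie', 'Frog', 'Bear', 'Doll'],
        'Colour': ['white', np.nan, 'red', 'pink', 'green', np.nan, np.nan]}
df = pd.DataFrame(data)

kid_array, toy_array, cleaned = extract_and_clear(df)
print(kid_array)  # ['Daphne', 'Mozart', 'Jane']
print(toy_array)  # ['Doll', 'Bear', 'Doll']
```

If you prefer to modify df in place, drop the copy and assign to df.loc directly, still after building the lists.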

  • Thank you very much!! I learned so much from your answer!

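Your logic is correct; only the comparison needs to change. NaN is not equal to anything, not even itself, so df['Colour'] == np.nan is always False and df['Toy'] != np.nan is always True. Use a dedicated missing-value test instead. numpy.isnan() does this for numeric data, but it raises a TypeError on string (object) columns such as Toy and Colour, so pandas' Series.isna() is the safer choice here.

Try the following code (Python negates a boolean mask with ~, not !, and the condition needs Toy present and Colour missing):

kid_array = df.loc[(~df['Toy'].isna()) & (df['Colour'].isna()), 'Kid'].tolist()
toy_array = df.loc[(~df['Toy'].isna()) & (df['Colour'].isna()), 'Toy'].tolist()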
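numpy.isnan() is not a drop-in fix for these columns: it is a numeric ufunc, so on an object-dtype Series holding strings it raises a TypeError, while pandas.isna() handles a mix of strings and NaN. A minimal sketch on a small hypothetical Series:

```python
import numpy as np
import pandas as pd

toys = pd.Series(['Doll', np.nan, 'Car'], dtype=object)  # strings plus NaN

raised = False
try:
    np.isnan(toys)
except TypeError as exc:
    # isnan only supports numeric input; Python strings make it fail
    raised = True
    print('numpy.isnan failed:', type(exc).__name__)

print(pd.isna(toys).tolist())   # [False, True, False]
print((~toys.isna()).tolist())  # [True, False, True]: ~ negates the mask
```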
