
I have a pandas DataFrame like the one below:

      NAME      EMAIL      HEIGHT      WEIGHT
1     jlka       NaN        170          70
2     qwer     eee@ttt      180          80
3     ioff       NaN        175          75
4     iowu     iou@add      170          60

I want to replace the NaN values in the 'EMAIL' column with random strings, with no duplicates. The strings do not need to contain @.

I tried writing a function that generates random strings, but every NaN was replaced with the same random string, because I passed the function's result to the 'fillna' method.

From what I have seen in other Q&As, a function used with fillna is called only once, and all the NaNs are replaced with the single value it returned.

Should I use a 'for' loop to replace them one by one?

Or is there a more Pythonic way to replace them?

  • I'm curious, what's the problem with using None when there is no email? Commented Feb 20, 2020 at 9:38

2 Answers


You could try something like this:

import pandas as pd
from numpy import nan
import random
import string

df = pd.DataFrame({
    'Name': ['aaa','bbb','CCC'],
    'Email': [nan,'ddd',nan]})

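def random_string(length=10):
    # Build a random string from uppercase letters and digits.
    alphabet = string.ascii_uppercase + string.digits
    return ''.join(random.choice(alphabet) for _ in range(length))

# pd.isna is more robust than `x is nan`, which only matches the exact numpy.nan object.
# The lambda runs once per row, so each NaN gets its own random string.
df['Email'] = df['Email'].apply(lambda x: random_string() if pd.isna(x) else x)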
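
Independent random draws make duplicates very unlikely but do not strictly rule them out. If uniqueness has to be guaranteed, one option is to generate all the replacement strings up front and retry on any collision, including collisions with emails already in the column. The following is only a sketch under that assumption: the helper name `unique_random_strings` and its parameters are hypothetical, not part of pandas.

```python
import random
import string

import numpy as np
import pandas as pd


def unique_random_strings(n, length=10, taken=(), rng=None):
    """Return n distinct random strings, none of which appear in `taken`."""
    rng = rng or random.Random()
    alphabet = string.ascii_uppercase + string.digits
    seen = set(taken)
    result = []
    while len(result) < n:
        candidate = "".join(rng.choices(alphabet, k=length))
        if candidate not in seen:  # retry on the (rare) collision
            seen.add(candidate)
            result.append(candidate)
    return result


df = pd.DataFrame({
    "Name": ["aaa", "bbb", "CCC"],
    "Email": [np.nan, "ddd", np.nan],
})

# Fill only the missing rows, passing one generated string per NaN.
mask = df["Email"].isna()
df.loc[mask, "Email"] = unique_random_strings(
    int(mask.sum()), taken=df["Email"].dropna()
)
```

Assigning a list through a boolean mask gives each NaN row its own value, which is what a single scalar passed to fillna cannot do.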

1 Comment

Sorry, I am too new here. I have marked your answer as accepted!

You could use pd.util.testing.rands_array, passing it the length of your desired string as the first (nchars) argument and the number of NaNs as the second (size) argument:

df.loc[df.EMAIL.isna(), "EMAIL"] = pd.util.testing.rands_array(10, df.EMAIL.isna().sum())

>>> df
   NAME       EMAIL  HEIGHT  WEIGHT
1  jlka  YxzVaC38uw     170      70
2  qwer     eee@ttt     180      80
3  ioff  33kyDArtip     175      75
4  iowu     iou@add     170      60

pd.util.testing.rands_array could be replaced with any function that returns a list or array of the required size.
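
As an illustration of swapping in another generator, here is a minimal sketch that uses the standard-library `uuid` module instead. This can be useful because, as far as I know, `pandas.util.testing` was deprecated in pandas 1.0 and is not available in recent releases. The helper name `random_ids` is hypothetical, and UUID4 values are only unique with overwhelming probability, not by strict guarantee.

```python
import uuid

import numpy as np
import pandas as pd


def random_ids(size):
    """Hypothetical helper: return `size` random 32-character hex strings."""
    return [uuid.uuid4().hex for _ in range(size)]


df = pd.DataFrame(
    {
        "NAME": ["jlka", "qwer", "ioff", "iowu"],
        "EMAIL": [np.nan, "eee@ttt", np.nan, "iou@add"],
        "HEIGHT": [170, 180, 175, 170],
        "WEIGHT": [70, 80, 75, 60],
    },
    index=[1, 2, 3, 4],
)

# Generate exactly as many strings as there are missing emails.
missing = df["EMAIL"].isna()
df.loc[missing, "EMAIL"] = random_ids(int(missing.sum()))
```

The only requirement on the replacement function is that the number of values it returns matches the number of NaN rows selected by the mask.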
