Filter Nulls when converting pandas dataframe to dict

Question

I have this pandas DataFrame:

import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0, 'Scala'),
    ("PySpark", 25000, '50days', 2300.0, 'Python'),
    ("Hadoop", 23000, '55days', np.nan, np.nan),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount', 'Language'])
print(df)

   Courses    Fee Duration  Discount Language
0    Spark  22000   30days    1000.0    Scala
1  PySpark  25000   50days    2300.0   Python
2   Hadoop  23000   55days       NaN      NaN

I want to convert every row into a dict.

def convert_to_dict(row) -> dict:
    result = dict(row)
    final_result = {k:v for k, v in result.items() if v is not np.nan}
    print(final_result)

So I use the function above with this call:

df.apply(lambda row: convert_to_dict(row), axis=1)

But the result I get is unexpected:

{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': nan}

In the last row, both Language and Discount are NaN.

I expected both to be filtered out, but only Language is removed.

How do I filter out every column whose value is NaN from the final result?
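Why the identity check misses Discount can be seen with a short sketch. It rebuilds the DataFrame from the question and reads the Hadoop row with df.iloc; the variable names row and discount are illustrative. Discount is a float64 column, so its NaN is stored in a NumPy float array and comes back as a new float object, not as the np.nan object itself, which makes `v is not np.nan` true. Language apparently kept a reference to the original np.nan object in its object-dtype column, which is why it happened to be filtered; that is an implementation detail and not something to rely on.

```python
import math

import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0, "Scala"),
    ("PySpark", 25000, "50days", 2300.0, "Python"),
    ("Hadoop", 23000, "55days", np.nan, np.nan),
]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration", "Discount", "Language"])

print(df["Discount"].dtype)  # float64

# The "Hadoop" row; with mixed column types it comes back as an object Series
row = df.iloc[2]
discount = row["Discount"]

# A NaN read out of the float64 column is a new object, not np.nan itself,
# so an identity test such as "v is not np.nan" does not catch it
print(discount is np.nan)  # False

# NaN is not equal to anything, itself included, so "v != np.nan" keeps it too
print(bool(discount == discount))  # False

# Value-based checks work no matter which NaN object you hold
print(bool(pd.isna(discount)))  # True
print(math.isnan(discount))  # True
```

This is why the answers below test the value with pd.notna instead of comparing against np.nan.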

jezrael · Accepted Answer · 2022-07-20 05:55:11Z

1

Use pd.notna to filter out missing values. In your function:

final_result = {k: v for k, v in result.items() if pd.notna(v)}

Or, without apply, for all rows at once:

final_result = [{k: v for k, v in result.items() if pd.notna(v)}
                for result in df.to_dict('records')]
print(final_result)
[{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}, 
 {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}, 
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
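If you prefer to keep the question's convert_to_dict and apply structure, a minimal fix is to return the dict instead of printing it and to test each value with pd.notna. This is a sketch, not part of the answer above; collecting the results into a list with .tolist() is an assumption about what you want to do with them.

```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0, "Scala"),
    ("PySpark", 25000, "50days", 2300.0, "Python"),
    ("Hadoop", 23000, "55days", np.nan, np.nan),
]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration", "Discount", "Language"])


def convert_to_dict(row: pd.Series) -> dict:
    # Keep only entries whose value is not missing (NaN, None, NaT).
    # pd.notna checks the value, so it catches every NaN object.
    return {k: v for k, v in row.items() if pd.notna(v)}


# apply(axis=1) calls convert_to_dict once per row and returns a Series of dicts
records = df.apply(convert_to_dict, axis=1).tolist()
for record in records:
    print(record)
```

The per-row function is handy if the row logic grows; for plain filtering, the comprehension over df.to_dict('records') avoids apply altogether.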

answered Jul 20, 2022 at 5:55

jezrael


6 Comments

Himanshu Poddar Over a year ago

I wonder why: it was np.nan anyway, so why did filtering on np.nan not work?

asfand hikmat Over a year ago

Hi jezrael, thanks again for coming to my rescue. Your solution rocks!

jezrael Over a year ago

@HimanshuPoddar - Hard question; maybe because NaN is a special kind of number (though it can appear in string columns too).

Mahdi F. Over a year ago

@jezrael, second part of your answer is my answer ;)

jezrael Over a year ago

@I'mahdi - I don't understand. Your first answer (stackoverflow.com/posts/73046575/revisions) was incorrect; it did not remove missing values. I extended my solution to this version 3 minutes after my first post, and was surprised to see the same answer in your post 5 minutes later. So I'm only asking whether you saw it was already posted.


Mahdi F. · Accepted Answer · 2022-07-20 06:05:46Z

1

You can use .to_dict('records') and filter out NaN values with pandas.notna():

>>> [{k:v for k,v in dct.items() if pd.notna(v)} for dct in df.to_dict('records')]
[{'Courses': 'Spark',
  'Fee': 22000,
  'Duration': '30days',
  'Discount': 1000.0,
  'Language': 'Scala'},
 {'Courses': 'PySpark',
  'Fee': 25000,
  'Duration': '50days',
  'Discount': 2300.0,
  'Language': 'Python'},
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
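Both answers call pd.notna(v) inside an if, which assumes every cell holds a scalar. If a cell holds a list, pd.notna returns an array and the if raises ValueError. A hedged sketch for that case, using a hypothetical helper named drop_missing and a made-up example DataFrame, skips the missing-value test for non-scalar cells via pd.api.types.is_scalar:

```python
import numpy as np
import pandas as pd

# Made-up example: the "tags" column holds a list in one cell
df = pd.DataFrame({
    "name": ["a", "b"],
    "tags": [["x", "y"], np.nan],
    "score": [1.0, np.nan],
})

# The plain pd.notna filter fails on the list cell:
# pd.notna(["x", "y"]) is an array, which has no single truth value
try:
    [{k: v for k, v in r.items() if pd.notna(v)} for r in df.to_dict("records")]
    plain_filter_failed = False
except ValueError:
    plain_filter_failed = True


def drop_missing(record: dict) -> dict:
    """Hypothetical helper: drop missing scalars, keep non-scalar cells as-is."""
    return {
        k: v
        for k, v in record.items()
        if not pd.api.types.is_scalar(v) or pd.notna(v)
    }


records = [drop_missing(r) for r in df.to_dict("records")]
print(plain_filter_failed)
print(records)
```

For DataFrames that only contain scalars, as in the question, the plain pd.notna filter is enough.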

edited Jul 20, 2022 at 6:05

answered Jul 20, 2022 at 5:55

Mahdi F.


5 Comments

jezrael Over a year ago

It is the same answer as mine.

Mahdi F. Over a year ago

@jezrael, No, I saw your answer. You added my answer in an edit, but within 5 minutes, so other users can't see that.

Mahdi F. Over a year ago

@jezrael, why did you add the second part of your answer?

jezrael Over a year ago

Because apply here is not necessary?

Mahdi F. Over a year ago

@jezrael, do I use apply?

G.G · Accepted Answer · 2023-03-02 09:26:48Z

1

df.apply(lambda ss: ss.loc[ss.notna()].to_dict(), axis=1).map(print)

Output:

{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}
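The same Series-based idea can return the dicts instead of only printing them. In this sketch (variable names are illustrative), Series.dropna() stands in for ss.loc[ss.notna()]; both keep only the non-missing entries of each row, and .tolist() turns the resulting Series of dicts into a plain list.

```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0, "Scala"),
    ("PySpark", 25000, "50days", 2300.0, "Python"),
    ("Hadoop", 23000, "55days", np.nan, np.nan),
]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration", "Discount", "Language"])

# Each row arrives as a Series; dropna() removes its missing entries,
# to_dict() converts what is left, and tolist() collects the dicts
records = df.apply(lambda row: row.dropna().to_dict(), axis=1).tolist()
for record in records:
    print(record)
```

Using a for loop for printing keeps the side effect out of .map, which is meant to transform values rather than to run print.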

answered Mar 2, 2023 at 9:26

G.G
