I have this pandas dataframe.

import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0, 'Scala'),
    ("PySpark", 25000, '50days', 2300.0, 'Python'),
    ("Hadoop", 23000, '55days', np.nan, np.nan)
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount', 'Language'])
print(df)

   Courses    Fee Duration  Discount Language
0    Spark  22000   30days    1000.0    Scala
1  PySpark  25000   50days    2300.0   Python
2   Hadoop  23000   55days       NaN      NaN

I want to convert every row into a dict.

def convert_to_dict(row) -> dict:
    result = dict(row)
    final_result = {k:v for k, v in result.items() if v is not np.nan}
    print(final_result)

So I use the above function with apply:

df.apply(lambda row: convert_to_dict(row), axis=1)

But the result I get is weird.

{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': nan}

The last row has NaN in both Discount and Language.

I expected both to be filtered out, but only Language is.

How do I filter out every column whose value is NaN from the final result?

3 Answers

Use pd.notna to filter out missing values inside your function:

final_result = {k: v for k, v in result.items() if pd.notna(v)}

Or skip apply and build the list directly from df.to_dict('records'):

final_result = [{k:v for k, v in result.items() if pd.notna(v)} 
                for result in df.to_dict('records')]
print(final_result)
[{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}, 
 {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}, 
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
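
A possible explanation for why the original `is not np.nan` filter missed Discount, offered as a minimal sketch rather than a definitive account of pandas internals: `is` compares object identity, and a NaN produced anywhere other than the `np.nan` constant itself is a different object, so the identity check lets it through. `pd.isna` / `pd.notna` test the value instead. The names `other_nan` and `row_to_dict` are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# A NaN created independently of the np.nan constant (hypothetical name).
other_nan = float("nan")

print(other_nan is np.nan)   # False: same meaning, different object
print(other_nan == np.nan)   # False: NaN never compares equal to anything
print(pd.isna(other_nan))    # True: value-based check


def row_to_dict(row: pd.Series) -> dict:
    """Return the row as a dict, dropping entries whose value is missing."""
    return {k: v for k, v in row.items() if pd.notna(v)}


technologies = [
    ("Spark", 22000, "30days", 1000.0, "Scala"),
    ("PySpark", 25000, "50days", 2300.0, "Python"),
    ("Hadoop", 23000, "55days", np.nan, np.nan),
]
df = pd.DataFrame(
    technologies,
    columns=["Courses", "Fee", "Duration", "Discount", "Language"],
)

# Build one filtered dict per row.
records = [row_to_dict(row) for _, row in df.iterrows()]
print(records[2])
```

Because neither `is` nor `==` reliably detects NaN, a value-based check such as `pd.notna` is the safer filter here.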
 

6 Comments

I wonder why? It was np.nan anyway, so why did filtering on np.nan not work?
Hi Jezrael, thanks again for coming to my rescue. Indeed your solution rocks!!!
@HimanshuPoddar - Hard question, maybe because NaN is some kind of special number (but should be string also)
@jezrael, second part of your answer is my answer ;)
@I'mahdi - I don't understand. Your first answer (stackoverflow.com/posts/73046575/revisions) was incorrect: it did not remove missing values. I extended my solution 3 minutes after my first post and was surprised to see the same answer in yours 5 minutes later. So I am only asking whether you saw it was already posted.

You can use .to_dict('records') and filter out NaN with pandas.notna():

>>> [{k:v for k,v in dct.items() if pd.notna(v)} for dct in df.to_dict('records')]
[{'Courses': 'Spark',
  'Fee': 22000,
  'Duration': '30days',
  'Discount': 1000.0,
  'Language': 'Scala'},
 {'Courses': 'PySpark',
  'Fee': 25000,
  'Duration': '50days',
  'Discount': 2300.0,
  'Language': 'Python'},
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]

5 Comments

It is the same answer as mine.
@jezrael, no. I saw your answer; you added my answer in an edit, but within 5 minutes, so other users can't see the edit.
@jezrael, why did you add the second part of your answer?
Because apply is not necessary here?
@jezrael, do I use apply?

Filter each row with notna before converting it to a dict:

df.apply(lambda ss: ss.loc[ss.notna()].to_dict(), axis=1).map(print)

Output:

{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}
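
If the dicts are needed as values rather than printed, a variation on the same idea is sketched below. It assumes `Series.dropna()` as a stand-in for the `ss.loc[ss.notna()]` filter and collects the results with `.tolist()`; the name `records` is hypothetical.

```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0, "Scala"),
    ("PySpark", 25000, "50days", 2300.0, "Python"),
    ("Hadoop", 23000, "55days", np.nan, np.nan),
]
df = pd.DataFrame(
    technologies,
    columns=["Courses", "Fee", "Duration", "Discount", "Language"],
)

# Drop missing entries from each row, convert the rest to a dict,
# and gather the per-row dicts into a plain Python list.
records = df.apply(lambda row: row.dropna().to_dict(), axis=1).tolist()

for record in records:
    print(record)
```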
