How to drop empty rows when using to_excel function in Python

Question

I have a dataframe with some demographic data and some survey text responses. I want to export each response column, along with some demographic fields, to separate Excel files based on one of the demographic fields. I have code that does all of that. The missing piece is dropping rows with NaN when writing to Excel.

I tried creating separate dataframes for each question and dropping the NaNs there, which worked, but then I wasn't sure how to bring them back together to write to Excel.

import numpy as np
import pandas as pd

# Sample dataframe
df = pd.DataFrame({'ID' : ['1','2','3','4'],
                   'School': ['School1', 'School1', 'School2', 'School2'],
                   'Sex': ['M', 'M', 'F', 'F'],
                   'Q1' : ['Black', np.nan, 'White', 'White'],
                   'Q2' : ['Good', 'Good', 'Bad', 'Bad'],
                   'Q3' : ['Up', 'Up', np.nan, 'Down']})

# Create output
output = df[['ID','School','Sex','Q1','Q2','Q3']].groupby('School')

# Loop to write to Excel files (one workbook per school, one sheet per question)
for school, df_ in output:
    # The context manager saves and closes the file on exit (writer.save() was removed in pandas 2.0)
    with pd.ExcelWriter(f'school_{school}_tabs.xlsx', engine='xlsxwriter') as writer:
        df_[['School','Sex','Q1']].to_excel(writer, sheet_name='Q1')
        df_[['School','Sex','Q2']].to_excel(writer, sheet_name='Q2')
        df_[['School','Sex','Q3']].to_excel(writer, sheet_name='Q3')

The sample code should create two Excel files, one for School1 and one for School2. Each file has three tabs, one per question (Q1, Q2, Q3). As you can see, Q1 and Q3 contain NaN values, which are written to Excel as blank cells. I don't want those rows written to Excel. Those respondents did answer the other questions, though, and I do want those answers written.

What about dropping rows with NaN first so you have clean data, and then exporting to Excel? – jbndlr, Commented Oct 9, 2019 at 21:52
Respondents may have answered one, two, or all three questions. With all three columns (Q1, Q2, Q3) in the same dataframe, I don't want to drop rows with one or two NaNs, because I want to keep the responses to the questions they did answer. – Paul, Commented Oct 11, 2019 at 15:18
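To make Paul's objection concrete, here is a quick check on the sample data (a sketch; the sample frame is repeated so the snippet runs on its own). A plain dropna() discards any row that has a NaN in any column, so respondents 2 and 3 disappear entirely, along with their Q2 answers.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'ID': ['1', '2', '3', '4'],
                   'School': ['School1', 'School1', 'School2', 'School2'],
                   'Sex': ['M', 'M', 'F', 'F'],
                   'Q1': ['Black', np.nan, 'White', 'White'],
                   'Q2': ['Good', 'Good', 'Bad', 'Bad'],
                   'Q3': ['Up', 'Up', np.nan, 'Down']})

# dropna() with the default how='any' removes a row if ANY column is NaN
dropped_globally = df.dropna()
print(dropped_globally['ID'].tolist())  # ['1', '4']

# The removed respondents still answered Q2, and a global dropna throws that away
lost_q2 = df.loc[~df.index.isin(dropped_globally.index), ['ID', 'Q2']]
print(lost_q2)
```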

jason m · Accepted Answer · 2019-10-09 21:53:40Z


In your code, you need to use .dropna(), e.g. df_.dropna().

You will need to choose the right dropna arguments, such as how (drop if any vs. all values are missing) or subset (only check particular columns).

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

Experiment with those arguments and you should get what you want.
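For this question, the argument that matters is subset: passing only the question column drops a row only when that particular answer is missing. A minimal sketch on the sample data (repeated so it runs standalone), building one filtered frame per question; ID is kept here only so the surviving respondents are easy to see.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3', '4'],
                   'School': ['School1', 'School1', 'School2', 'School2'],
                   'Sex': ['M', 'M', 'F', 'F'],
                   'Q1': ['Black', np.nan, 'White', 'White'],
                   'Q2': ['Good', 'Good', 'Bad', 'Bad'],
                   'Q3': ['Up', 'Up', np.nan, 'Down']})

questions = ['Q1', 'Q2', 'Q3']

# For each question, keep the demographic columns plus that answer,
# and drop only the rows where that answer is NaN
per_question = {q: df[['ID', 'School', 'Sex', q]].dropna(subset=[q])
                for q in questions}

for q, frame in per_question.items():
    print(q, frame['ID'].tolist())
# Q1 ['1', '3', '4']
# Q2 ['1', '2', '3', '4']
# Q3 ['1', '2', '4']
```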


2 Comments

Paul Over a year ago

I figured dropna would be involved, but I wasn't sure where to incorporate it. This worked:

# Loop to write to Excel files
for school, df_ in output:
    with pd.ExcelWriter(f'school_{school}_tabs.xlsx', engine='xlsxwriter') as writer:
        # Keep only the rows that actually answered Q1
        df1 = df_[['School','Sex','Q1']].dropna(subset=['Q1'])
        df1.to_excel(writer, sheet_name='Q1')

It feels like there's a more efficient way to do this, but I'll take what I can get at this point. Thanks!
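One way to avoid writing a separate dfN line per question is to loop over the question columns inside the per-school loop. A sketch, assuming the same df and an installed xlsxwriter engine; QUESTIONS, frames_for_school and write_school_workbooks are illustrative names, not part of pandas. Using pd.ExcelWriter as a context manager also replaces writer.save(), which was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

QUESTIONS = ['Q1', 'Q2', 'Q3']  # hypothetical constant listing the response columns


def frames_for_school(df_, questions=QUESTIONS):
    """Return {sheet_name: frame} for one school, dropping rows with no answer to that question."""
    return {q: df_[['School', 'Sex', q]].dropna(subset=[q]) for q in questions}


def write_school_workbooks(df, questions=QUESTIONS, engine='xlsxwriter'):
    """Write one workbook per school with one sheet per question (needs the Excel engine installed)."""
    for school, df_ in df.groupby('School'):
        # The context manager saves and closes the file when the block exits
        with pd.ExcelWriter(f'school_{school}_tabs.xlsx', engine=engine) as writer:
            for q, frame in frames_for_school(df_, questions).items():
                frame.to_excel(writer, sheet_name=q)


df = pd.DataFrame({'ID': ['1', '2', '3', '4'],
                   'School': ['School1', 'School1', 'School2', 'School2'],
                   'Sex': ['M', 'M', 'F', 'F'],
                   'Q1': ['Black', np.nan, 'White', 'White'],
                   'Q2': ['Good', 'Good', 'Bad', 'Bad'],
                   'Q3': ['Up', 'Up', np.nan, 'Down']})

# Inspect what would be written for School2 without touching the filesystem
school2 = df[df['School'] == 'School2']
for q, frame in frames_for_school(school2).items():
    print(q, frame.index.tolist())
```

Calling write_school_workbooks(df) should then produce school_School1_tabs.xlsx and school_School2_tabs.xlsx, each with sheets Q1 to Q3 and no rows for unanswered questions. Keeping the filtering in its own function makes it easy to check the frames before any file is written.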

gotiredofcoding Over a year ago

This has nothing to do with dropna: worksheet.set_default_row(hide_unused_rows=True)
