I have a dataframe with some demographic data and some survey text responses. I want to export each column of response data, along with some demographic fields, to different Excel files based on one of the demographic fields. I have code that can do all that. The missing piece is dropping rows with NaN when writing to Excel.

I tried creating separate dataframes for each question and dropping the nans there, which worked. Then I wasn't sure how to bring them back together to write to Excel.

import numpy as np
import pandas as pd

# Sample dataframe
df = pd.DataFrame({'ID' : ['1','2','3','4'],
                   'School': ['School1', 'School1', 'School2', 'School2'], 
                   'Sex': ['M', 'M', 'F', 'F'],
                   'Q1' : ['Black', np.nan, 'White', 'White'],
                   'Q2' : ['Good', 'Good', 'Bad', 'Bad'],
                   'Q3' : ['Up', 'Up', np.nan, 'Down']})

# Create output
output = df[['ID','School','Sex','Q1','Q2','Q3']].groupby('School')

# Loop to write to Excel files
for school, df_ in output:
    writer = pd.ExcelWriter(f'school_{school}_tabs.xlsx', engine='xlsxwriter')
    df_[['School','Sex','Q1']].to_excel(writer, sheet_name='Q1')
    df_[['School','Sex','Q2']].to_excel(writer, sheet_name='Q2')
    df_[['School','Sex','Q3']].to_excel(writer, sheet_name='Q3')
    writer.close()  # writer.save() was removed in pandas 2.0; close() also saves the file

The sample code should create two Excel files, one for School1 and one for School2. Each file will have three tabs, one for each question (Q1, Q2, Q3). As you can see, Q1 and Q3 have NaN values, which get written to Excel as blanks. I don't want those rows to be written to Excel. Those respondents did answer other questions, though, and I do want those answers written to Excel.

  • What about dropping rows with nan first to have sane data and then export it to Excel format? Commented Oct 9, 2019 at 21:52
  • Respondents may have answered one, two, or all three questions. So with those three columns together in the df (Q1, Q2, Q3) I don't want to drop rows with one or two nans because I want to keep the responses for the questions they did answer. Commented Oct 11, 2019 at 15:18

1 Answer

In your code, you need to use .dropna().

E.g.: df_.dropna()

You will need to decide what to pass for the how argument of dropna ('any' or 'all').

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
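As a rough illustration of what the dropna arguments do, here is a minimal sketch on the sample dataframe from the question. The variable names (any_missing, all_missing, q1_answered) are made up for this example; how and subset are documented dropna parameters.

```python
import numpy as np
import pandas as pd

# Sample dataframe from the question
df = pd.DataFrame({'ID': ['1', '2', '3', '4'],
                   'School': ['School1', 'School1', 'School2', 'School2'],
                   'Sex': ['M', 'M', 'F', 'F'],
                   'Q1': ['Black', np.nan, 'White', 'White'],
                   'Q2': ['Good', 'Good', 'Bad', 'Bad'],
                   'Q3': ['Up', 'Up', np.nan, 'Down']})

# how='any' (the default): drop a row if ANY of its values is NaN
any_missing = df.dropna(how='any')

# how='all': drop a row only if ALL of its values are NaN
all_missing = df.dropna(how='all')

# subset=[...]: only the listed columns are checked for NaN
q1_answered = df.dropna(subset=['Q1'])

print(any_missing['ID'].tolist())  # ['1', '4']
print(all_missing['ID'].tolist())  # ['1', '2', '3', '4']
print(q1_answered['ID'].tolist())  # ['1', '3', '4']
```

Note that how='any' on the full dataframe also discards respondents 2 and 3, who did answer other questions, whereas subset only looks at the one question being exported.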

Experiment with that argument and you should get what you want.
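One way to fit this into the loop from the question is sketched below, assuming each tab should keep only the rows where that tab's question was answered. The helper names frames_by_question and write_school_workbooks are hypothetical, not part of pandas; the writing step uses pd.ExcelWriter as a context manager with the xlsxwriter engine, as in the question.

```python
import numpy as np
import pandas as pd


def frames_by_question(school_df, questions, demographics=('School', 'Sex')):
    """Return {question: dataframe}, dropping rows where that question is NaN."""
    frames = {}
    for q in questions:
        cols = list(demographics) + [q]
        # Only the question column decides whether a row is dropped
        frames[q] = school_df[cols].dropna(subset=[q])
    return frames


def write_school_workbooks(df, questions, group_col='School'):
    """Write one workbook per school with one sheet per question."""
    for school, school_df in df.groupby(group_col):
        # The context manager saves and closes the workbook on exit
        with pd.ExcelWriter(f'school_{school}_tabs.xlsx', engine='xlsxwriter') as writer:
            for q, frame in frames_by_question(school_df, questions).items():
                frame.to_excel(writer, sheet_name=q)


# Sample dataframe from the question
df = pd.DataFrame({'ID': ['1', '2', '3', '4'],
                   'School': ['School1', 'School1', 'School2', 'School2'],
                   'Sex': ['M', 'M', 'F', 'F'],
                   'Q1': ['Black', np.nan, 'White', 'White'],
                   'Q2': ['Good', 'Good', 'Bad', 'Bad'],
                   'Q3': ['Up', 'Up', np.nan, 'Down']})

# Check the per-question frames for one school without touching the disk
school1 = df[df['School'] == 'School1']
frames = frames_by_question(school1, ['Q1', 'Q2', 'Q3'])
print({q: len(frame) for q, frame in frames.items()})  # {'Q1': 1, 'Q2': 2, 'Q3': 2}
```

Calling write_school_workbooks(df, ['Q1', 'Q2', 'Q3']) would then write the files; it needs the xlsxwriter package installed. Keeping the filtering in its own function means adding a question only requires extending the list.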

2 Comments

I figured dropna would be involved, but I wasn't sure where to incorporate it. This worked:

# Loop to write to Excel files
for school, df_ in output:
    writer = pd.ExcelWriter(f'school_{school}_tabs.xlsx', engine='xlsxwriter')
    df1 = df_[['School','Sex','Q1']].dropna(subset=['Q1'])
    df1.to_excel(writer, sheet_name='Q1')
    writer.save()

It feels like there's a more efficient way to do this, but I'll take what I can get at this point. Thanks!
this has nothing to do with dropna. worksheet.set_default_row(hide_unused_rows=True)