I'm working on a small code generation app that loads an Excel file (using pandas ExcelFile + xlrd) and parses it into a dataframe (ExcelFile.parse) for several SQL-like operations. The resulting rows are then returned to a file writer as a list, built with map and a lambda, with a little f-string formatting on specific fields.

The problem is that not all fields in the Excel file are reliably populated, so I'm using fillna('') when parsing to the dataframe. When I reach the f-string, though, the unpopulated fields raise an error as soon as I apply :.0f formatting to remove the decimals, because they are now empty strings rather than floats. If I don't use fillna(''), the floats format correctly, but each unpopulated field then comes out as the string nan, and I can't work out how to convert that to ''.

As an example, the code below fails with fillna('') because NumField3 and NumField4 can be empty in the source spreadsheet.

return list(
         map(
            lambda row: f"EXEC ***_****_*.****_Register_File("
            f"{row['NumField1']:.0f},{row['NumField2']:.0f},"
            f"'{row['TextField1']}','{row['TextField2']}',"
            f"'{row['TextField3']}','{row['TextField4']}',"
            f"{row['NumField3']:.0f},{row['NumField4']:.0f});\n",
            df.to_dict("records")))
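Here is a minimal sketch of the two behaviours in isolation. It uses stand-in values rather than the real dataframe; the variable names are only for illustration.

```python
# Stand-ins for what an empty cell becomes in a record:
empty = ""              # with fillna('')
missing = float("nan")  # without fillna('')

# An empty string does not support the 'f' format code, so this raises.
error_message = None
try:
    f"{empty:.0f}"
except ValueError as exc:
    error_message = str(exc)

# A float NaN formats without error, but the result is the text 'nan'.
nan_text = f"{missing:.0f}"

print(error_message)
print(repr(nan_text))
```

So the choice is between an exception on the empty string and a literal nan in the output.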

My original approach used .format() and itertuples(), but this was apparently less efficient. I've opted to convert to dictionaries so I can keep the field names in the list construction, which makes the code easier to support.

I'm probably missing something really simple, but I can't see the wood for the trees at the moment. Any suggestions?

1 Answer

I think I've worked it out. I've removed the fillna('') from the parsing of the ExcelFile object to a dataframe, so unpopulated fields keep their NaN value. When the dataframe records are processed through the map/lambda approach, that NaN is formatted as the string 'nan', so I've added a re.sub that matches nan as a whole word and replaces it with the required empty string.

It's not pretty but it works.

# Needs "import re" at the top of the module.
return list(
        # Strip the whole word 'nan' left behind by unpopulated fields.
        re.sub(r'\bnan\b', '', i) for i in map(
            lambda row: f"EXEC ***_****_*.****_Register_File("
            f"{row['NumField1']:.0f},{row['NumField2']:.0f},"
            f"'{row['TextField1']}','{row['TextField2']}',"
            f"'{row['TextField3']}','{row['TextField4']}',"
            f"{row['NumField3']:.0f},{row['NumField4']:.0f});\n",
            df.to_dict("records")))
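
One caveat with the regex: it also strips a standalone word nan from inside a text field. A possible alternative, sketched below under some assumptions, is to decide per field before formatting. The helper names (is_missing, num, text, build_statement), the shortened field list, and the Register_File procedure name are all placeholders for illustration, and the sample records stand in for df.to_dict("records") without fillna(''). pandas.isna could probably replace the hand-written check.

```python
import math


def is_missing(value):
    """True for None or a float NaN, which is how an empty cell shows up in a record."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def num(value):
    """Format a number with no decimals, or return '' when the value is missing."""
    return "" if is_missing(value) else f"{value:.0f}"


def text(value):
    """Return the text unchanged, or '' when the value is missing."""
    return "" if is_missing(value) else str(value)


def build_statement(row):
    # "Register_File" is a placeholder procedure name for this sketch.
    return (
        "EXEC Register_File("
        f"{num(row['NumField1'])},'{text(row['TextField1'])}',"
        f"{num(row['NumField3'])});\n"
    )


# Hypothetical records standing in for df.to_dict("records") without fillna('').
records = [
    {"NumField1": 12.0, "TextField1": "nan bread", "NumField3": float("nan")},
    {"NumField1": 7.0, "TextField1": float("nan"), "NumField3": 3.0},
]

statements = [build_statement(row) for row in records]
print("".join(statements), end="")
```

With this shape the text "nan bread" survives intact, while the missing number and the missing text both become empty.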