I have to remove duplicate rows from an *.xlsx file for a project. My code is below. In the output file, however, date values turn into "yy-mm-dd hh:mm:ss" format after running my code. What causes this weird problem, and how can I fix it?

I'm running it on PyCharm 2019.2 Pro with Python 3.7.4.

import pandas

mExcelFile = pandas.read_excel('Input/ogr.xlsx')
mExcelFile.drop_duplicates(subset=['FName', 'LName', 'Class', '_KDT'], inplace=True)
mExcelFile.to_excel('Output/NoDup.xlsx')

I expect the dates to stay in their original format, "dd.mm.yy", but the values become "yy-mm-dd hh:mm:ss".

  • Maybe you just need to make the Excel column wider? pandas probably converts the date into a datetime, which is too wide to fit in the default column width. You can see this in the formula bar in your screenshot. Commented Aug 31, 2019 at 13:06
  • That's embarrassingly true :( @crayxt Commented Aug 31, 2019 at 13:10

2 Answers

To control date format when writing to Excel, try this:

import pandas as pd
writer = pd.ExcelWriter(fileName, engine='xlsxwriter', datetime_format='dd/mm/yy')
df.to_excel(writer)
writer.close()  # flush the workbook to disk

2 Comments

Still getting dates formatted like this: 2018-01-01 00:00:00, instead of the original format, which is 01.01.2018.
With pandas=2.2.0, you should use the writer like this: with pd.ExcelWriter(path, datetime_format='dd/mm/yyyy') as writer: df.to_excel(writer)

Actually, the answer from the link below solved it. Since I am new to Python programming, I didn't realize where the problem was: pandas was converting the cell values to datetimes. Detailed answer: https://stackoverflow.com/a/49159393/11584604
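
For anyone landing here, a minimal sketch of one way to deal with that conversion: turn the datetime column back into text in the "dd.mm.yy" layout before writing. This is not necessarily what the linked answer does. The column name BirthDate, the helper dates_as_text, and the sample rows are made up for illustration; the in-memory frame stands in for what pandas.read_excel would return.

```python
import pandas as pd


def dates_as_text(df, columns, fmt="%d.%m.%y"):
    """Return a copy of df with the given datetime columns rendered as strings."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col]).dt.strftime(fmt)
    return out


# Hypothetical stand-in for the frame read from the input workbook.
# "BirthDate" is an assumed column name; the real sheet may differ.
frame = pd.DataFrame({
    "FName": ["Ada", "Ada", "Alan"],
    "LName": ["Lovelace", "Lovelace", "Turing"],
    "Class": ["A", "A", "B"],
    "_KDT": [1, 1, 2],
    "BirthDate": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-02-15"]),
})

# Date cells arrive as timestamps, which is where the
# "yyyy-mm-dd hh:mm:ss" rendering comes from.
print(frame["BirthDate"].iloc[0])

deduped = frame.drop_duplicates(subset=["FName", "LName", "Class", "_KDT"])
as_text = dates_as_text(deduped, ["BirthDate"])
print(as_text["BirthDate"].tolist())
```

The trade-off is that the cells end up as plain text in Excel rather than real dates, so sorting and date arithmetic in the spreadsheet no longer work on them. If they need to stay real dates, the datetime_format approach from the other answer is the better fit.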
