1

I have an Excel files with a column of date-like string mixed with some illegal date-strings, just like: my column

I want to know how to use pandas to read it with exactly the same as original.

I don't want any date conversion.

I've tried many ways including

df = pd.read_excel(path)
df['Tran Date'] = df['Tran Date'].apply(lambda x: x.strftime('%m/%d/%y') if x else "")

I don't want to raise an Exception in the lambda since the data is millions rasing so much exception would cause severe performance problems.

So is there a way to treat the column as string and stop any date conversion?

2
  • Check this already posted same question stackoverflow.com/questions/58876706/… Commented Feb 21, 2020 at 11:55
  • @Raghavendra No, actually it is a completely different question. Commented Feb 21, 2020 at 12:02

1 Answer 1

1

When reading the Excel you can specify the dtype. So:

df = pd.read_excel('file',dtype={'Tran Date'='object'})
print(df.dtypes)
reg_date        object
dtype: object
print(df)
0      2020/1/18
1      2020/1/18
2      2020/1/18
3      2020/1/18
4      2020/1/18
5      2020/1/18
6      2020/1/18
7      2020/1/18
8      2020/1/18
9      2020/1/18
10    09/19/CN11
11    09/19/CN11
12    09/19/CN11
13    09/19/CN11
14    09/19/CN11
15    09/19/CN11
16    09/19/CN11
17    09/19/CN11

Added after comment

Something like this would do the work with the problem mentioned in comment although we are back to lambda, which you were trying to avoid:

df['Tran Date'] = df['Tran Date'].apply(lambda x: x.rstrip('00:00:00'))
df['Tran Date'] = df['Tran Date'].apply(lambda x: x.replace('-','/'))
df['Tran Date'] = df['Tran Date'].apply(lambda x: (x[-3:-1]+'/'+x[-6:-4]+'/'+x[0:-7]) if (x[2] != '/') else x)

0     18/01/2020
1     18/01/2020
2     18/01/2020
3     18/01/2020
4     18/01/2020
5     18/01/2020
6     18/01/2020
7     18/01/2020
8     18/01/2020
9     18/01/2020
10    09/19/CN11
11    09/19/CN11
12    09/19/CN11
13    09/19/CN11
14    09/19/CN11
15    09/19/CN11
16    09/19/CN11
17    09/19/CN11
Sign up to request clarification or add additional context in comments.

3 Comments

I'm not sure why it still converts to '2020-01-18 00:00:00'?
Well, it depends on the format you apply to those cells in Excel. You need to save the file as CSV and then import or save the dates as string in Excel. Otherwise, perform some kind of conversion in Pandas to remove hour, minutes, seconds
See edit in answer

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.