I am trying to retrieve the datatypes of the values in each cell of a pandas dataframe using the code below:
```python
import pandas as pd

dfin = pd.read_csv(path, dtype=object)

d = {"<class 'datetime.datetime'>": 'DateTime.Type',
     "<class 'int'>": 'int',
     "<class 'float'>": 'float',
     "<class 'str'>": 'str'}

dftypes = dfin.applymap(type).astype(str).replace(d)
```
My dataframe contains mixed-type columns, and the `dtype=object` argument is meant to stop pandas from inferring a single dtype per column, so that each cell keeps its own type.
This code generates and maps the proper datatypes when `dfin` is read from an xlsx file (`pd.read_excel()`), but not when it is read from a standard csv file (`pd.read_csv()`).
I want to read the data in from a csv and then determine the datatype cell by cell, but every value is detected as str (or as float, for the NaN cells). Is there a fix here, or can you recommend another method to get this result?
Example:
Given dfin:
| Column A | Column B | Column C |
|---|---|---|
| 1.4 | 4 | NaN |
| 'yes' | 3.2 | 5 |
I want to return dftypes:
| Column A | Column B | Column C |
|---|---|---|
| float | int | float |
| str | float | int |
(works with `read_excel()`)
With `read_csv()` the actual return is:
| Column A | Column B | Column C |
|---|---|---|
| str | str | float |
| str | str | str |
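For reference, here is a minimal reproducible version of the csv case (the round-trip through `io.StringIO` and the `csv_text` variable are mine; the data is the example above):

```python
import io
import pandas as pd

# Reproduce the csv case with the example data from above.
csv_text = "Column A,Column B,Column C\n1.4,4,\nyes,3.2,5\n"
dfin = pd.read_csv(io.StringIO(csv_text), dtype=object)

# CSV stores plain text, so with dtype=object every parsed value comes
# back as a Python str; only the empty cell becomes a float NaN.
dftypes = dfin.applymap(type).astype(str)
print(dftypes)
```

This shows why the xlsx and csv results differ: xlsx cells carry typed values, while csv cells are just text, so `dtype=object` can only prevent column-level coercion, not recover types that were never stored.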