I am trying to retrieve the datatypes of the values in each cell of a pandas dataframe using the code below:
```python
import pandas as pd

dfin = pd.read_csv(path, dtype=object)

d = {"<class 'datetime.datetime'>": 'DateTime.Type',
     "<class 'int'>": 'int',
     "<class 'float'>": 'float',
     "<class 'str'>": 'str'}

dftypes = dfin.applymap(type).astype(str).replace(d)
```
My dataframe contains mixed-type columns, and the `dtype=object` argument is meant to stop pandas from inferring a single dtype per column, so that each cell keeps its own type.
This code generates and maps the proper datatypes when `dfin` is read from an xlsx file (`pd.read_excel()`), but not when it is read from a standard csv file (`pd.read_csv()`).
I want to read the data in from a csv and then determine the datatype cell by cell, but every value is detected as str (or as float, for the NaN cells). Is there a fix here, or can you recommend another method to get this result?
Example:
Given dfin:
| Column A | Column B | Column C |
|---|---|---|
| 1.4 | 4 | NaN |
| 'yes' | 3.2 | 5 |
I want to return dftypes:
| Column A | Column B | Column C |
|---|---|---|
| float | int | float |
| str | float | int |
(works with `read_excel()`)
With `read_csv()` the actual return is:
| Column A | Column B | Column C |
|---|---|---|
| str | str | float |
| str | str | str |
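For reference, here is a minimal reproducible version of the csv case (the round-trip through `io.StringIO` and the `csv_text` variable are mine; the data is the example above):

```python
import io
import pandas as pd

# Reproduce the csv case with the example data from above.
csv_text = "Column A,Column B,Column C\n1.4,4,\nyes,3.2,5\n"
dfin = pd.read_csv(io.StringIO(csv_text), dtype=object)

# CSV stores plain text, so with dtype=object every parsed value comes
# back as a Python str; only the empty cell becomes a float NaN.
dftypes = dfin.applymap(type).astype(str)
print(dftypes)
```

This shows why the xlsx and csv results differ: xlsx cells carry typed values, while csv cells are just text, so `dtype=object` can only prevent column-level coercion, not recover types that were never stored.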