
I import some data from a parquet file into a DataFrame and want to check the data types. One of the data types I expect is strings. To do this, I have something like the following:

import pandas as pd
col = pd.Series([None, 'b', 'c', None, 'e'])
assert col.dtype == object and isinstance(col[0], str)  # fails here because col[0] is None

But, as you can see, this assertion fails if I happen to have a None value at the beginning, even though the other values are strings.

Does anybody have an idea how to do that efficiently (preferably without having to check each element of the series)?


3 Answers


You can use first_valid_index to retrieve and check the first non-NA item:

isinstance(col.loc[col.first_valid_index()], str)  # first_valid_index returns an index label, so use .loc, not .iloc
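A minimal sketch that wraps this spot-check in a function. The helper name `first_value_is_str` is hypothetical, and treating an all-missing column as "not strings" is an assumption (first_valid_index returns None in that case). Only one element is inspected, so a column such as [None, 'b', 2] still passes.

```python
import pandas as pd


def first_value_is_str(col: pd.Series) -> bool:
    """Spot-check: is the first non-missing value of `col` a str?

    Hypothetical helper. Only one element is inspected, so a mixed
    column is not detected.
    """
    label = col.first_valid_index()  # index *label*, or None if every value is missing
    if label is None:
        return False  # assumption: an all-missing column does not count as strings
    # .loc because `label` is a label, not a position (assumes unique index labels)
    return isinstance(col.loc[label], str)


print(first_value_is_str(pd.Series([None, 'b', 'c', None, 'e'])))
```

Using .loc instead of .iloc matters as soon as the index is not the default 0..n-1 range, for example after filtering rows.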

As of pandas 1.0.0 there is a StringDtype, which you can use to check whether the pd.Series contains only strings and missing values (None/NaN):

col.astype('string')

If you try it with a column containing an int:

col = pd.Series([None, 2, 'c', None, 'e'])

col.astype('string')

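you get a ValueError in pandas 1.0.x (newer pandas releases convert non-string values such as 2 to the string '2' instead of raising, so there the conversion no longer catches them):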

ValueError: StringArray requires a sequence of strings or pandas.NA
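A check that does not depend on how astype('string') treats non-string values is `pandas.api.types.infer_dtype`. This is a different technique from the one in this answer: it scans the whole column and, with skipna=True, returns "string" only when every non-missing value is a str. A minimal sketch; the helper name `all_strings` is hypothetical:

```python
import pandas as pd
from pandas.api.types import infer_dtype


def all_strings(col: pd.Series) -> bool:
    """Full check: every non-missing value of `col` is a str (hypothetical helper)."""
    # With skipna=True, missing values are ignored. The result is "string"
    # only if all remaining values are str, "empty" if nothing remains, and
    # something like "mixed-integer" when ints and strings are mixed.
    return infer_dtype(col, skipna=True) == "string"


print(all_strings(pd.Series([None, 'b', 'c', None, 'e'])))
print(all_strings(pd.Series([None, 2, 'c', None, 'e'])))
```

Like the astype approach, this looks at every element, so for very long columns the single-element spot-check is cheaper if a full guarantee is not needed.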

2 Comments

I like that solution, but since my Series is quite long, I'd go with the spot-check proposed by @Stef.
That depends on the purpose. If you have an object dtype and want to ensure the entire column contains no numerical data, I'd go with this one. If that isn't necessary, Stef's approach should do. @raphael

You can convert all values of the Series to str as follows:

col = col.astype(str)

Note that this converts the values rather than checking them: None values become strings as well.
