Identify pandas dataframe columns containing both numeric and string

Question

I have created the following dataframe (called df):

d = {'ltv': [1, 22,45,78], 'age': [33, 43,54,65],'job': ['Salaried','Salaried','Salaried','Owner'], 'UniqueID' : ['A1','A2','A3','A4'] }
df = pd.DataFrame(data=d)

which looks like this:

print(df)

   ltv  age       job UniqueID
     1   33  Salaried       A1
    22   43  Salaried       A2
    45   54  Salaried       A3
    78   65     Owner       A4

I have checked its columns types:

print(df.info())

 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   ltv       4 non-null      int64 
 1   age       4 non-null      int64 
 2   job       4 non-null      object
 3   UniqueID  4 non-null      object

I only focus on the two object columns which are job and UniqueID. As you can see:

job contains only strings
UniqueID contains both strings and numbers

I want to be able to identify the column (in this case UniqueID) that contains both strings and numbers.

If I use the following code for UniqueID:

print(df['UniqueID'].str.isalnum())

0    True
1    True
2    True
3    True

I see that it returns True for all records, which is great. Now, if I use the same code for job, I get the same results:

print(df['job'].str.isalnum())

    0    True
    1    True
    2    True
    3    True

So, how can I identify in pandas which column that contains both strings and numbers (in this example: UniqueID)?

BENY · Accepted Answer · 2022-04-06 14:36:03Z

4

You can def your own function

def findchrandnum(x):
    try :
        return all(x.str.isalnum() & ~x.str.isalpha() & ~x.str.isdigit())
    except:
        return False
df.apply(findchrandnum)
Out[66]: 
ltv         False
age         False
job         False
UniqueID     True
dtype: bool

answered Apr 6, 2022 at 14:36

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

lemon · Accepted Answer · 2022-04-06 14:36:19Z

1

You can use the apply method to the column you want to check, to look for digits for each row. The sum will give you the number of values that have a digit in that column:

col = 'UniqueID'
df[col].apply(
    lambda val: any(ch.isdigit() for ch in val)
).sum()

If you know that your values in the columns are consistent, you can also check the first value only.

answered Apr 6, 2022 at 14:36

lemon

15.6k6 gold badges23 silver badges42 bronze badges

Collectives™ on Stack Overflow

Identify pandas dataframe columns containing both numeric and string

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related