Check for numeric value in text column - python

Question

Five columns (col1 - col5) of a 10-column dataframe (df) should be either blank or contain text values only. If any row in these five columns holds an all-numeric value, I need to trigger an error. I wrote the following code to identify rows where the value in 'col1' is all-numeric (I will cycle through all five columns using the same code):

    df2 = df[df['col1'].str.isnumeric()]

I get the following error: ValueError: cannot mask with array containing NA / NaN values

This is triggered because the blank values produce NaN instead of False in the result. I saw this when I built the boolean Series on its own:

    lst = df['col1'].str.isnumeric()

Any suggestions on how to solve this? Thanks

What error do you want to trigger? Or do you want to replace the numeric values with something else?
– YOLO, Commented Feb 26, 2020 at 15:55
@YOLO This is part of a larger script that runs quality checks on data. In this case I write the error 'Fail: {col1} is numeric' into a column. I cannot use where to write this error directly into the column, because the error column needs to record all errors: those found in other columns for this particular check, and those from other checks conducted outside of the isnumeric() check.
– SModi, Commented Feb 26, 2020 at 16:01
pandas.pydata.org/pandas-docs/stable/reference/api/…. Blank strings produce False. If the values are themselves NaN, consider filling them with ''.
– ifly6, Commented Feb 26, 2020 at 16:04
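
A minimal sketch of the fix suggested in the last comment, assuming the column holds only strings and missing values: filling the missing entries with '' before calling .str.isnumeric() yields a plain boolean mask, because an empty string is not numeric. The sample frame here is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: text, an all-numeric string, a missing value, a blank
df = pd.DataFrame({"col1": ["abc", "123", None, ""]})

# Replace missing values with '' so every row gets True/False, never NaN
mask = df["col1"].fillna("").str.isnumeric()

# The mask is now safe to use for boolean indexing
df2 = df[mask]
print(df2)
```

Only the row holding '123' ends up in df2; the blank and missing rows are treated as not numeric.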

Jano · Accepted Answer · 2020-02-26 16:02:51Z

1

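Try this to work around the NaN:

    import pandas as pd

    df = pd.DataFrame([{'col1': 1}, {'col1': 'a'}, {'col1': None}])
    # Cast to str first so non-string values (the int 1, None) do not
    # produce NaN in the mask
    lst = df['col1'].astype(str).str.isnumeric()
    if lst.any():
        raise ValueError('col1 contains an all-numeric value')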

edited Feb 26, 2020 at 16:02

answered Feb 26, 2020 at 15:55

Jano


1 Comment

SModi Over a year ago

Your pre-edit code worked for me: I could replace the NaNs with text on the fly, so my dataframe was created. I haven't tried the revised code. This is what I finally used: 'df2 = df[df['col1'].astype(str).fillna('').str.isnumeric()]' I've marked your answer as the one that solved my question, but you may want to edit it to include your original code as well.

YOLO · Answer · 2020-02-26 16:20:12Z

0

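Here's one way to do it:

    import string

    # Step 1: True for every cell that contains at least one digit character.
    # (DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map.)
    # Step 2: for each row, join the names of the flagged columns into a message.
    df['flag'] = (df
                  .applymap(lambda x: any(ch in string.digits for ch in x))
                  .apply(lambda x: f'Fail: {",".join(df.columns[x].tolist())} is numeric', axis=1))

    print(df)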

       col1  col2                   flag
    0     a  2.04  Fail: col2 is numeric
    1  2.02     b  Fail: col1 is numeric
    2     c     c      Fail:  is numeric
    3     d     e      Fail:  is numeric

Explanation:

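We iterate over every value in the dataframe and check whether it contains at least one digit character, which gives a boolean for each cell.
For each row, we use those booleans to select the names of the flagged columns and join them into the message. Rows with no flagged column still get 'Fail:  is numeric', as rows 2 and 3 of the output show.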
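
To make the distinction concrete, here is a small sketch (the sample Series is made up) comparing three different tests: "contains a digit", which is what the lambda above checks; str.isnumeric(), which requires every character to be numeric and therefore rejects '2.02'; and pd.to_numeric(errors='coerce'), which is one possible way to also catch decimals. Which of these counts as "numeric" depends on the data, so treat this as an illustration and not as the answer's method.

```python
import pandas as pd

# Hypothetical sample values, including a decimal string and a missing value
s = pd.Series(["a", "2.02", "42", "room 7", None], name="col1")
text = s.fillna("")  # blanks instead of NaN so the .str tests return booleans

# True when the value contains at least one digit anywhere
has_digit = text.str.contains(r"\d", regex=True)

# True only when every character is numeric (so '2.02' is False)
all_digits = text.str.isnumeric()

# True when the whole value parses as a number, decimals included
parses_as_number = pd.to_numeric(s, errors="coerce").notna()

print(pd.DataFrame({"value": s, "has_digit": has_digit,
                    "all_digits": all_digits, "parses": parses_as_number}))
```

Note that 'room 7' is flagged by the digit test only, while '2.02' is missed by isnumeric() but caught by to_numeric.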

Sample Data

    import pandas as pd

    df = pd.DataFrame({'col1': ['a', '2.02', 'c', 'd'],
                       'col2': ['2.04', 'b', 'c', 'e']})

answered Feb 26, 2020 at 16:20

YOLO


2 Comments

SModi Over a year ago

I haven't tested this yet, but it looks more efficient than what I finally used. I didn't use it because I cycle through 3 different checks for each column. Results from each check are written into the same flag column, depending on its existing value: 1. If the existing value == 'Pass', replace it with 'Fail + {error message}'. 2. Otherwise, append the additional failure. I don't see how to implement this within my current code structure right away, but I think it may work if I write the results from each check into separate columns and then merge them. I will post here tomorrow if it does.
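
The Pass/Fail rule described in this comment could be sketched roughly as follows. The helper name record_failure, the 'Fail: ' prefix, and the '; ' separator are assumptions made for illustration; the comment does not show the actual code.

```python
import pandas as pd

def record_failure(flags, failed, message):
    """Return a copy of `flags` with `message` recorded where `failed` is True.

    Hypothetical helper: rows still at 'Pass' become 'Fail: <message>',
    rows that already failed get '; <message>' appended.
    """
    was_pass = failed & flags.eq("Pass")
    was_fail = failed & flags.ne("Pass")
    out = flags.mask(was_pass, f"Fail: {message}")
    out = out.mask(was_fail, flags + f"; {message}")
    return out

# Every row starts as 'Pass'; each check then records its own failures
flags = pd.Series(["Pass", "Pass", "Pass"])
flags = record_failure(flags, pd.Series([False, True, True]), "col1 is numeric")
flags = record_failure(flags, pd.Series([False, False, True]), "col2 is numeric")
print(flags.tolist())
```

Because each call only needs a boolean Series and a message, the three checks can stay separate and still write into one flag column.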

SModi Over a year ago

The above didn't work for me because I'm not checking ALL the columns in the dataframe; I get the list of columns to check from another dataframe. Is there a way to feed a list of column headers into your code?
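
One possible way to restrict the check to a list of column names is sketched below. It is not from the answer: cols_to_check and describe are hypothetical names, the sample frame is made up, the test used is str.isnumeric() (all characters numeric) instead of the "contains a digit" lambda, and rows with no failure are labelled 'Pass'.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "123", None, "d"],
    "col2": ["456", "b", "c", "789"],
    "other": ["1", "2", "3", "4"],  # not in the list, so never checked
})

# Hypothetical: in practice this list would come from the other dataframe
cols_to_check = ["col1", "col2"]

# One boolean column per checked column; missing values count as not numeric
numeric = df[cols_to_check].apply(
    lambda col: col.fillna("").astype(str).str.isnumeric()
)

def describe(row):
    """Build the message from the names of the columns that failed in this row."""
    failed = [c for c in cols_to_check if row[c]]
    return f"Fail: {', '.join(failed)} is numeric" if failed else "Pass"

df["flag"] = numeric.apply(describe, axis=1)
print(df)
```

Selecting df[cols_to_check] first keeps unrelated columns such as 'other' out of the check entirely.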
