Identifying only numeric values from a column in a Data Frame- Python

Question

I want a separate column that returns "Yes" if the value in column "ID" is entirely numeric and "No" if it contains letters or is alphanumeric.

ID      Result
3965      Yes
wyq8      No
RO_123    No
CMD_      No
2976      Yes

Ch3steR · Accepted Answer · 2020-10-27 07:29:46Z

You can use pd.Series.str.isnumeric here.

df['Result'] = np.where(df['ID'].str.isnumeric(), 'YES', 'NO')

       ID Result
0    3965    YES
1    wyq8     NO
2  RO_123     NO
3    CMD_     NO
4    2976    YES
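For reference, a minimal self-contained sketch of the line above, with the imports and sample frame it needs. The `astype(str)` call is an added assumption, included in case the column was read as integers or mixed types, since the `.str` accessor only works on string data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": ["3965", "wyq8", "RO_123", "CMD_", "2976"]})

# astype(str) is an assumption: it guards against a column that is not
# already made of strings.
is_num = df["ID"].astype(str).str.isnumeric()

# np.where picks 'YES' where the mask is True and 'NO' elsewhere.
df["Result"] = np.where(is_num, "YES", "NO")
print(df)
```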

There's a caveat with isnumeric: it does not recognize floating-point strings.

test = pd.Series(["9.0", "9"])
test.str.isnumeric()

0    False
1     True
dtype: bool
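pandas documents `str.isnumeric` as equivalent to running Python's own `str.isnumeric()` on each element, so the plain string method is enough to probe other edge cases. The sketch below (sample values chosen for illustration) shows that signed values and empty strings are rejected, while Unicode numerals such as "½" and "²" are accepted. `str.isdecimal()`, also available as `pd.Series.str.isdecimal`, is stricter about those.

```python
# Illustrative values, not taken from the question.
samples = ["9", "9.0", "-9", "", "½", "²"]

# Map each sample to (isnumeric, isdecimal) for comparison.
checks = {s: (s.isnumeric(), s.isdecimal()) for s in samples}

for value, (numeric, decimal) in checks.items():
    print(repr(value), numeric, decimal)
```

If IDs such as "²" must count as non-numeric, `isdecimal` may be the safer check.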

If you want YES strictly for integers, use isnumeric. Otherwise, you can use pd.Series.str.fullmatch (available from pandas 1.1.0):

df['Result'] = np.where(df['ID'].str.fullmatch(r"\d+|\d+\.\d+"), 'YES', 'NO')

For pandas versions before 1.1.0, use re.fullmatch:

import re

numeric_pat = re.compile(r"\d+|\d+\.\d+")

def numeric(val):
    # fullmatch succeeds only if the entire string fits the pattern
    return 'YES' if numeric_pat.fullmatch(val) else 'NO'

df['Result'] = df['ID'].apply(numeric)
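Two limits of this pattern are worth keeping in mind: it rejects signed values such as "-12.5", and `fullmatch` raises a `TypeError` when given a non-string such as NaN or None. A possible variant is sketched below. The name `numeric_signed` and the extended pattern are hypothetical, not part of the original answer.

```python
import re

# Hypothetical stricter pattern: optional sign, digits, optional fraction.
SIGNED_NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")

def numeric_signed(val):
    """Return 'YES' if val is a string holding a signed integer or decimal."""
    # Assumption: non-string values (NaN, None, ints) count as non-numeric.
    if not isinstance(val, str):
        return "NO"
    return "YES" if SIGNED_NUMBER.fullmatch(val) else "NO"

results = [numeric_signed(v) for v in ["3965", "-12.5", "+7", "12.", "wyq8", None]]
print(results)
```

It can be applied the same way, with `df['ID'].apply(numeric_signed)`.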

Alternatively, use pd.to_numeric and build a boolean mask with pd.Series.isna:

m = pd.to_numeric(df['ID'], errors='coerce').isna()
df['Result'] = np.where(m, 'NO', 'YES')

With the errors parameter set to 'coerce', values that cannot be converted to a number are set to NaN.

test = pd.Series(['3965', 'wyq8', 'RO_123', 'CMD_', '2976'])
pd.to_numeric(test, errors='coerce')

0    3965.0
1       NaN
2       NaN
3       NaN
4    2976.0
dtype: float64
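Putting the conversion and the mask together, a runnable sketch might look like this. It uses `notna()` instead of `isna()` so the 'YES' branch comes first, which is only a stylistic choice. The extra "9.0" row is added here for illustration.

```python
import numpy as np
import pandas as pd

# Question data plus one float-like string added for illustration.
df = pd.DataFrame({"ID": ["3965", "wyq8", "RO_123", "CMD_", "2976", "9.0"]})

# Unparseable strings become NaN instead of raising an error.
converted = pd.to_numeric(df["ID"], errors="coerce")

# Anything that survived the conversion is treated as numeric.
df["Result"] = np.where(converted.notna(), "YES", "NO")
print(df)
```

Note that a value that is already missing in the column also ends up as 'NO' with this approach.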

Or you can build a custom function

def numeric(val):
    try:
        float(val)     # float() alone suffices: any int string also
        return 'YES'   # parses as a float, so neither ints nor
                       # floats raise an error
    except ValueError:
        return 'NO'

df['Result'] = df['ID'].apply(numeric)

Note: float handles scientific notation too, e.g. float("1e6") -> 1000000.0.

test = pd.Series(['1e6', '1', 'a 10', '1E6'])
test.apply(numeric)

0    YES
1    YES
2     NO
3    YES
dtype: object
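`float` is permissive in other ways as well: it accepts "nan", "inf", surrounding whitespace, and underscores between digits, and it raises `TypeError` (not `ValueError`) for inputs such as None. If those should not count as numeric IDs, a stricter variant could look like the sketch below. The name `numeric_finite` is hypothetical, and treating NaN and infinity as 'NO' is an assumption about the desired behaviour.

```python
import math

def numeric_finite(val):
    """Hypothetical stricter variant: 'YES' only for finite numbers."""
    try:
        number = float(val)
    except (TypeError, ValueError):
        # TypeError covers inputs such as None; ValueError covers bad strings.
        return "NO"
    # Assumption: 'nan' and 'inf' should not count as numeric IDs.
    return "YES" if math.isfinite(number) else "NO"

inputs = ["1e6", "nan", "inf", " 42 ", "1_000", "a 10"]
results = {v: numeric_finite(v) for v in inputs}
print(results)
```

Whitespace-padded and underscore-separated values still pass here. Rejecting them as well would need a regex check like the one shown earlier in this answer.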

wwnde · Accepted Answer · 2020-10-27 04:59:34Z

Check whether ID contains any non-digit characters and invert the Boolean result using ~. Then use np.where to assign the label:

df['Result'] = np.where(~df.ID.str.contains(r'\D'), 'Yes', 'No')

       ID Result
0    3965    Yes
1    wyq8     No
2  RO_123     No
3    CMD_     No
4    2976    Yes

As noted by @Cameron Riddell, you could also skip inverting the Boolean and swap the labels instead:

df['Result'] = np.where(df.ID.str.contains(r'\D'), 'No', 'Yes')
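One behaviour to be aware of: "contains no non-digit" is also true for an empty string, so an empty ID would be labelled 'Yes'. The sketch below compares this with `str.fullmatch` (pandas 1.1.0 or later), which requires at least one digit. The sample values are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values, including an empty string and a mixed string.
ids = pd.Series(["3965", "", "123 SO", "wyq8"])

# No non-digit found: this is also true for the empty string.
by_contains = np.where(ids.str.contains(r"\D"), "No", "Yes")

# The whole string must be one or more digits, so "" is rejected.
by_fullmatch = np.where(ids.str.fullmatch(r"\d+"), "Yes", "No")

print(by_contains.tolist())
print(by_fullmatch.tolist())
```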

edited Oct 27, 2020 at 4:59

answered Oct 27, 2020 at 4:45

3 Comments

Cameron Riddell Over a year ago

Instead of inverting the boolean array, why not just make "No" the True value and "Yes" the False value? np.where(df.ID.str.contains('(\D+)'),'No','Yes')

wwnde Over a year ago

@Cameron Riddell we actually could. My philosophy is to make the answer as retraceable as possible for the OP if they choose to break it down. Good point though.

Ch3steR Over a year ago

This would fail when there's a string '123 SO', using re.fullmatch would counter that example.

Carmoreno · Accepted Answer · 2020-10-27 22:46:25Z

You can use the .str.isnumeric() method:

df3["Result"] = df3["ID"].str.isnumeric().apply(lambda x: "Yes" if x else "No")

[UPDATE]: This method works only with integers. Please see @Ch3steR's answer for other cases.

edited Oct 27, 2020 at 22:46

answered Oct 27, 2020 at 5:02
