
There are a lot of similar questions, but I have not found a solution for my problem. I have a data frame with the following structure/form:

   col_1
0  BULKA TARTA 500G KAJO 1
1  CUKIER KRYSZTAL 1KG KSC 4
2  KASZA JĘCZMIENNA 4*100G 2 0.92
3  LEWIATAN MAKARON WSTĄŻKA 1 0.89

However, I want to achieve the effect:

   col_1
0  BULKA TARTA 500G KAJO
1  CUKIER KRYSZTAL 1KG KSC
2  KASZA JĘCZMIENNA 4*100G
3  LEWIATAN MAKARON WSTĄŻKA

So I want to remove the standalone integers and decimals, but keep the numbers that are attached to letters inside a token.

I tried to use df.col_1.str.isdigit().replace([True, False], [np.nan, df.col_1]), but that only tests whether the entire cell is a number or not.

Do you have any ideas how to do this? Or maybe it would be better to split the column on spaces and then compare the tokens?

2 Comments
  • Sounds like you want regular expressions. Commented Nov 13, 2017 at 17:23
  • Updated my answer to also include a regex example. Hope it helped! Commented Nov 13, 2017 at 17:32

3 Answers


We could create a function that tries to convert the token to float; if that fails, it returns True (not a float):

import pandas as pd

df = pd.DataFrame({"col_1" : ["BULKA TARTA 500G KAJO 1",
                              "CUKIER KRYSZTAL 1KG KSC 4",
                              "KASZA JĘCZMIENNA 4*100G 2 0.92",
                              "LEWIATAN MAKARON WSTĄŻKA 1 0.89"]})

def is_not_float(string):
    try:
        float(string)
        return False
    except ValueError:  # String is not a number
        return True

df["col_1"] = df["col_1"].apply(lambda x: [i for i in x.split(" ") if is_not_float(i)])

df

Or, following the example of my fellow SO users (note, however, that this would treat 130. as a number):

df["col_1"] = (df["col_1"].apply(
    lambda x: [i for i in x.split(" ") if not i.replace(".","").isnumeric()]))

Returns

                          col_1
0    [BULKA, TARTA, 500G, KAJO]
1  [CUKIER, KRYSZTAL, 1KG, KSC]
2   [KASZA, JĘCZMIENNA, 4*100G]
3  [LEWIATAN, MAKARON, WSTĄŻKA]
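Since both variants leave lists of tokens in the column, a minimal sketch (reusing the is_not_float helper above, with a shortened sample frame) that rejoins each row into a single string:

```python
import pandas as pd

df = pd.DataFrame({"col_1": ["BULKA TARTA 500G KAJO 1",
                             "CUKIER KRYSZTAL 1KG KSC 4"]})

def is_not_float(string):
    # float() succeeds only for purely numeric tokens
    try:
        float(string)
        return False
    except ValueError:
        return True

# filter out the numeric tokens, then rejoin with spaces so the
# column holds plain strings again instead of lists
df["col_1"] = df["col_1"].apply(
    lambda x: " ".join(i for i in x.split(" ") if is_not_float(i)))
print(df)
```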

3 Comments

This is my favorite solution. And how would I use something like ' '.join(df2[i]) on each row, so that everything is joined back together?
@TomaszPrzemski Sorry, is that a question? What is df2 in this case?
Yes :) I introduced a new variable df2 instead of df["col_1"], and later: h = []; for i in range(len(df2)): h.append(' '.join(df2[i])); g = "\n".join(h); then with open("C:\\Users\dell\\Desktop\\delikatesy\\wyczyszczone\\delikatesy_test1.csv", 'w', encoding='utf-8') as of: of.write(g). I'm just learning, but I think it's easier to do it this way :)

Sure,

You could use a regex. Anchoring the match to whitespace ensures only standalone numbers are removed, so tokens like 500G and 4*100G survive:

df.col_1 = df.col_1.str.replace(r"\s*(?<!\S)\d+(?:\.\d+)?(?!\S)", "", regex=True)
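A quick self-contained check of the whitespace-anchored pattern, using part of the sample frame from the question:

```python
import pandas as pd

df = pd.DataFrame({"col_1": ["KASZA JĘCZMIENNA 4*100G 2 0.92",
                             "LEWIATAN MAKARON WSTĄŻKA 1 0.89"]})

# (?<!\S) and (?!\S) make sure the number is a whole
# whitespace-delimited token, so 4*100G stays untouched
pattern = r"\s*(?<!\S)\d+(?:\.\d+)?(?!\S)"
df["col_1"] = df["col_1"].str.replace(pattern, "", regex=True)
print(df)
```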

Comments


Yes, you can:

def no_nums(col):
    # keep only the words that are not purely numeric
    return ' '.join(word for word in col.split()
                    if not word.replace('.', '').isdigit())

df.col_1 = df.col_1.apply(no_nums)

This filters out the words of each value that consist entirely of digits (possibly with a decimal point).
If you also want to filter out numbers like 1,000, simply add another replace for ','.
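A small sketch of that extension, with the extra ',' replace chained on (the sample string here is made up for illustration):

```python
def no_nums(col):
    # a word is dropped only if, after stripping '.' and ',',
    # nothing but digits remains
    return ' '.join(word for word in col.split()
                    if not word.replace('.', '').replace(',', '').isdigit())

print(no_nums("CENA 1,000 SZT 2.5 KAWA 500G"))
```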

Comments
