How to remove a specific string from a DataFrame in Python?

Question

I am working on a DataFrame in which many rows contain "_", e.g.:

    numbers
0   123
1   321_2
2   2222_2
3   41232_1
4   23123_5
5   45455
6   231231
7   3479_23_23
8   82837_212_fd

My goal is to remove everything after the first '_' in each row.

Then I had the idea of using the 'split' function:

result = s.split("_")[0]

However, this does not work on the DataFrame; I get an error: AttributeError: 'DataFrame' object has no attribute 'split'

My first question: how can I remove everything after the first '_'?

Moreover, is it possible to just remove '_' but keep the leading number part?
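For reference, a minimal sketch of why the split idea fails: a single Python string has a .split() method, but a DataFrame does not. The three-row DataFrame below is a hypothetical sample taken from the data above.

```python
import pandas as pd

# Hypothetical sample using a few rows from the question's data.
df = pd.DataFrame({"numbers": ["123", "321_2", "3479_23_23"]})

# A plain Python string has .split(), so this works on one value.
first = "321_2".split("_")[0]
print(first)  # 321

# A DataFrame has no .split() method, so the same call raises AttributeError.
try:
    df.split("_")
    message = None
except AttributeError as err:
    message = str(err)
print(message)  # 'DataFrame' object has no attribute 'split'
```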

Try s.str.split('_')[0] or s.numbers.str.split('_')[0]. See here for the string methods. – Haleemur Ali, Commented Oct 10, 2020 at 2:24
@Haleemur Ali thanks for your help, but it only works for single rows, and I have a DataFrame. – 史超南, Commented Oct 10, 2020 at 2:50
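A short sketch of why s.str.split('_')[0] "only works for single rows": after .str.split the Series holds one list per row, and a plain [0] selects the row labelled 0, whereas .str[0] takes element 0 from every row's list. The three-value Series is a made-up sample.

```python
import pandas as pd

s = pd.Series(["123", "321_2", "2222_2"])

# Each row becomes a list of pieces.
parts = s.str.split("_")

# [0] indexes the Series by label, so it returns only row 0's list.
row0 = parts[0]
print(row0)  # ['123']

# .str[0] takes the first piece of each row's list.
firsts = parts.str[0]
print(firsts.tolist())  # ['123', '321', '2222']
```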

BENY · Accepted Answer · 2020-10-10 02:41:43Z

1

You can do

df['numbers'] = df['numbers'].astype(str).str.split('_').str[0]
df
  numbers
0     123
1     321
2    2222
3   41232
4   23123
5   45455
6  231231
7    3479
8   82837
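A minimal sketch of why astype(str) is in this answer, assuming the column mixes Python ints (e.g. 123) and strings (e.g. '321_2') in one object column. That is an assumption about the asker's data, but it reproduces the NaN problem raised in the comments: the .str methods return NaN for values that are not strings.

```python
import pandas as pd

# Assumption: rows without '_' are stored as ints, the others as strings.
df = pd.DataFrame({"numbers": [123, "321_2", 45455, "3479_23_23"]})

# Without astype(str): the int rows (123 and 45455) come out as NaN.
without_cast = df["numbers"].str.split("_").str[0]
print(without_cast.isna().tolist())  # [True, False, True, False]

# With astype(str): every value is a string first, so nothing is lost.
with_cast = df["numbers"].astype(str).str.split("_").str[0]
print(with_cast.tolist())  # ['123', '321', '45455', '3479']
```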

edited Oct 10, 2020 at 2:41

answered Oct 10, 2020 at 2:28

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

史超南 Over a year ago

This is exactly what I did before. The only problem with this code is that it turns rows that do not contain '_' into NaN: for example, the first row '123' becomes 'nan' instead, and the same happens to the other rows without '_'.

BENY Over a year ago

@史超南 I added astype(str).

BENY Over a year ago

@史超南 do you have a DataFrame or a Series?

skchandra · Accepted Answer · 2020-10-10 03:33:16Z

Adding to BEN_YO's answer.

If it is a Series, you can call split on it through the .str accessor.

import pandas as pd

lst = ['123','321_2','2222_2','41232_1','23123_5','45455','231231','3479_23_23','82837_212_fd']

s = pd.Series(lst)

s

0             123
1           321_2
2          2222_2
3         41232_1
4         23123_5
5           45455
6          231231
7      3479_23_23
8    82837_212_fd
dtype: object

s.str.split('_').str[0]

0       123
1       321
2      2222
3     41232
4     23123
5     45455
6    231231
7      3479
8     82837
dtype: object
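For comparison, a sketch of three other standard pandas string methods that also keep only the text before the first '_'. They are alternatives to the split approach above, not part of the original answer; the four-value Series is a sample from the question's data.

```python
import pandas as pd

s = pd.Series(["123", "321_2", "3479_23_23", "82837_212_fd"])

# n=1 splits at the first '_' only, so long tails are not split further.
a = s.str.split("_", n=1).str[0]

# partition splits at the first '_' into three columns: before, '_', after.
b = s.str.partition("_")[0]

# Regex: delete the first '_' and everything after it.
c = s.str.replace(r"_.*", "", regex=True)

print(a.tolist())  # ['123', '321', '3479', '82837']
```

All three leave values without '_' (such as '123') unchanged.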

If it is a DataFrame, apply the same method to the column and assign the result back to it.

df['numbers']  # returns a Series, so we apply the split function to that Series.

df = pd.Series(lst).to_frame('numbers')
type(df['numbers'])

pandas.core.series.Series

df['numbers'] = df['numbers'].str.split('_').str[0]
print(df)
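If the goal is to keep the leading number part as an actual number rather than text, one option is to pass the split result to pd.to_numeric. This is a sketch that assumes every leading part is made of digits, as it is in the sample data; errors='coerce' would turn any non-numeric leftover into NaN instead of raising.

```python
import pandas as pd

lst = ['123', '321_2', '2222_2', '41232_1', '23123_5', '45455',
       '231231', '3479_23_23', '82837_212_fd']
df = pd.Series(lst).to_frame('numbers')

# Keep the text before the first '_', then parse it as a number.
leading = df['numbers'].str.split('_').str[0]
df['numbers'] = pd.to_numeric(leading, errors='coerce')

print(df['numbers'].tolist())  # [123, 321, 2222, 41232, 23123, 45455, 231231, 3479, 82837]
```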

Collectives™ on Stack Overflow

how to remove specific str from dataframe in python?

2 Answers 2

3 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related