
I am working with a DataFrame in which some rows contain "_", e.g.:

    numbers
0   123
1   321_2
2   2222_2
3   41232_1
4   23123_5
5   45455
6   231231
7   3479_23_23
8   82837_212_fd

My goal is to remove everything after the first '_' in each row, e.g.:

    numbers
0   123
1   321
2   2222
3   41232
4   23123
5   45455
6   231231
7   3479
8   82837

I then had the idea of using the 'split' function:

result = s.split("_")[0]

However, this cannot be applied to the DataFrame; I get an error: AttributeError: 'DataFrame' object has no attribute 'split'

My first question is: how can I remove everything after the first '_'?

Moreover, is it possible to remove just the '_' but keep the leading number part?

  • try s.str.split('_')[0] or s.numbers.str.split('_')[0]. See here for the string methods Commented Oct 10, 2020 at 2:24
  • @Haleemur Ali thanks for your help. It only works for single rows, but I have a DataFrame. Commented Oct 10, 2020 at 2:50

2 Answers


You can do

df['numbers'] = df['numbers'].astype(str).str.split('_').str[0]
df
  numbers
0     123
1     321
2    2222
3   41232
4   23123
5   45455
6  231231
7    3479
8   82837
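
One possible reason for the astype(str) call, sketched below under an assumption: if the column mixes real integers with strings (the question does not show the dtype, so the `mixed` frame here is hypothetical), the .str accessor returns NaN for the non-string values. Casting to str first avoids that.

```python
import pandas as pd

# Hypothetical column mixing an integer with strings (an assumption;
# the question does not show the column's actual dtype).
mixed = pd.DataFrame({"numbers": [123, "321_2", "3479_23_23"]})

# Without astype(str): the .str accessor yields NaN for the integer 123.
without_cast = mixed["numbers"].str.split("_").str[0]

# With astype(str): every value becomes a string first, so nothing is lost.
with_cast = mixed["numbers"].astype(str).str.split("_").str[0]

print(without_cast.tolist())
print(with_cast.tolist())
```

Note that after the cast the column holds strings, so a further conversion (for example astype(int)) would be needed if numeric values are wanted.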

3 Comments

This is exactly what I did previously. The only problem with this code is that it wipes out every row that does not contain '_'. For example, the first row '123' turns into 'nan' instead, as do the other rows without '_'.
@史超南 I added astype(str).
@史超南 do you have a DataFrame or a Series?

Adding to BEN_YO's answer.

If it is a Series, you can apply the split function to it through the .str accessor.

import pandas as pd

lst = ['123','321_2','2222_2','41232_1','23123_5','45455','231231','3479_23_23','82837_212_fd']

s = pd.Series(lst)

s
0             123
1           321_2
2          2222_2
3         41232_1
4         23123_5
5           45455
6          231231
7      3479_23_23
8    82837_212_fd
dtype: object
s.str.split('_').str[0]
0       123
1       321
2      2222
3     41232
4     23123
5     45455
6    231231
7      3479
8     82837
dtype: object

However, if it is a DataFrame, select the column and assign the result of the same method back to it.

df['numbers'] # returns a Series and we are applying split function on that series.

df = pd.Series(lst).to_frame('numbers')
type(df['numbers'])
pandas.core.series.Series
df['numbers'] = df['numbers'].str.split('_').str[0]
print(df)
  numbers
0     123
1     321
2    2222
3   41232
4   23123
5   45455
6  231231
7    3479
8   82837
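
The question's second part (removing just the '_' while keeping the number) is not covered above. Here is a minimal sketch of two possible readings, since the question is ambiguous on this point. The sample values are taken from the question; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series(["123", "321_2", "3479_23_23", "82837_212_fd"])

# Reading 1: delete the underscores and keep every other character.
no_underscores = s.str.replace("_", "", regex=False)

# Reading 2: keep only the leading run of digits.
leading_digits = s.str.extract(r"^(\d+)", expand=False)

print(no_underscores.tolist())
print(leading_digits.tolist())
```

For this data, the second reading gives the same result as split('_').str[0]. The first reading joins the parts together, so '321_2' becomes '3212'.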
