1

This is my pandas DataFrame. I want to replace the values in the first column (System) with just the extracted names a, b, c, d. How can this be done in Python?

 System       mem
/vol/a/        10   
/vol/b/        20   
/vol/c/        30   
/vol/d/        40
3
  • Are they always the last characters before /? Commented Nov 14, 2017 at 4:37
  • Yeah, there can also be strings in place of a, b, c, d. Commented Nov 14, 2017 at 4:39
  • As in multiple characters, say 'abc'? Commented Nov 14, 2017 at 4:41

4 Answers

4

You can use .str.extract:

In [11]: df.System.str.extract("/vol/(.*?)/", expand=False)
Out[11]:
0    a
1    b
2    c
3    d
Name: System, dtype: object

4 Comments

What if there are also sequences like /vol/abc/.aggr and we have to extract abc?
@madhuri make the regex pattern "/vol/(.*?)/.*"
Now I think about it the ? is not required here.
@AndyHayden: It is for strings like /vol/a/.hidden/.git, see OP's comment above. A bit unclear what he really wants.
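To make the point in these comments concrete, here is a minimal sketch comparing the lazy and greedy patterns. The sample paths are taken from the comments, not from the asker's real data, and the variable names are only illustrative.

```python
import pandas as pd

# Sample paths: the original form plus the two longer forms from the comments.
s = pd.Series(["/vol/a/", "/vol/abc/.aggr", "/vol/a/.hidden/.git"])

# Lazy group: stops at the first "/" after "/vol/".
lazy = s.str.extract(r"/vol/(.*?)/", expand=False)

# Greedy group: runs up to the last "/" in the string.
greedy = s.str.extract(r"/vol/(.*)/", expand=False)

print(lazy.tolist())    # ['a', 'abc', 'a']
print(greedy.tolist())  # ['a', 'abc', 'a/.hidden']
```

The two patterns agree as long as only one slash follows the name, so the `?` only matters for paths with further directory levels.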
4

This can be done in several ways. Here is one:

df['System'] = df['System'].str.split('/').str[-2]

    System  mem
0   a       10
1   b       20
2   c       30
3   d       40

Option 2:

df['System'] = df['System'].str.replace(r'^/vol/|/$', '', regex=True)

Andy Hayden already covered str.extract

3 Comments

What if there are also sequences like /vol/abc/.aggr and we have to extract abc?
str.extract would work best in that case, df['System'].str.extract('/vol/([A-Za-z]+)/', expand = False)
@madhuri, consider upvoting the answers and accepting the most helpful answer. It helps in closing the question and works as an appreciation for people who take time out to answer
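For the split-based approach, indexing from the end with .str[-2] depends on how many segments follow the name. A possible alternative, sketched here under the assumption that every path starts with /vol/, is to index from the front instead. The sample paths come from the comments on this question.

```python
import pandas as pd

# Assumption: every path starts with "/vol/", so the name is always segment 2
# after splitting on "/" (segment 0 is the empty string before the first slash).
paths = pd.Series(["/vol/a/", "/vol/abc/.aggr", "/vol/a/.hidden/.git"])

by_tail = paths.str.split("/").str[-2]   # second-to-last segment
by_index = paths.str.split("/").str[2]   # segment right after "vol"

print(by_tail.tolist())   # ['a', 'abc', '.hidden']
print(by_index.tolist())  # ['a', 'abc', 'a']
```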
3

Using str.extract:

import pandas as pd

df = pd.DataFrame({'System': ['/vol/a/', '/vol/b/', '/vol/c/', '/vol/d/'], 'mem': [10, 20, 30, 40]})

df['new_column'] = df['System'].str.extract(r'([^/]+)/?$')
print(df)

This yields

    System  mem new_column
0  /vol/a/   10          a
1  /vol/b/   20          b
2  /vol/c/   30          c
3  /vol/d/   40          d

1 Comment

What if there are also sequences like /vol/abc/.aggr and we have to extract abc?
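Regarding this comment: the pattern in the answer captures the last path segment, so it would return .aggr for such a path. One way to handle it, assuming the wanted name always follows a leading /vol/, is to anchor the pattern at the start of the string instead. A rough sketch:

```python
import pandas as pd

paths = pd.Series(["/vol/a/", "/vol/abc/.aggr"])

# Pattern from the answer: captures the last non-empty segment.
last_segment = paths.str.extract(r"([^/]+)/?$", expand=False)

# Assumed alternative: capture the segment directly after a leading "/vol/".
after_vol = paths.str.extract(r"^/vol/([^/]+)", expand=False)

print(last_segment.tolist())  # ['a', '.aggr']
print(after_vol.tolist())     # ['a', 'abc']
```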
1

By using rsplit you can have as much stuff in front as you'd like.

df.assign(New=df.System.str.rsplit('/', n=2).str[-2])

    System  mem New
0  /vol/a/   10   a
1  /vol/b/   20   b
2  /vol/c/   30   c
3  /vol/d/   40   d
