Replacing a string with a different string in pandas depending on its value

Question

I am practicing pandas and I am stuck on an exercise.

I have an Excel file with a column that stores two types of URLs.

df = pd.DataFrame({'id': [None] * 4,  # empty placeholder, same length as 'url'
                   'url': ['www.something/12312', 'www.something/12343', 'www.somethingelse/42312', 'www.somethingelse/62343']})

| id | url |
| -------- | -------------- |
|  | 'www.something/12312' |
|  | 'www.something/12343' |
|  | 'www.somethingelse/42312' |
|  | 'www.somethingelse/62343' |

I am supposed to turn these into ids, where only the number from the URL is part of the id. The new id column should look like this:

df = pd.DataFrame({'id': ['id_12312', 'id_12343', 'diffid_42312', 'diffid_62343'], 'url': ['www.something/12312', 'www.something/12343', 'www.somethingelse/42312', 'www.somethingelse/62343']})

| id | url |
| -------- | -------------- |
| id_12312    | 'www.something/12312'  |
| id_12343    | 'www.something/12343'    |
| diffid_42312    | 'www.somethingelse/42312'    | 
| diffid_62343    | 'www.somethingelse/62343'    |

My problem is how to extract only the numbers and prepend a prefix that depends on the kind of URL. I have tried the replace and extract functions in pandas:

id_replaced = df.replace(regex={re.search('something', df['url']): 'id_' + str(re.search(r'\d+', i).group()), re.search('somethingelse', df['url']): 'diffid_' + str(re.search(r'\d+', i).group())})
        
df['id'] = df['url'].str.extract(re.search(r'\d+', df['url']).group())

However, both attempts raise an error: TypeError: expected string or bytes-like object.

Sorry for the tables in codeblock. The page was screaming that I have code that is not properly formatted when it was not in a codeblock.

Please format your examples so they are reproducible: stackoverflow.com/questions/20109391/…
– nocibambi, Commented May 27, 2021 at 9:49
What exactly is diffid? When do you use id as the prefix and when do you use diffid?
– Danail Petrov, Commented May 27, 2021 at 11:43

Danail Petrov · Accepted Answer · 2021-05-27 11:53:32Z

3

Here is one solution, but I don't quite understand when you use the id prefix and when you use diffid.

>>> df.id = 'id_'+df.url.str.split('/', n=1, expand=True)[1]
>>> df
         id                      url
0  id_12312      www.something/12312
1  id_12343      www.something/12343
2  id_42312  www.somethingelse/42312
3  id_62343  www.somethingelse/62343

Or using str.extract

>>> df.id = 'id_' + df.url.str.extract(r'/(\d+)$', expand=False)
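
The TypeError in the question comes from passing a whole Series to re.search, which accepts only a single string. Below is a minimal sketch, assuming the four sample URLs from the question, that first reproduces the error and then lets str.extract apply the pattern row by row. Passing expand=False is a choice made here so that the result is a Series rather than a one-column DataFrame.

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'url': ['www.something/12312', 'www.something/12343',
                           'www.somethingelse/42312', 'www.somethingelse/62343']})

# re.search handles one string at a time, so a Series argument raises TypeError
try:
    re.search(r'\d+', df['url'])
    raised = False
except TypeError:
    raised = True

# str.extract runs the regex on every row and returns the capture group
numbers = df['url'].str.extract(r'/(\d+)$', expand=False)
df['id'] = 'id_' + numbers
print(df)
```

This still uses the same id_ prefix for every row; choosing the prefix per site is a separate step.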

edited May 27, 2021 at 11:53

answered May 27, 2021 at 11:47

Danail Petrov


2 Comments

Paulina Over a year ago

Thank you. The prefix is supposed to be different for a different web page, so when I have a webpage somethingelse the prefix is diffid_, but when I have webpage something the prefix is id_

Paulina Over a year ago

Thanks, I managed to solve it for the prefix too, thanks to your help :)

df['id_num'] = df.url.str.extract(r'/(\d+)$').astype(str)
df['id_prefix'] = np.where((df['url'].str.contains('somethingelse')), 'diffid_', 'id_')
df['id'] = df['id_prefix'] + df['id_num']
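
The approach from this comment can be put together as a self-contained sketch. It assumes only the two sites from the question exist. The intermediate names id_num and id_prefix follow the comment, but they are kept as standalone Series here instead of extra columns, which is a choice made for this example.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'url': ['www.something/12312', 'www.something/12343',
                           'www.somethingelse/42312', 'www.somethingelse/62343']})

# Digits after the last slash, returned as a Series of strings
id_num = df['url'].str.extract(r'/(\d+)$', expand=False)

# Test for 'somethingelse': 'something' is a substring of it and would match every row
is_other_site = df['url'].str.contains('somethingelse')
id_prefix = pd.Series(np.where(is_other_site, 'diffid_', 'id_'), index=df.index)

# Element-wise string concatenation of prefix and number
df['id'] = id_prefix + id_num
print(df[['id', 'url']])
```

With more than two sites, np.where would need to be nested or replaced by a lookup from site name to prefix.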
