Pandas - Extract a phrase from one column and adding it to a new column

Question

I have some text data in (Column 1), and was wondering if I could extract a specific sequence from the rows in that column and add them to a new column.

For example:

  (column1)
Coke Can 300ml
Bottle 800ml
Cup
Bucket 2000ml

Turns into:

(column1)          (column2)
 Coke Can            300ml
 Bottle              800ml
 Cup                 N/A
 Bucket              2000ml

Basically, I want to extract every phrase with "xxml" and insert that into a new column. Thank you for the help!

I think this is pretty clear? My issue was that I wanted to find a way to extract the weight values and place them in their own column.
– limesurfboard, Commented Mar 25, 2020 at 17:55
Related: stackoverflow.com/questions/33408403/…, stackoverflow.com/questions/44464118/…, stackoverflow.com/questions/52171736/…, stackoverflow.com/questions/54681095/…, stackoverflow.com/questions/54440554/…
– AMC, Commented Mar 25, 2020 at 17:57
"My issue was that I wanted to find a way to extract the weight values and place them in their own column." That's what you're trying to do, not what the problem actually is.
– AMC, Commented Mar 25, 2020 at 17:59
Ok, I'll keep that in mind for further questions. Thanks for the links.
– limesurfboard, Commented Mar 25, 2020 at 18:14

sammywemmy · Accepted Answer · 2020-03-25 04:30:08Z

3

Use pandas str.extract to search for one or more digits followed by 'ml':

  df['(column2)'] = df.iloc[:,0].str.extract(r'(\d+ml)')

    (column1)      (column2)
0   Coke Can 300ml  300ml
1   Bottle 800ml    800ml
2   Cup             NaN
3   Bucket 2000ml   2000ml
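
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The column names column1 and column2 are assumptions chosen for illustration, and expand=False is used so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data from the question; the column names are illustrative.
df = pd.DataFrame(
    {"column1": ["Coke Can 300ml", "Bottle 800ml", "Cup", "Bucket 2000ml"]}
)

# One capture group: one or more digits immediately followed by 'ml'.
# Rows without a match (such as 'Cup') get NaN.
df["column2"] = df["column1"].str.extract(r"(\d+ml)", expand=False)

print(df)
```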

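If you want to remove the 'ml' after the digits, you can use a regex lookbehind assertion: it matches 'ml' only when it directly follows a digit, and the match is replaced with an empty string.

# regex=True is required on pandas 2.0 and later, where str.replace defaults to literal matching
df.iloc[:, 0] = df.iloc[:, 0].str.replace(r'(?<=\d)ml', '', regex=True)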

    (column1)   (column2)
0   Coke Can 300    300ml
1   Bottle 800      800ml
2   Cup             NaN
3   Bucket 2000     2000ml
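
The question's desired output drops the whole volume from the first column, not only the 'ml' suffix. A possible sketch of that variant follows. It assumes the volume is always digits followed by 'ml', and the column names are again illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"column1": ["Coke Can 300ml", "Bottle 800ml", "Cup", "Bucket 2000ml"]}
)

# Copy the volume into its own column first ...
df["column2"] = df["column1"].str.extract(r"(\d+ml)", expand=False)

# ... then delete it, together with any whitespace before it, from the original.
df["column1"] = df["column1"].str.replace(r"\s*\d+ml", "", regex=True)

print(df)
```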

edited Mar 25, 2020 at 4:30

answered Mar 25, 2020 at 2:57


2 Comments

limesurfboard Over a year ago

Thanks! This worked. Now, is there a way to remove the ml values in the original column?

sammywemmy Over a year ago

added a solution to cover removing 'ml' values after the digit

denis_smyslov · Accepted Answer · 2020-03-25 14:36:13Z

1

Use Series.str.extractall to split each value into several columns, one per capture group.

import pandas as pd
df = pd.DataFrame(dict(
    col1 = ['Coke Can 300ml', 'Bottle 800ml', 'Cup', 'Bucket 2000ml']))
print(df.to_markdown())
|    | col1           |
|---:|:---------------|
|  0 | Coke Can 300ml |
|  1 | Bottle 800ml   |
|  2 | Cup            |
|  3 | Bucket 2000ml  |

import re
# Groups: name text, optional digits, optional unit; re.I makes [a-z] case-insensitive
df = df['col1'].str.extractall(r'([a-z ]+)(\d+)?([a-z]+)?', flags=re.I)
print(df.to_markdown())

|        | 0        |    1 | 2   |
|:-------|:---------|-----:|:----|
| (0, 0) | Coke Can |  300 | ml  |
| (1, 0) | Bottle   |  800 | ml  |
| (2, 0) | Cup      |  nan | nan |
| (3, 0) | Bucket   | 2000 | ml  |
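
If one row per input value is enough, a similar split can be sketched with str.extract and named groups, which become the column names and keep the original index instead of a MultiIndex. The group names name, amount and unit are assumptions for illustration, and the pattern is slightly adjusted so the name does not keep a trailing space.

```python
import re

import pandas as pd

s = pd.Series(["Coke Can 300ml", "Bottle 800ml", "Cup", "Bucket 2000ml"])

# name:   letters and spaces, ending on a letter (no trailing space)
# amount: optional run of digits
# unit:   optional run of letters after the digits
pattern = r"(?P<name>[a-z ]*[a-z])\s*(?P<amount>\d+)?(?P<unit>[a-z]+)?"

# extract returns the first match per row, with one column per named group.
parts = s.str.extract(pattern, flags=re.I)

print(parts)
```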

edited Mar 25, 2020 at 14:36

answered Mar 25, 2020 at 3:56


dzakyputra · Accepted Answer · 2020-03-25 02:56:03Z

0

You might want to try this, which takes the last whitespace-separated word of every value that contains more than one word:

df['new_column'] = df['column'].apply(lambda x: x.split()[-1] if len(x.split()) > 1 else None)
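
Because the line above keeps the last word whatever it is, a value such as 'Paper Cup' would yield 'Cup'. A hedged variant that keeps the last word only when it looks like a volume might look like this. The 'Paper Cup' row is a hypothetical addition to the sample data, and str.fullmatch assumes pandas 1.1 or later.

```python
import pandas as pd

# 'Paper Cup' is a made-up row added to show the difference.
df = pd.DataFrame(
    {"column": ["Coke Can 300ml", "Bottle 800ml", "Cup", "Paper Cup"]}
)

# Last whitespace-separated word of every value.
last_word = df["column"].str.split().str[-1]

# Keep it only if the whole word is digits followed by 'ml'; otherwise NaN.
df["new_column"] = last_word.where(last_word.str.fullmatch(r"\d+ml"))

print(df)
```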

answered Mar 25, 2020 at 2:56


baxx · Accepted Answer · 2020-03-25 02:59:46Z

0

Given

import pandas as pd

df = pd.DataFrame(dict(
    col1 = ['Coke Can 300ml', 'Bottle 800ml', 'Cup', 'Bucket 2000ml'])
)

the following might be what you're after here:

In [13]: df.col1.str.split(' ', expand=True, n = 1)
Out[13]:
        0          1
0    Coke  Can 300ml
1  Bottle      800ml
2     Cup       None
3  Bucket     2000ml

However, this splits on the first whitespace from the left of the column values, so a multi-word name such as 'Coke Can' ends up divided between the two columns.

For this, the answer you have from @sammywemmy seems best. I'm simply putting this here as it might be of interest.
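
To split from the right instead, pandas also offers str.rsplit. The following is a minimal sketch on the same sample data. It assumes the volume, when present, is always the last word, so a multi-word value with no volume would still be split.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": ["Coke Can 300ml", "Bottle 800ml", "Cup", "Bucket 2000ml"]}
)

# Split at most once, starting from the right-hand side of each string.
# Values with no space (such as 'Cup') get a missing value in column 1.
parts = df["col1"].str.rsplit(" ", n=1, expand=True)

print(parts)
```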

answered Mar 25, 2020 at 2:59
