Pandas make new columns from sub-string of another column

Question

I'm trying to create new columns in pandas from sub-strings of another column.

import pandas as pd
import re

df = {'title':['Apartment 2 roomns, 40 m²', 'House 7 rooms, 183 m²', 'House 4 rooms, 93 m²', 'Apartment 12 rooms, 275 m²']}

I'm trying with regex to capture groups:

df['Name'] = df.title.str.extract(r'(^[a-zA-Z]+)', expand=True)

This one I got a good result. But I need a column with the number of rooms (without the word "rooms") and another column with the size without "m²". I tried:

df['Rooms'] = df.title.str.replace(r'(^[0-9]+)\s(rooms)', r'\1') #to capture only the first group, which is the number

df['Size'] = df.title.str.replace(r'(^[0-9]+)\s(m²)', r'\1') #to capture only the first group, which is the number

My output:

   Name      Rooms                         Size
0  Apartment Apartment 2 roomns, 40 m²     Apartment 2 roomns, 40 m²
1  House     House 7 rooms, 183 m²         House 7 rooms, 183 m²
2  House     House 4 rooms, 93 m²          House 4 rooms, 93 m²
3  Apartment Apartment 12 rooms, 275 m²    Apartment 12 rooms, 275 m²

Good output:

   Name      Rooms Size
0  Apartment 2     40
1  House     7     183
2  House     4     93
3  Apartment 12    275

I see a typo in the word roomns, so, I guess you need to account for this typo. — Wiktor Stribiżew
– Wiktor Stribiżew, Commented Jul 6, 2022 at 10:22

Wiktor Stribiżew · Accepted Answer · 2022-07-06 10:22:10Z

1

You can use

df["Rooms"] = df["title"].str.extract(r'(\d+)\s*room', expand=False)
df['Size'] = df["title"].str.extract(r'(\d+(?:\.\d+)?)\s*m²', expand=False)

Output:

>>> df
                        title Rooms Size
0   Apartment 2 roomns, 40 m²     2   40
1       House 7 rooms, 183 m²     7  183
2        House 4 rooms, 93 m²     4   93
3  Apartment 12 rooms, 275 m²    12  275

The (\d+)\s*room regex matches and captures into Group 1 one or more digits, and then just matches zero or more whitespaces (\s*) and then a room string.

The (\d+(?:\.\d+)?)\s*m² regex matches and captured one or more digits, and an optional string of a . and one or more digits, and then matches zero or more whitespaces and then a m² string.

See regex #1 demo and the regex #2 demo.

answered Jul 6, 2022 at 10:22

Wiktor Stribiżew

631k41 gold badges502 silver badges633 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Pandas make new columns from sub-string of another column

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related