2

I'm trying to create new columns in pandas from sub-strings of another column.

import pandas as pd
import re

df = {'title':['Apartment 2 roomns, 40 m²', 'House 7 rooms, 183 m²', 'House 4 rooms, 93 m²', 'Apartment 12 rooms, 275 m²']} 

I'm trying with regex to capture groups:

df['Name'] = df.title.str.extract(r'(^[a-zA-Z]+)', expand=True) 

This one I got a good result. But I need a column with the number of rooms (without the word "rooms") and another column with the size without "m²". I tried:

df['Rooms'] = df.title.str.replace(r'(^[0-9]+)\s(rooms)', r'\1') #to capture only the first group, which is the number

df['Size'] = df.title.str.replace(r'(^[0-9]+)\s(m²)', r'\1') #to capture only the first group, which is the number

My output:

   Name      Rooms                         Size
0  Apartment Apartment 2 roomns, 40 m²     Apartment 2 roomns, 40 m²
1  House     House 7 rooms, 183 m²         House 7 rooms, 183 m²
2  House     House 4 rooms, 93 m²          House 4 rooms, 93 m²
3  Apartment Apartment 12 rooms, 275 m²    Apartment 12 rooms, 275 m²

Good output:

   Name      Rooms Size
0  Apartment 2     40
1  House     7     183
2  House     4     93
3  Apartment 12    275
1
  • 1
    I see a typo in the word roomns, so, I guess you need to account for this typo. Commented Jul 6, 2022 at 10:22

1 Answer 1

1

You can use

df["Rooms"] = df["title"].str.extract(r'(\d+)\s*room', expand=False)
df['Size'] = df["title"].str.extract(r'(\d+(?:\.\d+)?)\s*m²', expand=False)

Output:

>>> df
                        title Rooms Size
0   Apartment 2 roomns, 40 m²     2   40
1       House 7 rooms, 183 m²     7  183
2        House 4 rooms, 93 m²     4   93
3  Apartment 12 rooms, 275 m²    12  275

The (\d+)\s*room regex matches and captures into Group 1 one or more digits, and then just matches zero or more whitespaces (\s*) and then a room string.

The (\d+(?:\.\d+)?)\s*m² regex matches and captured one or more digits, and an optional string of a . and one or more digits, and then matches zero or more whitespaces and then a string.

See regex #1 demo and the regex #2 demo.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.