I am reading a CSV file into a pandas DataFrame, but the data needs to be cleaned up before it can be used. I need to do two things:

  1. use regex to filter values

  2. apply string functions such as trim, left, right, ...

For instance, the DataFrame may look like:

0 city_some_string_45
1 city_Other_string_56
2 city_another_string_77

So I need to filter (using regex) for all rows whose value starts with "city" and get the last two characters.

The end result should look like:

0 45 
1 56 
2 77

In other words, the logic I want to apply is: read the value of a cell and, if it starts with "city" (filtering with a regex, i.e. ^city), replace the value of the cell with its last two characters (e.g. using a "right" string function).

1 Answer

For a dataframe like this:

    No  city
0   0   city_some_string_45
1   1   city_Other_string_56
2   2   city_another_string_77

Filter the dataframe to keep the rows whose city column starts with "city":

df = df[df.city.str.startswith('city')]

You can use str.extract to pull out only the number (the first run of digits in each value):

df['city'] = df.city.str.extract(r'(\d+)', expand=False).astype(int)

The resulting df

    No  city
0   0   45
1   1   56
2   2   77
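
If the goal is to replace only the matching cells and leave every other row untouched, rather than dropping non-matching rows, a boolean mask can be combined with string slicing. The following is a minimal sketch, not the only way to do it; the fourth row ("town_unrelated_99") is a hypothetical value added here purely to show a non-matching cell, and the values stay strings rather than being converted to integers.

```python
import pandas as pd

df = pd.DataFrame({
    "No": [0, 1, 2, 3],
    "city": [
        "city_some_string_45",
        "city_Other_string_56",
        "city_another_string_77",
        "town_unrelated_99",  # hypothetical non-matching row, not in the question
    ],
})

# Boolean mask: True where the value starts with "city" (regex ^city).
# na=False treats missing values as non-matching.
mask = df["city"].str.contains(r"^city", regex=True, na=False)

# Replace only the matching cells with their last two characters
# (the equivalent of a "right(2)" string function).
df.loc[mask, "city"] = df.loc[mask, "city"].str[-2:]

print(df)
```

Using .loc with the mask on both sides of the assignment keeps the operation vectorized and avoids modifying a filtered copy of the dataframe.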
2 Comments

Thanks. The logic I want to apply is more like: read the value of a cell and, if it starts with "city", replace the value of the cell with its last two characters.
Thanks. I used your suggestion combined with the str.slice() function and it worked for this instance, but I need a better way to handle this, so I am thinking of writing a small function and using apply(). Thanks again for your tip.
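
The function-plus-apply() idea mentioned in the last comment might look roughly like the sketch below. The helper name last_two_if_city, the CITY_PATTERN constant, and the sample values are all hypothetical, chosen only for illustration; the pattern ^city is the one given in the question.

```python
import re

import pandas as pd

# Assumed pattern, taken from the question: value must start with "city".
CITY_PATTERN = re.compile(r"^city")


def last_two_if_city(value):
    """Return the last two characters if value starts with 'city'; otherwise return value unchanged."""
    if isinstance(value, str) and CITY_PATTERN.search(value):
        return value[-2:]
    # Non-strings (e.g. missing values) and non-matching strings pass through.
    return value


# Hypothetical sample data, including a non-matching value and a missing value.
cities = pd.Series(["city_some_string_45", "city_Other_string_56", "other_12", None])
cleaned = cities.apply(last_two_if_city)

print(cleaned)
```

apply() calls the function once per cell, so it is usually slower than the vectorized .str methods on large columns, but it is convenient when the per-cell logic grows beyond a single regex and slice.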
