Replace and remove text in a column

Question

I would like to:

delete the words "ANOS" and "ANO";
replace "A" with "TO"; and
replace "<1ANO" with "0 to 1".

Example: "10 A 19 ANOS" becomes "10 to 19"

import pandas as pd

data = pd.DataFrame({'FAIXA_ETARIA': ['10 A 19 ANOS', ' 20 A 29 ANOS', '30 A 39 ANOS', '40 A 49 ANOS',
                                      '50 A 59 ANOS', ' 60 A 69 ANOS', '70 A 79 ANOS', '80 A 89 ANOS',
                                      '<1ANO'],
                     'Count': [3, 8, 28, 7, 15, 9, 3, 5, 3]})

PS: My DataFrame has many columns; I would like this procedure to be applied only to the column "FAIXA_ETARIA".

Thanks for your help!

Does my answer make sense? I'm using regular expressions for flexibility. If you'd like, check out https://regex101.com/ to practice. There's a Python flavor available (on the left side of the window under "FLAVOR").
– Mark M, Commented Dec 29, 2020 at 21:09
@MarkMoretto your code works! Thanks for your help and the link; I need to practice this.
– kvratto, Commented Dec 29, 2020 at 22:30

Mark M · Accepted Answer · 2020-12-29 19:50:52Z

1

Here's one possible way:

(
    data["FAIXA_ETARIA"]
    .str.replace(r"ANO\w?", "", regex=True)      # "ANO" plus one optional word character (covers "ANOS")
    .str.replace("A", "TO", regex=False)         # replace the remaining literal "A"
    .str.replace(r"<\w?", "0 to 1", regex=True)  # "<" plus one optional word character (covers "<1")
)

Output:

0     10 TO 19 
1     20 TO 29 
2     30 TO 39 
3     40 TO 49 
4     50 TO 59 
5     60 TO 69 
6     70 TO 79 
7     80 TO 89 
8        0 to 1
Name: FAIXA_ETARIA, dtype: object
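
The chain above returns a new Series and leaves the surrounding spaces and the uppercase "TO" in place. A possible variant, assuming the goal is the lowercase "10 to 19" format from the question, is sketched below. It assigns the result back to the column, passes regex=True explicitly (since pandas 2.0 the pattern is treated as a literal string by default), matches the separator "A" only when it is surrounded by whitespace, and strips leftover spaces. The shortened sample data is an assumption for illustration.

```python
import pandas as pd

# Shortened sample of the question's data (assumption for illustration).
data = pd.DataFrame({
    "FAIXA_ETARIA": ["10 A 19 ANOS", " 20 A 29 ANOS", "<1ANO"],
    "Count": [3, 8, 3],
})

data["FAIXA_ETARIA"] = (
    data["FAIXA_ETARIA"]
    .str.replace(r"ANOS?", "", regex=True)                # drop "ANO" or "ANOS"
    .str.replace(r"^\s*<\s*1\s*$", "0 to 1", regex=True)  # "<1" becomes "0 to 1"
    .str.replace(r"\s+A\s+", " to ", regex=True)          # standalone "A" becomes "to"
    .str.strip()                                          # remove leading/trailing spaces
)

print(data)
```

Only the "FAIXA_ETARIA" column is touched; the other columns of the DataFrame are left unchanged.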


Linus Dehner · Accepted Answer · 2020-12-29 19:47:42Z

1

You could iterate through the entries in the column and use Python's str.replace() method on each one. Example:

message = "Hello there"
custom = message.replace("there", "kvratto")

The result would be "Hello kvratto".

In your case you have a DataFrame built from a dictionary, so you can get a specific column with data['FAIXA_ETARIA']. You can put the result in a new variable and then handle it like an array.
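
Applied to the values in the question, the idea might look like the sketch below. It uses a plain list copied from the "FAIXA_ETARIA" column (a shortened sample), and clean_age_range is a hypothetical helper name, not something from the question.

```python
# Values copied from the question's FAIXA_ETARIA column (shortened sample).
faixa_etaria = ["10 A 19 ANOS", " 20 A 29 ANOS", "80 A 89 ANOS", "<1ANO"]


def clean_age_range(label: str) -> str:
    """Hypothetical helper: turn '10 A 19 ANOS' into '10 to 19'."""
    # Remove the longer word first so "ANOS" does not leave a stray "S".
    label = label.replace("ANOS", "").replace("ANO", "").strip()
    # Whatever starts with "<" is the under-one-year group.
    if label.startswith("<"):
        return "0 to 1"
    # Replace only the standalone separator, not every letter "A".
    return label.replace(" A ", " to ")


cleaned = [clean_age_range(label) for label in faixa_etaria]
print(cleaned)
```

With a DataFrame, the same helper could be applied to the single column with data["FAIXA_ETARIA"].map(clean_age_range), which leaves the other columns alone.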

I hope that helps enough!


1 Comment

kvratto Over a year ago

Thanks for your help, @MarkMoretto and @LinusDehner!

Ferris · Accepted Answer · 2020-12-30 06:24:54Z

1

Or just extract the digits and join them with ' to ':

data['FAIXA_ETARIA'] = data['FAIXA_ETARIA'].str.findall(r'\d+').str.join(' to ')
# '<1ANO' contains a single number, so it ends up as '1'; map it to '0 to 1'
cond = data['FAIXA_ETARIA'] == '1'
data.loc[cond, 'FAIXA_ETARIA'] = '0 to 1'

0    10 to 19
1    20 to 29
2    30 to 39
3    40 to 49
4    50 to 59
5    60 to 69
6    70 to 79
7    80 to 89
8      0 to 1
Name: FAIXA_ETARIA, dtype: object
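
To see why the special case for '1' is needed, the same steps can be traced on plain strings with the standard re module. This is only an illustrative sketch; digits_to_range is a hypothetical helper name and the sample list is shortened from the question.

```python
import re

# Shortened sample of the question's values (assumption for illustration).
samples = ["10 A 19 ANOS", " 20 A 29 ANOS", "<1ANO"]


def digits_to_range(label: str) -> str:
    """Hypothetical helper mirroring findall + join on a single string."""
    numbers = re.findall(r"\d+", label)  # e.g. ['10', '19'], or ['1'] for '<1ANO'
    joined = " to ".join(numbers)        # a single-element list joins to just '1'
    # The under-one-year label has only one number, so handle it separately.
    return "0 to 1" if joined == "1" else joined


results = [digits_to_range(s) for s in samples]
print(results)
```

Because only the digits are kept, this approach ignores spacing and the exact spelling of "ANO"/"ANOS", but it relies on '1' never being a legitimate value on its own.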

