I would like to:

  1. delete the words "ANOS" and "ANO";
  2. replace "A" with "TO"; and
  3. replace "<1ANO" with "0 to 1".

Example: "10 A 19 ANOS" becomes "10 to 19".

import pandas as pd

data = pd.DataFrame({'FAIXA_ETARIA': ['10 A 19 ANOS', ' 20 A 29 ANOS', '30 A 39 ANOS', '40 A 49 ANOS',
                                      '50 A 59 ANOS', ' 60 A 69 ANOS', '70 A 79 ANOS', '80 A 89 ANOS',
                                      '<1ANO'],
                     'Count': [3, 8, 28, 7, 15, 9, 3, 5, 3]})

PS: My data has many columns; I would like this procedure to be applied only to the column "FAIXA_ETARIA".

Thanks for your help!

  • Does my answer make sense? I'm using regular expressions for flexibility. If you'd like, check out https://regex101.com/ to practice. I think there's a Python implementation available (on the left side of the window under "FLAVOR"). Commented Dec 29, 2020 at 21:09
  • @MarkMoretto your code works! Thanks for your help and the link, I need to practice this. Commented Dec 29, 2020 at 22:30

3 Answers

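Here's one possible way:

# Parentheses are used for chaining because a comment after a backslash is a syntax error.
# regex=True is required in pandas 2.0+, where str.replace treats patterns literally by default.
(data["FAIXA_ETARIA"]
    .str.replace(r"ANO\w?", "", regex=True)       # "ANO" plus an optional single word character ("ANO" or "ANOS")
    .str.replace("A", "TO", regex=False)          # replace the literal character "A"
    .str.replace(r"<\w?", "0 to 1", regex=True))  # "<" plus an optional single word character ("<1")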

Output:

0     10 TO 19 
1     20 TO 29 
2     30 TO 39 
3     40 TO 49 
4     50 TO 59 
5     60 TO 69 
6     70 TO 79 
7     80 TO 89 
8        0 to 1
Name: FAIXA_ETARIA, dtype: object
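
The chain above only returns a new Series, and the output still carries leftover spaces. A minimal sketch of writing the result back to the column, assuming pandas with the `regex` keyword and a shortened version of the sample data; the `.str.strip()` step is an addition, not part of the answer:

```python
import pandas as pd

# Shortened sample of the question's data (an assumption for illustration).
data = pd.DataFrame({
    "FAIXA_ETARIA": ["10 A 19 ANOS", " 20 A 29 ANOS", "<1ANO"],
    "Count": [3, 8, 3],
})

# Only the FAIXA_ETARIA column is touched; other columns stay as they are.
data["FAIXA_ETARIA"] = (
    data["FAIXA_ETARIA"]
    .str.replace(r"ANO\w?", "", regex=True)      # drop "ANO" / "ANOS"
    .str.replace("A", "TO", regex=False)         # literal "A" -> "TO"
    .str.replace(r"<\w?", "0 to 1", regex=True)  # "<1" -> "0 to 1"
    .str.strip()                                 # trim leftover spaces
)

print(data["FAIXA_ETARIA"].tolist())  # ['10 TO 19', '20 TO 29', '0 to 1']
```

The order of the steps matters: replacing "A" before removing "ANO" would turn "ANOS" into "TONOS".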

You could iterate through the entries in your array and then use Python's replace() string method. Example:

message = "Hello there"
custom = message.replace("there", "kvratto")

The result would be "Hello kvratto".

In your case the data comes from a dictionary, so you can get the specific entries with dictionaryname['columnname']. You can put the result in a new variable and then handle it like an array.

I hope that helps enough!
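
For the data in the question, that idea could look roughly like the sketch below. It assumes a shortened sample of the data, and `clean_age_range` is a hypothetical helper name introduced only for illustration; `Series.map` does the iteration over the column:

```python
import pandas as pd

def clean_age_range(label: str) -> str:
    """Hypothetical helper: turn '10 A 19 ANOS' into '10 to 19'."""
    if label.strip() == "<1ANO":
        return "0 to 1"
    # Remove the longer word first so "ANOS" does not leave a stray "S".
    cleaned = label.replace("ANOS", "").replace("ANO", "")
    # Match " A " with its surrounding spaces to avoid touching other text.
    return cleaned.replace(" A ", " to ").strip()

# Shortened sample of the question's data (an assumption for illustration).
data = pd.DataFrame({
    "FAIXA_ETARIA": ["10 A 19 ANOS", " 60 A 69 ANOS", "<1ANO"],
    "Count": [3, 9, 3],
})

# Apply the helper to every entry of the one column that needs cleaning.
data["FAIXA_ETARIA"] = data["FAIXA_ETARIA"].map(clean_age_range)

print(data["FAIXA_ETARIA"].tolist())  # ['10 to 19', '60 to 69', '0 to 1']
```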

1 Comment

Thanks for your help, @MarkMoretto and @LinusDehner!

Or just extract the digits and join them with ' to ':

data['FAIXA_ETARIA'] = data['FAIXA_ETARIA'].str.findall(r'\d+').str.join(' to ')
# '<1ANO' contains a single number, so it comes out as '1'; map it to '0 to 1'
cond = data['FAIXA_ETARIA'] == '1'
data.loc[cond, 'FAIXA_ETARIA'] = '0 to 1'

Output:

0    10 to 19
1    20 to 29
2    30 to 39
3    40 to 49
4    50 to 59
5    60 to 69
6    70 to 79
7    80 to 89
8      0 to 1
Name: FAIXA_ETARIA, dtype: object
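
To see why the '1' special case is needed, it may help to look at the intermediate values. A small sketch on two sample strings (the variable names are illustrative):

```python
import pandas as pd

ages = pd.Series(["10 A 19 ANOS", "<1ANO"])

digits = ages.str.findall(r"\d+")  # list of digit runs for each row
joined = digits.str.join(" to ")   # a one-item list joins to the item itself

print(digits.tolist())  # [['10', '19'], ['1']]
print(joined.tolist())  # ['10 to 19', '1']
```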
