Suppose there is a dataframe defined as

import re
import pandas as pd

df = pd.DataFrame({'Col_1': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', '0'],
                   'Col_2': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', '0']})

which looks like

   Col_1 Col_2
0      A     a
1      B     b
2      C     c
3      D     d
4      E     e
5      F     f
6      G     g
7      H     h
8      I     i
9      J     j
10     0     0

I would like to replace the values in Col_1 by using a dictionary defined as

repl_dict = {re.compile('[ABH-LP-Z]'): 'DDD',
             re.compile('[CDEFG]'): 'BBB WTT',
             re.compile('[MNO]'): 'AAA WTT',
             re.compile('[0-9]'): 'CCC'}

I expect Col_1 of the resulting dataframe to look like this:

      Col_1
0       DDD
1       DDD
2   BBB WTT
3   BBB WTT
4   BBB WTT
5   BBB WTT
6   BBB WTT
7       DDD
8       DDD
9       DDD
10      CCC

I simply used df['Col_1'].replace(repl_dict, regex=True), but it does not produce what I expected. What I get instead is:

                      Col_1
0     BBB WTTBBB WTTBBB WTT
1     BBB WTTBBB WTTBBB WTT
2                   BBB WTT
3                   BBB WTT
4                   BBB WTT
5                   BBB WTT
6                   BBB WTT
7     BBB WTTBBB WTTBBB WTT
8     BBB WTTBBB WTTBBB WTT
9     BBB WTTBBB WTTBBB WTT
10                      CCC

I would appreciate it if anyone could explain why df.replace() does not work here, and what the correct way is to replace multiple values to get the expected output.
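
The unexpected output above can be reproduced without pandas by running the same patterns through re.sub one after another, in dictionary order. This is a sketch under the assumption, suggested by that output, that each entry is applied to the result of the previous one; it is not pandas' internal code, and chained_sub is a hypothetical helper name.

```python
import re

# Same patterns as repl_dict in the question (dicts keep insertion order in Python 3.7+)
repl_dict = {re.compile('[ABH-LP-Z]'): 'DDD',
             re.compile('[CDEFG]'): 'BBB WTT',
             re.compile('[MNO]'): 'AAA WTT',
             re.compile('[0-9]'): 'CCC'}

def chained_sub(value, patterns):
    """Hypothetical helper: feed each pattern's output into the next pattern."""
    for pattern, repl in patterns.items():
        # pattern.sub replaces every matching character, not the whole value
        value = pattern.sub(repl, value)
    return value

print(chained_sub('A', repl_dict))  # BBB WTTBBB WTTBBB WTT
print(chained_sub('C', repl_dict))  # BBB WTT
print(chained_sub('0', repl_dict))  # CCC
```

'A' first turns into 'DDD', and each of those three 'D' characters is then matched by [CDEFG], which gives the tripled 'BBB WTT' seen in rows 0, 1 and 7 to 9.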

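Use anchors (^ and $). Without them, each pattern replaces every matching character inside a value rather than the value as a whole, and, as your output shows, the entries are applied one after another: 'A' first becomes 'DDD', and then each 'D' is matched by [CDEFG]. With anchors, a pattern has to match the entire value, so a replacement such as 'DDD' is no longer matched by a later pattern: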

repl_dict = {re.compile('^[ABH-LP-Z]$'): 'DDD',
             re.compile('^[CDEFG]$'): 'BBB WTT',
             re.compile('^[MNO]$'): 'AAA WTT',
             re.compile('^[0-9]+$'): 'CCC'}

With df['Col_1'].replace(repl_dict, regex=True), this produces:

0         DDD
1         DDD
2     BBB WTT
3     BBB WTT
4     BBB WTT
5     BBB WTT
6     BBB WTT
7         DDD
8         DDD
9         DDD
10        CCC
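
If you would rather not depend on how replacements chain at all, a different technique from the anchored replace above is to look each value up against the patterns and stop at the first one that matches the whole string. A minimal sketch, assuming the column holds only strings; first_match and RULES are hypothetical names, and re.fullmatch plays the role of the ^...$ anchors.

```python
import re

# Ordered (pattern, replacement) pairs; fullmatch requires the entire value to match
RULES = [(re.compile('[ABH-LP-Z]'), 'DDD'),
         (re.compile('[CDEFG]'), 'BBB WTT'),
         (re.compile('[MNO]'), 'AAA WTT'),
         (re.compile('[0-9]+'), 'CCC')]

def first_match(value, rules=RULES):
    """Hypothetical helper: return the replacement of the first pattern that
    matches the whole value, or the value unchanged if none matches."""
    for pattern, repl in rules:
        if pattern.fullmatch(value):
            return repl
    return value

values = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', '0']
for v in values:
    print(v, first_match(v))
```

In pandas the helper could be applied with df['Col_1'].map(first_match). Because the loop returns at the first hit, the output of one rule is never fed into another, so the order of RULES only matters when two patterns can match the same value.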

Comments

Hey, can I replace the whole value if it matches a particular pattern? For example: rawData['srce'] = np.where(rawData['srce'].str.contains('WEB'), 'WEB', np.where(rawData['srce'].str.contains('GLOBAL MERCHANT|GMS'), 'GMS', rawData['srce']))
@siddheshtiwari: Just try it out I guess. Otherwise pose a question.
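
For the nested np.where in the comment above, the same "first pattern found anywhere wins, otherwise keep the value" logic can be sketched in plain Python. Here re.search stands in for str.contains (which searches for a regex by default); classify_source and SOURCE_RULES are hypothetical names, and the sample values are made up.

```python
import re

# Ordered (pattern, label) pairs, checked in the same order as the nested np.where
SOURCE_RULES = [(re.compile('WEB'), 'WEB'),
                (re.compile('GLOBAL MERCHANT|GMS'), 'GMS')]

def classify_source(value):
    """Hypothetical helper: replace the whole value with the label of the first
    pattern found anywhere in it; otherwise return the value unchanged."""
    for pattern, label in SOURCE_RULES:
        if pattern.search(value):
            return label
    return value

print(classify_source('ONLINE WEB ORDER'))    # WEB
print(classify_source('GLOBAL MERCHANT 12'))  # GMS
print(classify_source('BRANCH'))              # BRANCH
```

In pandas this could be applied with rawData['srce'].map(classify_source), assuming the column holds only strings; numpy.select with a list of str.contains conditions and a default is another common way to express the same first-match logic.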

A more realistic scenario is one where you want to reclassify entries based on a pattern, as follows:

Consider the following dataframe x:

             column
0       good farmer
1        bad farmer
2         ok farmer
3  worker did wrong
4      worker fired
5      worker hired
6   heavy duty work
7   light duty work

Then consider the following code:

x['column_reclassified'] = x['column'].replace(
    to_replace=[
        '^.*(farmer).*$',
        '^.*(worker).*$',
        '^.*(duty).*$'
    ],
    value=[
        'FARMER',
        'WORKER',
        'DUTY'
    ],
    regex=True
)

and it will produce the following output:

             column column_reclassified
0       good farmer              FARMER
1        bad farmer              FARMER
2         ok farmer              FARMER
3  worker did wrong              WORKER
4      worker fired              WORKER
5      worker hired              WORKER
6   heavy duty work                DUTY
7   light duty work                DUTY
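
The three patterns above could also be folded into a single pattern with an alternation, using a callable replacement that upper-cases the captured keyword. This is a sketch with the standard re module; Series.str.replace in pandas accepts a callable as repl when regex=True, so the same function should carry over, but KEYWORD and reclassify are hypothetical names. One caveat: because the leading .* is greedy, a value containing two keywords is labelled by the one that appears last.

```python
import re

# One pattern instead of three: capture whichever keyword appears in the value
KEYWORD = re.compile(r'^.*(farmer|worker|duty).*$')

def reclassify(value):
    # The callable receives the match object; group(1) is the captured keyword.
    # Values with no keyword are returned unchanged by sub().
    return KEYWORD.sub(lambda m: m.group(1).upper(), value)

rows = ['good farmer', 'bad farmer', 'ok farmer', 'worker did wrong',
        'worker fired', 'worker hired', 'heavy duty work', 'light duty work']
print([reclassify(r) for r in rows])
```

With pandas, the equivalent call would be x['column'].str.replace(KEYWORD, lambda m: m.group(1).upper(), regex=True).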

Hope this also helps.
