Replace multiple strings in place that match

Question

I want to replace multiple matching strings in my list of dataframes. I cannot get them to match and be replaced in place; instead, my attempt produces additional rows.

Here's the example data:

import pandas as pd
import re
from scipy import linalg

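nm = ['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
df_tbl = pd.DataFrame(linalg.circulant(nm))

# Growing slices of df_tbl: rows 0..0, 0..1, ..., 0..4 (.loc slicing includes the end label)
ls_comb = [df_tbl.loc[0:i] for i in range(0, len(df_tbl))]

extract_text = ['dpi', 'pop15']
clean_text = ['np.log(dpi)', 'np.log(pop15)']
# Pull the variable name out of each "np.log(...)" string
cl_text = [re.search(r'(?<=\()[^\^\)]+', i).group(0) for i in clean_text]
# Names present in both lists; note that a set does not guarantee any order
int_text = list(set(extract_text).intersection(cl_text))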

I know that int_text is the same as extract_text here, but in some cases clean_text may contain only one np.log entry, so I kept this step because I use int_text to filter.

And what I have tried:

[
    i.apply(
        lambda x: [
            re.sub(rf"\b{ext_t}\b", cln_t, val)
            for val in x
            for ext_t, cln_t in zip(int_text, clean_text)
        ]
    )
    for i in ls_comb
]

It produces the following:

[    0     1            2      3              4
 0  sr  ddpi  np.log(dpi)  pop75          pop15
 1  sr  ddpi          dpi  pop75  np.log(pop15),
                0     1            2            3              4
 0             sr  ddpi  np.log(dpi)        pop75          pop15
 1             sr  ddpi          dpi        pop75  np.log(pop15)
 2          pop15    sr         ddpi  np.log(dpi)          pop75
 3  np.log(pop15)    sr         ddpi          dpi          pop75,
                0              1            2            3              4
 0             sr           ddpi  np.log(dpi)        pop75          pop15
 1             sr           ddpi          dpi        pop75  np.log(pop15)
 2          pop15             sr         ddpi  np.log(dpi)          pop75
 3  np.log(pop15)             sr         ddpi          dpi          pop75
 4          pop75          pop15           sr         ddpi    np.log(dpi)
 5          pop75  np.log(pop15)           sr         ddpi            dpi,
.
.
.

However, this produces additional rows. I expect a clean result like this:

[    0     1            2      3              4
 0  sr  ddpi  np.log(dpi)  pop75  np.log(pop15),
                0     1            2            3              4
 0             sr  ddpi  np.log(dpi)        pop75  np.log(pop15)
 1  np.log(pop15)    sr         ddpi  np.log(dpi)          pop75,
.
.
.

I'm afraid I don't really understand your objective. Could you perhaps give a more explicit example of the data you're working with, the output you expect, and an explanation of the logic you're applying?
– CrazyChucky, Commented Jun 30, 2022 at 21:45
@CrazyChucky I have updated the question with the actual output to compare against the expected output. Essentially, I want to replace the values from int_text that match with their log form from clean_text. I wanted to replace these in place; however, my attempt loops within x, so it loops once for np.log(pop15) and again for the other element, which doubles the size. The expected output shows the values replaced where they are.
– joe_bill.dollar, Commented Jun 30, 2022 at 21:58
Looping is pretty much never the best answer when it comes to pandas...
– BeRT2me, Commented Jun 30, 2022 at 22:24
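
To illustrate the doubling described in the comments: the nested comprehension emits one result per (value, replacement pair) combination, so a column of n values yields n × 2 results. If the `re.sub` approach is kept, one possible fix is to run all substitutions on the same string in a single pass. This is only a sketch; the names `mapping`, `pattern`, and `replace_all` are hypothetical and not part of the question.

```python
import re
import pandas as pd

# Hypothetical mapping from bare names to their log form
mapping = {"dpi": "np.log(dpi)", "pop15": "np.log(pop15)"}

# One alternation pattern with word boundaries, so "ddpi" is not touched
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, mapping)) + r")\b")

def replace_all(val):
    """Return val with every whole-word key replaced in a single pass."""
    return pattern.sub(lambda m: mapping[m.group(0)], val)

col = pd.Series(["sr", "ddpi", "dpi", "pop75", "pop15"])

# Nested loops: one output per (value, pair) combination, so 5 * 2 = 10 items
doubled = [re.sub(rf"\b{k}\b", v, val) for val in col for k, v in mapping.items()]

# One output per value, so 5 items
fixed = col.map(replace_all)

print(len(doubled), len(fixed))
print(fixed.tolist())
```

The same function could be applied to a whole frame with something like `df.apply(lambda s: s.map(replace_all))`, which keeps the original shape.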

BeRT2me · Accepted Answer · 2022-06-30 22:20:53Z

import pandas as pd
from scipy import linalg

nm=['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
df_tbl=pd.DataFrame(linalg.circulant(nm))

extract_text=['dpi', 'pop15'] 
clean_text=['np.log(dpi)', 'np.log(pop15)']
df_tbl.replace(extract_text, clean_text, inplace=True)
print(df_tbl)

Output:

               0              1              2              3              4
0             sr           ddpi    np.log(dpi)          pop75  np.log(pop15)
1  np.log(pop15)             sr           ddpi    np.log(dpi)          pop75
2          pop75  np.log(pop15)             sr           ddpi    np.log(dpi)
3    np.log(dpi)          pop75  np.log(pop15)             sr           ddpi
4           ddpi    np.log(dpi)          pop75  np.log(pop15)             sr
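
Since the question works with `ls_comb`, a list of slices of `df_tbl`, the same call can be applied to each element of the list. The following is a minimal sketch under a few assumptions: the replacement pairs are given as a dict built by pairing `extract_text` with `clean_text` by position, the table is built with plain list rotation instead of `scipy.linalg.circulant` only to keep the example light on dependencies, and the names `row0`, `mapping`, and `replaced` are hypothetical.

```python
import pandas as pd

row0 = ["sr", "ddpi", "dpi", "pop75", "pop15"]
# Rotate the first row right by i positions to mimic the circulant table above
df_tbl = pd.DataFrame([row0[-i:] + row0[:-i] for i in range(len(row0))])

# Growing slices, as in the question (.loc slicing includes the end label)
ls_comb = [df_tbl.loc[0:i] for i in range(len(df_tbl))]

extract_text = ["dpi", "pop15"]
clean_text = ["np.log(dpi)", "np.log(pop15)"]
mapping = dict(zip(extract_text, clean_text))

# Without inplace, replace() returns a new frame and leaves each slice untouched
replaced = [df.replace(mapping) for df in ls_comb]

print(replaced[1])
```

Each frame in `replaced` keeps the row count of its source slice, which matches the expected output in the question.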

answered Jun 30, 2022 at 22:20


4 Comments

joe_bill.dollar Over a year ago

I did not expect this to work! I had used replace before, but it would replace both ddpi and dpi because they share the same substring, which is why I went for re.sub. Does inplace=True prevent this issue?

BeRT2me Over a year ago

pd.DataFrame.replace is quite different from str.replace or even pd.Series.str.replace. It's important to keep track of which one you're using.
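
The difference can be seen on a small Series: without `regex=True`, `replace` compares each whole cell value to the target, while `.str.replace` substitutes substrings inside each string. A minimal sketch (the variable names `s`, `whole`, and `partial` are illustrative only):

```python
import pandas as pd

s = pd.Series(["dpi", "ddpi", "pop15"])

# Whole-value match: only the cell exactly equal to "dpi" changes
whole = s.replace("dpi", "np.log(dpi)")

# Substring match: "ddpi" contains "dpi", so it is changed as well
partial = s.str.replace("dpi", "np.log(dpi)", regex=False)

print(whole.tolist())    # ['np.log(dpi)', 'ddpi', 'pop15']
print(partial.tolist())  # ['np.log(dpi)', 'dnp.log(dpi)', 'pop15']
```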

BeRT2me Over a year ago

Adding inplace=True is just a different way of doing df_tbl = df_tbl.replace(extract_text, clean_text) that's available for certain functions.
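
A small sketch of that equivalence, using a made-up two-column frame (the names `a` and `b` are hypothetical): both routes end with the same contents, and `inplace=True` plays no part in which cells match.

```python
import pandas as pd

extract_text = ["dpi", "pop15"]
clean_text = ["np.log(dpi)", "np.log(pop15)"]

a = pd.DataFrame({"x": ["dpi", "ddpi"], "y": ["pop15", "sr"]})
b = a.copy()

# Reassignment: replace() returns a new DataFrame that is bound back to a
a = a.replace(extract_text, clean_text)

# inplace=True: b itself is modified
b.replace(extract_text, clean_text, inplace=True)

print(a.equals(b))  # True
print(a)
```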

joe_bill.dollar Over a year ago

Ah, I get it; this is definitely a much better option. I had tried to do the replace on ls_comb, which gave all the inconsistencies and errors.
