I want to replace multiple matching strings across a list of DataFrames. I cannot get the matches replaced in place; instead, my attempt produces additional rows.
Here's the example data:
import pandas as pd
import re
from scipy import linalg
nm=['sr', 'pop15', 'pop75', 'dpi', 'ddpi']
df_tbl=pd.DataFrame(linalg.circulant(nm))
ls_comb = [df_tbl.loc[0:i] for i in range(0, len(df_tbl))]
extract_text=['dpi', 'pop15']
clean_text=['np.log(dpi)', 'np.log(pop15)']
cl_text=[re.search('(?<=\\()[^\\^\\)]+', i).group(0) for i in clean_text]
int_text=list(set(extract_text).intersection(cl_text))
I know that int_text is the same as extract_text here, but in some cases clean_text may contain only one np.log term, so I left this as is because I use int_text to filter.
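One thing worth noting about this setup: a set has no guaranteed order, so zipping int_text with clean_text can pair a name with the wrong log term, especially when clean_text is shorter than extract_text. A minimal sketch of keying each replacement by the name inside its parentheses is below. The name pairs and the single-entry clean_text are hypothetical, chosen only to illustrate the "only one np.log" case.

```python
import re

extract_text = ['dpi', 'pop15']
clean_text = ['np.log(pop15)']  # hypothetical case: only one logged term

# Map each extracted name to its own log form, so the pairing
# does not depend on the iteration order of a set.
pairs = {}
for cln in clean_text:
    m = re.search(r'\(([^()]+)\)', cln)  # text inside the parentheses
    if m and m.group(1) in extract_text:
        pairs[m.group(1)] = cln

print(pairs)
```

With this input, only pop15 gets a replacement and dpi is left alone.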
And what I have tried:
[
    i.apply(
        lambda x: [
            re.sub(rf"\b{ext_t}\b", cln_t, val)
            for val in x
            for ext_t, cln_t in zip(int_text, clean_text)
        ]
    )
    for i in ls_comb
]
It produces the following:
[ 0 1 2 3 4
0 sr ddpi np.log(dpi) pop75 pop15
1 sr ddpi dpi pop75 np.log(pop15),
0 1 2 3 4
0 sr ddpi np.log(dpi) pop75 pop15
1 sr ddpi dpi pop75 np.log(pop15)
2 pop15 sr ddpi np.log(dpi) pop75
3 np.log(pop15) sr ddpi dpi pop75,
0 1 2 3 4
0 sr ddpi np.log(dpi) pop75 pop15
1 sr ddpi dpi pop75 np.log(pop15)
2 pop15 sr ddpi np.log(dpi) pop75
3 np.log(pop15) sr ddpi dpi pop75
4 pop75 pop15 sr ddpi np.log(dpi)
5 pop75 np.log(pop15) sr ddpi dpi,
.
.
.
However, this produces additional rows. I expect a clean result like this:
[ 0 1 2 3 4
0 sr ddpi np.log(dpi) pop75 np.log(pop15),
0 1 2 3 4
0 sr ddpi np.log(dpi) pop75 np.log(pop15)
1 np.log(pop15) sr ddpi np.log(dpi) pop75,
.
.
.
I want to replace the values in int_text that match with their log form from clean_text. I wanted to replace these in place, but my attempt loops over the replacement pairs inside x: it runs once for np.log(pop15) and again for the other element, so the result is double the size. The expected output shows the values replaced where they stand.
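The doubling comes from the nested comprehension: it emits one item per (value, pair) combination, so each column grows from n entries to 2n. A possible way around it, sketched below, is to apply all replacements to each cell in one pass using DataFrame.replace with a dict. This assumes every cell holds exactly one variable name (whole-cell matching, so ddpi is not touched). The two-row frame and the mapping dict are stand-ins written out by hand for illustration, not the full data from the question.

```python
import pandas as pd

# Stand-in for the first two rows of df_tbl from the question.
df_tbl = pd.DataFrame([
    ['sr', 'ddpi', 'dpi', 'pop75', 'pop15'],
    ['pop15', 'sr', 'ddpi', 'dpi', 'pop75'],
])
ls_comb = [df_tbl.loc[0:i] for i in range(len(df_tbl))]

# Hypothetical mapping: plain name -> log form (assumed to be built from clean_text).
mapping = {'dpi': 'np.log(dpi)', 'pop15': 'np.log(pop15)'}

# replace() with a flat dict swaps whole cell values and returns a new
# frame of the same shape, so no extra rows are produced.
result = [df.replace(mapping) for df in ls_comb]

for df in result:
    print(df)
```

If the cells could contain longer expressions rather than single names, whole-cell matching would not be enough, and a single combined regex applied once per cell would be the thing to try instead.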