I have a dataframe in which one column holds arrays of data and another column holds the indices (positions) of the elements I want to delete from those arrays. So, starting from this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'data':[np.arange(1,5),np.arange(3)],'to_delete': [np.array([2]),np.array([0,2])]})
df
# Output:
#    data          to_delete
# 0  [1, 2, 3, 4]  [2]
# 1  [0, 1, 2]     [0, 2]
This is what I want to end up with:
new_df
# Output:
#    data       to_delete
# 0  [1, 2, 4]  [2]
# 1  [1]        [0, 2]
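The target rows are exactly what `np.delete` produces when `to_delete` is read as positions rather than values: in the first row, index 2 removes the value 3 and keeps the value 2. A quick check on the two arrays from the example:

```python
import numpy as np

# to_delete holds positions (indices), not values:
# in the first row, position 2 holds the value 3.
row0 = np.delete(np.arange(1, 5), [2])
row1 = np.delete(np.arange(3), [0, 2])

print(row0)  # [1 2 4]
print(row1)  # [1]
```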
I could iterate over the rows by hand and calculate the new data for each one like this:
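new_data = []
for _, row in df.iterrows():
    # np.delete returns a new array without the listed positions
    new_data.append(np.delete(row['data'], row['to_delete']))
new_df = df.assign(data=new_data)  # assign returns a copy; df itself is unchanged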
but I'm looking for a better way to do this.
iterrows() is rather sluggish. Why not use apply()?
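A minimal sketch of the apply() suggestion, rebuilding the example frame from the question. Passing `result_type='reduce'` is my own addition, not part of the original suggestion: it asks pandas to return one array per row as a Series instead of trying to expand the arrays into columns. The list comprehension over `zip()` is a swapped-in alternative the answer does not mention; it skips building a Series for every row and is usually quicker than `apply(axis=1)`, though it is worth timing on real data. Note that `apply(axis=1)` still calls a Python function once per row, so it is a convenience rather than true vectorization.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'data': [np.arange(1, 5), np.arange(3)],
    'to_delete': [np.array([2]), np.array([0, 2])],
})

# apply(axis=1) passes each row to the lambda as a Series.
# result_type='reduce' keeps the result as a Series of arrays
# rather than expanding list-like results into columns.
new_df = df.assign(
    data=df.apply(lambda row: np.delete(row['data'], row['to_delete']),
                  axis=1, result_type='reduce')
)

# Alternative: a plain list comprehension over the two columns,
# which avoids constructing a Series for each row.
new_data = [np.delete(d, idx) for d, idx in zip(df['data'], df['to_delete'])]
new_df_zip = df.assign(data=new_data)

print(new_df)
```

Both approaches leave the original `df` untouched, because `assign` returns a new DataFrame.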