iterate over pandas dataframe, update value from data in another row, and delete that other row

Question

I have a pandas DataFrame of 7,000 rows; below is a sample:

[Image: sample DataFrame]

I need to fill in the missing branch_type column; the missing information is available in the rows below. For the first row, I search the DataFrame's link_name column for B-A (the row's link_reverse_name) and use that row's root_type as the branch type.

After the extraction, I want to delete the row I took the root_type from, to get output like this:

[Image: expected output]

I tried the code below, but it doesn't work properly:

count = 0
missing = 0
errored_links=[]
for i,j in bmx.iterrows():
    try:
        spn = bmx[bmx.link_name ==j.link_reverse_name].root_type.values[0]
        index_t =  bmx[bmx.link_name ==j.link_reverse_name].root_type.index[0]
        bmx.drop(bmx.index[index_t],inplace=True)
        count+=1
        bmx.at[i,'branch_type']=spn
    except:
        bmx.at[i,'branch_type']='missing'
        missing+=1
        errored_links.append(j)

print('Iterations: ',count)
print('Missing: ', missing)
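A likely cause, which is my reading of the symptom rather than something stated in the thread: index_t is an index *label* (it comes from `.index`), but `bmx.index[index_t]` looks it up by *position*. Once a row has been dropped, labels and positions no longer line up, so either the wrong row is dropped or an IndexError is raised, which the bare `except:` silently turns into "missing". Dropping inside `iterrows()` adds a second problem: the loop still visits rows that were already dropped. A minimal demonstration on the four link names from the answer's sample:

```python
import pandas as pd

df = pd.DataFrame({'link_name': ["A-B", "C-D", "B-A", "D-C"]})

# First drop: label 2 happens to sit at position 2, so "B-A" is removed.
df.drop(df.index[2], inplace=True)  # labels left: [0, 1, 3]

# Second drop: we want label 3 ("D-C"), but df.index[3] is a *position*,
# and only three rows (positions 0-2) remain.
try:
    df.drop(df.index[3], inplace=True)
except IndexError as exc:
    print("IndexError:", exc)  # a bare `except:` would hide this error

# Dropping by label removes the intended row.
df.drop(3, inplace=True)
print(df['link_name'].tolist())  # ['A-B', 'C-D']
```

Passing the label straight to `drop` (here `bmx.drop(index_t)`) avoids the mix-up.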

Just to mention, you have "root_product" in the screenshots, which you refer to as "root_type" in the code. – Ruli, Feb 2, 2021 at 22:42

Ruli · Accepted Answer · 2021-02-02 22:38:51Z


Build up a list of the indices to remove, do the work, and drop the unneeded rows only after iterating over all rows. Avoid if/else in the loop: set every row to "missing" at the start, then overwrite only the rows that have a matching reverse link.

import pandas as pd

bmx = pd.DataFrame({'link_name': ["A-B", "C-D", "B-A", "D-C"],
                    'root_type': ["type1", "type2", "type6", "type1"],
                    'branch_type': ["", "", "", ""],
                    'link_reverse_name': ["B-A", "D-C", "A-B", "C-D"]},
                   columns=['link_name', 'root_type', 'branch_type', 'link_reverse_name'])

bmx["branch_type"] = "missing"  # set all to missing at the start, get rid of ifs :)

to_remove = []

for i, j in bmx.iterrows():
    if i in to_remove:
        continue  # skip rows already marked for removal
    match = bmx.index[bmx.link_name == j.link_reverse_name]  # label(s) of the reverse row
    if len(match) > 0:  # no match (e.g. missing link_name): stays "missing"
        # write into bmx itself; assigning to j only changes iterrows' row copy
        bmx.at[i, "branch_type"] = bmx.at[match[0], "root_type"]
        to_remove.append(match[0])  # remember the reverse row for removal

bmx.drop(to_remove, inplace=True)  # drop only after the loop has finished
print(bmx)

We get the desired output:

  link_name root_type branch_type link_reverse_name
0       A-B     type1       type6               B-A
1       C-D     type2       type1               D-C

Of course, this assumes all link names are unique; otherwise it will produce duplicates. For simplicity, I left out the columns that are not relevant to the problem.
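To confirm the uniqueness assumption on the real data before running the loop, a quick check is possible with `Series.is_unique` and `Series.duplicated`. The frame below is hypothetical: "A-B" is repeated on purpose to show what the check reports.

```python
import pandas as pd

# Hypothetical sample with a repeated link name ("A-B" appears twice).
bmx = pd.DataFrame({'link_name': ["A-B", "C-D", "B-A", "A-B"],
                    'root_type': ["type1", "type2", "type6", "type3"]})

print(bmx['link_name'].is_unique)  # False

# keep=False marks every row that shares a name, not just the later copies.
dupes = bmx[bmx['link_name'].duplicated(keep=False)]
print(dupes)
```

If `dupes` is non-empty, those rows need to be resolved first, since the loop only ever uses the first match.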


Thank you, that worked just fine. It just threw some errors because link_name is sometimes missing, but I think I will just put that in a try statement. – mnasrallah, over a year ago
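As an alternative to a try statement, and to avoid a Python-level loop over all 7,000 rows, the lookup can be done with `Series.map`. This is a minimal sketch of a swapped-in vectorized technique, not the accepted answer's method. It assumes that non-missing link names are unique and that the reverse relation is symmetric (if A-B's reverse is B-A, then B-A's reverse is A-B); under those assumptions it keeps the same rows the loop keeps. The "E-F" row and the row with no link_name are hypothetical additions to the answer's sample.

```python
import pandas as pd

bmx = pd.DataFrame({
    'link_name': ["A-B", "C-D", "B-A", "D-C", "E-F", None],
    'root_type': ["type1", "type2", "type6", "type1", "type3", "type4"],
    'link_reverse_name': ["B-A", "D-C", "A-B", "C-D", "F-E", "X-Y"],
})

# Rows without a link_name can never be the reverse of another row.
named = bmx.dropna(subset=['link_name'])

# link_name -> root_type (assumes the non-missing link_names are unique);
# unmatched reverse names map to NaN and become "missing".
root_by_name = named.set_index('link_name')['root_type']
bmx['branch_type'] = bmx['link_reverse_name'].map(root_by_name).fillna('missing')

# link_name -> row position, used to decide which row of each pair survives.
pos = pd.Series(range(len(bmx)), index=bmx.index)
pos_by_name = pd.Series(pos.loc[named.index].to_numpy(), index=named['link_name'])
partner_pos = bmx['link_reverse_name'].map(pos_by_name)

# Keep rows with no partner, and the earlier row of each pair (as the loop does).
result = bmx[partner_pos.isna() | (pos < partner_pos)]
print(result)
```

Rows whose reverse link is not found simply keep "missing", so no exception handling is needed.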
