
I have a pandas DataFrame of 7000 rows; a sample is shown below.

DataFrame

I need to fill in the missing branch_type column; the missing information is available in rows further down. For example, for the first row I search bmx['link_name'] for B-A and use that row's root_type as the branch_type.

After the extraction, I want to delete the row the root_type was taken from, so that the output looks like this:

expected output

I tried the code below, but it doesn't work properly:

count = 0
missing = 0
errored_links=[]
for i,j in bmx.iterrows():
    try:
        spn = bmx[bmx.link_name ==j.link_reverse_name].root_type.values[0]
        index_t =  bmx[bmx.link_name ==j.link_reverse_name].root_type.index[0]
        bmx.drop(bmx.index[index_t],inplace=True)
        count+=1
        bmx.at[i,'branch_type']=spn
    except:
        bmx.at[i,'branch_type']='missing'
        missing+=1
        errored_links.append(j)

print('Iterations: ',count)
print('Missing: ', missing)
  • Just to mention, you have "root_product" in the screenshots, which you refer to as "root_type" in the code. Commented Feb 2, 2021 at 22:42
  • Yes, that was a typo, thanks for pointing that out :) Commented Feb 2, 2021 at 23:21

1 Answer

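Collect the indices of the rows to be removed in a list while iterating, and drop those rows only after the loop has finished, so the frame is not modified while it is being iterated. Instead of handling the missing case inside the loop, set every branch_type to "missing" at the start and then overwrite it for the rows where a reverse link is found.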

import pandas as pd

bmx = pd.DataFrame({'link_name': ["A-B", "C-D", "B-A", "D-C"],
                    'root_type': ["type1", "type2", "type6", "type1"],
                    'branch_type': ["", "", "", ""],
                    'link_reverse_name': ["B-A", "D-C", "A-B", "C-D"]},
                   columns=['link_name', 'root_type', 'branch_type', 'link_reverse_name'])

bmx["branch_type"] = "missing"  # default for every row; overwritten when a reverse link is found

to_remove = []

for i, row in bmx.iterrows():
    if i in to_remove:
        continue  # this row was already used as a reverse link, skip it
    match = bmx[bmx.link_name == row.link_reverse_name]
    if match.empty:
        continue  # no reverse link found: branch_type stays "missing"
    # Write to the frame itself; the row yielded by iterrows() may be a copy
    bmx.at[i, 'branch_type'] = match.root_type.iloc[0]
    to_remove.append(match.index[0])  # mark the reverse row for removal

bmx.drop(to_remove, inplace=True)  # drop by label, only after the loop
print(bmx)

We get the desired output:

  link_name root_type branch_type link_reverse_name
0       A-B     type1       type6               B-A
1       C-D     type2       type1               D-C

This assumes that every link_name is unique; otherwise only the first matching row is used and removed, and duplicates remain in the result. For simplicity, I left out the columns that are not relevant to the problem.
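For a frame of several thousand rows, the same idea can also be written without an explicit loop. The following is only a sketch under assumptions, not a drop-in replacement: `fill_branch_types` is a hypothetical helper name, links are assumed to come in symmetric pairs (A-B / B-A), the first occurrence of a duplicated link_name wins, and rows with a missing link_name or no reverse partner are kept with branch_type "missing". The rows at index 4 and 5 in the sample are made up to show that case.

```python
import pandas as pd


def fill_branch_types(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill branch_type from the reverse link's root_type
    and drop the second row of each link pair."""
    # Rows usable as lookup targets: link_name present, first occurrence only.
    known = df.dropna(subset=["link_name"]).drop_duplicates(subset="link_name")

    # link_name -> root_type
    root_by_link = known.set_index("link_name")["root_type"]

    # Positional number of every row, and link_name -> position of that row.
    pos = pd.Series(range(len(df)), index=df.index)
    pos_by_link = pd.Series(pos.loc[known.index].to_numpy(),
                            index=pd.Index(known["link_name"]))

    out = df.copy()
    # Look up the root_type of the reverse link; unmatched rows become "missing".
    out["branch_type"] = out["link_reverse_name"].map(root_by_link).fillna("missing")

    # A row is the "second of a pair" if its reverse partner appears earlier.
    partner_pos = out["link_reverse_name"].map(pos_by_link)
    is_second_of_pair = partner_pos < pos  # NaN compares as False, so the row is kept
    return out[~is_second_of_pair]


bmx = pd.DataFrame({
    "link_name": ["A-B", "C-D", "B-A", "D-C", "E-F", None],
    "root_type": ["type1", "type2", "type6", "type1", "type3", "type9"],
    "branch_type": ["", "", "", "", "", ""],
    "link_reverse_name": ["B-A", "D-C", "A-B", "C-D", "F-E", "G-H"],
})

result = fill_branch_types(bmx)
print(result)
```

Unlike the loop, this returns a new frame instead of modifying bmx in place, and it does not raise when a reverse link cannot be found.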


1 Comment

Thank you, that worked just fine. I just had some errors because link_name is sometimes missing, but I think I will just put that in a try statement.
