0

I am trying to merge to excel file in Pandas.

import pandas as pd
import numpy as np
upload_raw = pd.read_excel(r'C:\Users\Desktop\Upload Raw Data.xlsx',
                     sheet_name = 'Upload',
                     header = 0,
                     index_col = 0,
                     )
mapping = pd.read_excel(r'C:\Users\Desktop\Mapping.xlsx',
                     sheet_name = 'Mapping',
                     header = 0,
                     index_col = 0,
                     )
display(upload_raw)
display(mapping)
upload_lookup=upload_raw.merge(mapping,on ='BRANCH',how = 'outer' )
display(upload_lookup)

I continue to get the KeyError: 'BRANCH'. I checked the values of the columns all are text. The Mapping file has 3 columns while the upload has around 4 columns.

Upload Raw data

BRANCH  DEPT    CREAT_TS    RAF_IND
AA  &CR     2018-06-22-06.48.49.601000   
03  CUE 2018-06-22-11.43.29.859000   
90  T0L 2018-06-22-11.54.52.633000   

Mapping Data:

BRANCH  UNIT    MASTER
03  MAS CoE
04  NAS ET
05  ET  ET

In the error message these are quite prominent.

 # validate the merge keys dtypes. We may need to coerce

# work-around for merge_asof(right_index=True)
# duplicate columns & possible reduce dimensionality

How do i avoid this issue.

I even tried left_on = 'True', right_on = 'True'

left_key = 'lkey', right_key = 'rkey'. I get the error 'rkey not found

Regards, Ren.

1 Answer 1

1

The main difference seems to be that I don't set 'BRANCH' as the index.

Additionally, mapping 'BRANCH' is imported as int64 because the example only has numbers, whereas upload_raw 'BRANCH' is imported as object.

upload_raw = pd.read_excel('data/2018-09-03_data_mapping.xlsx',
                           sheet_name = 'Upload',
                           header = 0)
mapping = pd.read_excel(r'data/2018-09-03_data_mapping.xlsx',
                        sheet_name = 'Mapping',
                        header = 0)
print(upload_raw)

output:
    BRANCH  DEPT    CREAT_TS    RAF_IND
0   AA  &CR 2018-06-22-06.48.49.601000  NaN
1   3   CUE 2018-06-22-11.43.29.859000  NaN
2   90  T0L 2018-06-22-11.54.52.633000  NaN

mapping['BRANCH'] = mapping['BRANCH'].astype('object')

print(mapping)

output:
    BRANCH  UNIT    MASTER
0   3   MAS CoE
1   4   NAS ET
2   5   ET  ET

upload_lookup=pd.merge(left=upload_raw, right=mapping, on='BRANCH')

print(upload_lookup)

output:
    BRANCH  DEPT    CREAT_TS    RAF_IND UNIT    MASTER
0   3   CUE 2018-06-22-11.43.29.859000  NaN MAS CoE
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks it worked great. I removed the index_col and it worked the object type was not effecting it. Thanks for the solutions

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.