0

I am trying to check whether values in the column 'DataLineID:' of a workbook exists in columns 0 or 1 in another workbook with pandas. Where I want to have a new column called match added if there is a match.

My code is:

df_comparing = pd.read_excel(fName, sheet_name=0, index_col=None)

df_compare_basis = pd.read_excel(readxl, index_col=None) 

df_comparing['Match'] = df_comparing[df_comparing.columns[0,1]].isin(df_compare_basis['DataLineID:']).astpye(int)

I get the error IndexError: too many indices for array

If I change the following to df_comparing[df_comparing.columns[0]] I get AttributeError: 'Series' object has no attribute 'astpye'

Is there another way to achieve this?

EDIT:

Here is some sample data:

df_comparing = ({'Line no':['AL5176', 'AL5737', 'AL5978'], 'Line addition':[NaN, NaN, NaN, 'AL9876', 'AL6789', 'AL5945']})

df_compare_basis = ({'DataLineID:':['AL5176', 'AL5737', 'AL5978','AL9876', 'AL6789', 'AL5945']})

Output thats wanted:

df_comparing = ({'Line no':['AL5176', 'AL5737', 'AL5978'], 'Line addition':[NaN, NaN, NaN, 'AL9876', 'AL6789', 'AL5945'], 'Match':[1,1,1,1,1,1]})
2
  • 1
    Just give the exact column name or index for this statement df_comparing.columns[0,1] using something like df_comparing.iloc[:,0] Commented Dec 9, 2019 at 8:30
  • 1
    Please provide some sample input data Commented Dec 9, 2019 at 8:33

1 Answer 1

1

I guess this is what you are looking for:

import numpy as np
df_comparing['Match'] = np.where((df_comparing['Line no'].isin(df_compare_basis['DataLineID:'])) | (df_comparing['Line addition'].isin(df_compare_basis['DataLineID:'])), 1, 0)

  Line no Line addition  Match
0  AL5176           NaN      1
1  AL5737           NaN      1
2  AL5978           NaN      1
3     NaN        AL9876      1
4     NaN        AL6789      1
5     NaN        AL5945      1
Sign up to request clarification or add additional context in comments.

3 Comments

Tahnks @luigigi, this though gave a KeyError: 0 in the traceback which i haven't seen before
obviously you have to use your column names you want to compare instead of 0 and 1
Thanks for the help @luigigi , an error is coming up that ValueError: Length of values does not match length of index. As the numpy array length doesn't equal the df length

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.