I have two Excel files, both containing employee information. File1 has 195K rows; File2 has fewer than 100. I need to return every full row in File1 whose staff ID also appears in File2. I've done something like this in PHP but can't sort it out in Python/pandas.

I'm looking at the isin() method to work out the selection of rows.

df0 = pd.ExcelFile('File1.xlsx').parse('Sheet1')
df1 = pd.ExcelFile('File2.xlsx').parse('Sheet1')

print df0[df1['staffid'].isin(df0['staffid'])]

The result is "IndexingError: Unalignable boolean Series key provided"

Is pandas the right tool for this, or should I look at openpyxl or something else?

  • You can do this via VBA. Would need to know what data looks like. Commented Apr 22, 2016 at 15:40

1 Answer

Your DataFrames are the wrong way round in the isin() call; it should be:

df0[df0['staffid'].isin(df1['staffid'])]

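The error occurs because the boolean mask was built from df1, so it has df1's length and index (fewer than 100 rows) and cannot be aligned with df0's 195K rows.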

You want to find the staffid values in df0 that are present in df1, not the other way around.
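As a minimal sketch of the corrected selection, here is the same pattern on two small in-memory DataFrames. The sample names and IDs are made up for illustration and stand in for the contents of File1 and File2; only the staffid column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for File1 (the large file)
df0 = pd.DataFrame({
    "staffid": [101, 102, 103, 104, 105],
    "lastname": ["Avery", "Brooks", "Chen", "Diaz", "Evans"],
})

# Hypothetical stand-in for File2 (the short list of IDs to look up)
df1 = pd.DataFrame({"staffid": [102, 105]})

# The mask is computed on df0's column, so it has one True/False per df0 row
mask = df0["staffid"].isin(df1["staffid"])

# Indexing df0 with a mask of its own length and index returns the full rows
matches = df0[mask]
print(matches)
```

Because the mask comes from df0, its length and index match the frame being filtered, which is what the original expression was missing.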

3 Comments

Oh. Since I want to return rows from File1 (df0), I thought I would want to find the values in df1 that are in df0. As it happens, I now get: "Empty DataFrame Columns: [firstname, middleInitial, lastname, staffid, etc...] Index: []". Could there be a data type mismatch between the two DataFrames?
Check the output of df0.info() and df1.info(). If the staffid dtypes differ, cast one of them, for example with astype(int).
Different data types, that was the issue. Thanks!
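The dtype issue from these comments can be reproduced with a small sketch. It assumes, purely for illustration, that staffid was read as text in df0 and as integers in df1; the thread does not say which side held which type, and the sample values are made up.

```python
import pandas as pd

# Assumption: staffid was parsed as strings in one file...
df0 = pd.DataFrame({
    "staffid": ["101", "102", "103"],
    "lastname": ["Avery", "Brooks", "Chen"],
})
# ...and as integers in the other
df1 = pd.DataFrame({"staffid": [102, 103]})

# Inspect the column dtypes (df0.info() and df1.info() show the same thing)
print(df0["staffid"].dtype, df1["staffid"].dtype)

# The string "102" never equals the integer 102, so nothing matches
empty = df0[df0["staffid"].isin(df1["staffid"])]

# Cast so both columns share a type, then select again
df0["staffid"] = df0["staffid"].astype(int)
matches = df0[df0["staffid"].isin(df1["staffid"])]
print(matches)
```

Note that astype(int) raises an error if the column contains blanks or non-numeric text; in that case, casting the integer side with astype(str) may be the easier direction.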
