0

I have an xlsx where the first 9 rows are headers. Row 1 contains a name, like "Bob" and "Alice".

Row 4 contains either 'Monthly' or 'Quarterly'.

Sometimes there are two fields called 'Bob' but one has 'Monthly' and the other has 'Quarterly' in row 4.

I understand I could read in the column called 'Bob' into a dataframe, but is there a way to specify which one should be loaded into the dataframe?

e.g. below I have bob and alice, and as it stands I would read in 2 Bob fields and 2 Alice fields. Is there a way of reducing these somehow on the initial readthrough?

import pandas as pd
fields = ['Bob', 'Alice']
type = ['Monthly','Quarterly']


df = pd.read_excel('data.xlsx', sheet='Sheet1', usecols=fields)
# See the keys
print df.keys()
# See content in 'Bob'
print df.bob

Alternatively, is there a way I can read all 4 columns - Bob and Alice - and then only keep the one I want (e.g. monthly for Bob, quarterly for Alice)?

Example xlsx file is as follows (formatted as a csv to make it look nicer here though):

Mnemonic:,Alice,Bob,Mnemonic:,Alice,Bob
Description:,Test results for Alice,Test results for Bob,Description:,Test results for Alice,Test results for Bob
Source:,(na),(na),Source:,(na),(na)
Native Frequency:,Monthly,Monthly,Native Frequency:,Quarterly,Quarterly
Transformation:,None,None,Transformation:,None,None
Begin Date:,10/31/2006,10/31/2006,Begin Date:,09/30/2006,09/30/2006
Last Updated:,,,Last Updated:,,
Historical End Date:,12/30/2017,12/30/2017,Historical End Date:,12/30/2017,12/30/2017
Geography:,(na),(na),Geography:,(na),(na)
10/31/2006,3,2,09/30/2006,3,2
11/30/2006,3,2,12/31/2006,5,1
12/31/2006,3,2,03/31/2007,7,4
01/31/2007,5,1,06/30/2007,8,7
02/28/2007,5,1,09/30/2007,1,2
03/31/2007,5,1,12/31/2007,6,9
04/30/2007,7,4,03/31/2008,1,5
05/31/2007,7,4,06/30/2008,9,7
06/30/2007,7,4,09/30/2008,9,2
07/31/2007,8,7,12/31/2008,8,7
08/31/2007,8,7,03/31/2009,5,8
09/30/2007,8,7,06/30/2009,3,6

1 Answer 1

0

There isn't an option to filter the rows before the Excel file is loaded into a pandas object.

If your file was in csv format, you could iterate through chunks of the csv file and perform the filtering for each chunk. After this, you would then aggregate the chunks into one dataframe. See this answer for details.

Sign up to request clarification or add additional context in comments.

4 Comments

What about filtering after it's in a pandas object? Can I load ALL the rows (including the headers), then delete the columns I don't want?
@Aaraeus, of course! would you mind amending your question accordingly? that way other users can find a problem-solution combo easily.
@Aaraeus, at the same time please provide a sample of the data so we can test.
Just did both. Thanks in advance!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.