
I want to read a certain column from an Excel file into a dataframe, and I want to specify that column by its header name.

For example, I have an Excel file with two columns in Sheet 2: "number" in column A and "ForeignKey" in column B. I want to import "ForeignKey" into a dataframe. I read the file with the following script:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=[0,1]) 

It shows the following in my xl_file:

       number ForeignKey
0       1        abc
1       2        def
2       3        ghi

With a small number of columns, I can get "ForeignKey" by specifying usecols=[1]. However, if there are many columns and I know the column name pattern, it would be easier to specify the column name. I tried the following code, but it gives an empty dataframe.

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=['ForeignKey']) 

According to the discussion in the following link, the code above works, but only for read_csv.

How to drop a specific column of csv file while reading it using pandas?

Is there a way to do this when reading an Excel file?

Thank you in advance.


2 Answers

3

You need to pass the Excel column letter rather than the header name, written as a range, e.g. colname:colname.

For instance, if ForeignKey is in column B of your Excel Sheet 2, then do:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='B:B') 

See the related GitHub issue and the solution suggested there.
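If the column letter is not known ahead of time, one workaround is to read only the header row first, look up where the wanted header sits, and pass that position to usecols. This is a minimal sketch, assuming a pandas version whose read_excel supports nrows; positions_for and read_named_columns are hypothetical helper names, not pandas functions.

```python
import pandas as pd


def positions_for(headers, wanted):
    """Return the 0-based positions of the wanted header names."""
    missing = [name for name in wanted if name not in headers]
    if missing:
        raise KeyError(f"headers not found: {missing}")
    return [headers.index(name) for name in wanted]


def read_named_columns(path, sheet_name, wanted):
    """Read only the columns whose header text is listed in `wanted`."""
    # First pass: nrows=0 parses the header row and no data rows.
    headers = list(pd.read_excel(path, sheet_name=sheet_name, nrows=0).columns)
    # Second pass: usecols accepts a list of integer positions.
    return pd.read_excel(path, sheet_name=sheet_name,
                         usecols=positions_for(headers, wanted))
```

It would be called as read_named_columns('D:/SnapPython/TestDF.xlsx', 'Sheet 2', ['ForeignKey']). The workbook is opened twice, which may matter for very large files.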


3 Comments

@anky_91 I checked with usecols='ForeignKey' also but I received an empty dataframe.
That is the case. I have an Excel file that contains hundreds of columns named by date and time. Because I know which date and time I want, it would be more efficient to specify the column header name rather than the Excel column letter. I cannot use the header name directly with read_excel the way I can with read_csv.
@anky_91 I don't think OP knows which column will have foreign key, and making such dictionary of 100 pairs doesn't seem practical.
2

There is a solution, but read_csv and read_excel do not treat usecols the same way.

From the documentation, for read_csv:

usecols : list-like or callable, default None

For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].

For read_excel:

usecols : int or list, default None

  • If None then parse all columns,
  • If int then indicates last column to be parsed
  • If list of ints then indicates list of column numbers to be parsed
  • If string then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides

so you need to call it like this:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='ForeignKey')

and if you need also 'number':

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='number,ForeignKey')

EDIT: The two calls above do not work as written. A string passed to usecols must contain Excel column letters, not the header names from the data. The other answer addresses this. Note that you don't need 'B:B'; 'B' is enough. This has the same drawback as using column numbers, though: you still have to know where the column sits.

If the whole sheet loads quickly enough, the simplest approach may be to parse all columns and then select the ones you want:
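In newer pandas releases (0.24 and later, as far as I know), usecols in read_excel also accepts a list of header names or a callable that is applied to each header, which matches the read_csv behaviour. On older versions the string list is not supported, which would explain the empty dataframe. A sketch, where keep_column, load_matching and load_by_name are hypothetical names and the date prefix is made up for illustration:

```python
import pandas as pd


def keep_column(name):
    """Keep headers that start with a given date prefix (illustrative rule)."""
    return str(name).startswith("2019-01-15")


def load_matching(path, sheet_name):
    # usecols as a callable: pandas evaluates it against each header and
    # keeps the columns for which it returns True.
    return pd.read_excel(path, sheet_name=sheet_name, usecols=keep_column)


def load_by_name(path, sheet_name, names):
    # usecols as a list of header strings, e.g. ['ForeignKey'].
    return pd.read_excel(path, sheet_name=sheet_name, usecols=names)
```

The callable form suits the case from the comments, where hundreds of columns are named by date and time and only a known pattern is wanted.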

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2')['ForeignKey']
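Once the whole sheet is loaded, columns can also be picked by a name pattern rather than one exact name. The sketch below builds a small DataFrame in memory as a stand-in for the result of read_excel (the sample values come from the question). Note that single brackets give a Series, while double brackets keep a DataFrame.

```python
import pandas as pd

# Stand-in for: pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2')
df = pd.DataFrame({"number": [1, 2, 3], "ForeignKey": ["abc", "def", "ghi"]})

# Single brackets return a Series; double brackets keep a DataFrame.
fk_series = df["ForeignKey"]
fk_frame = df[["ForeignKey"]]

# Select every column whose header matches a regular expression.
by_pattern = df.filter(regex=r"^Foreign")
```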

2 Comments

Alexis It's not the right solution. Did you verify it?
@Alexis, your last suggestion can work for me. I'll accept it for this question. thank you
