0

I have a big excel (137MB), where I have 50 consumers and each of them have a separate excel sheet. Trying not to create 50 variables I would like to know how I can take from each sheet a specific column. For example, I want to save in a different variable the values in row 3. Or if it is easier save them in a variable and then taking into account the number of rows dividing it into 50 variables.

2
  • 2
    You can push them into some container like list Commented Aug 5, 2022 at 15:19
  • don't use different variables but list or dictionary - and then you can use for-loop to work with all values Commented Aug 5, 2022 at 16:59

2 Answers 2

2

Someone recommended Pandas, but I have used openpyxl to do similar things.

First you can import openpyxl and setup the worksheet like this:

import openpyxl #openpyxl will need to be installed via pip
workbook = openpyxl.load_workbook(PATHTOYOURWORKSHEET)

to access specific sheets of the workbook, as you mentioned each consumer has a seperate sheet, you do this:

worksheet1 = workbook["THE NAME OF YOUR SHEET"]
#you can call each whatever you want, this is how you get into a specific sheet.

Then to select a specific column you could do something like this although this is for a row, not a column, but if you read openpyxl docs i think its just iter_col instead, which just iterates every single cell in the row under your parameters:

sheet1Row2 = worksheet1.iter_rows(min_row=2, max_row=2, min_col=1, max_col=sheet1.max_column, values_only=True)

then using this, you could do something like for each 'sheet' in the workbook, target the row you want and then do whatever you want with it, add it to an array, save each to a new variable, etc.

Hope it helps

Sign up to request clarification or add additional context in comments.

Comments

1

you can read the entire dataframe without specifying the sheet which will give you a dictionary dataframe.

you can then use a list comprehension and loop.

MCVE

import pandas as pd 

df1 = pd.get_dummies(list('ABCD'))

df2 = df1
df2['B'] = 5

writer = pd.ExcelWriter('workbook.xlsx')
df1.to_excel(writer,sheet_name='test1')
df2.to_excel(writer,sheet_name='test2')
writer.save()

print(df1)

   A  B  C  D
0  1  0  0  0
1  0  1  0  0
2  0  0  1  0
3  0  0  0  1

print(df2)

   A  B  C  D
0  1  5  0  0
1  0  5  0  0
2  0  5  1  0
3  0  5  0  1

df = pd.read_excel('workbook.xlsx',sheet_name=None)

print(df)

{'test1':    Unnamed: 0  A  B  C  D
0           0  1  5  0  0
1           1  0  5  0  0
2           2  0  5  1  0
3           3  0  5  0  1, 'test2':    Unnamed: 0  A  B  C  D
0           0  1  5  0  0
1           1  0  5  0  0
2           2  0  5  1  0
3           3  0  5  0  1}


new = pd.concat(df.values())[["A", "C"]]

print(new)

   A  C
0  1  0
1  0  0
2  0  1
3  0  0
0  1  0
1  0  0
2  0  1
3  0  0

2 Comments

Umar.H, do you know how to only have the values in the array. So, not having A and C such as 01230123. The array should just have in your example([[1,0,0,0,1,0,0,0,], [0,0,1,0,0,0,1,0]]
sorry @Tams I don't understand

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.