I have a large Excel spreadsheet and need to read data from certain rows, columns, and cells, then output it in a different DataFrame format. How can I capture the data in specific cells in a way that keeps working when the spreadsheet changes? More columns or rows could be added, but I need to keep capturing this data. Could you provide code using Python and pandas, with loops, that captures this data dynamically? Not all cells will be used; only certain rows and columns are needed. Here is an example.
Logic
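For a given quarter (here q1.22) and ID, output each column name once per unit of its count: for example, a count of 2 under Customer for NY should produce two rows with TYPE Customer, and a count of 0 should produce none. The output has two new columns, date and TYPE. In the desired output, the OFFICE column and the TOTALS row are not used.
Here is the Excel spreadsheet: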
Data
        q1.22
ID      type1  OFFICE  nontype1  Customer
NY      1      3       1         2
CA      1      33      1         0
TOTALS  2      36      2         1
data = {
    '0': ['id', 'NY', 'CA', 'TOTALS'],
    'q1.22': ['type1', '1', '1', '2'],
    '0_2': ['OFFICE', '3', '33', '36'],
    '0_3': ['nontype1', '1', '1', '2'],
    '0_4': ['Customer', '2', '0', '1']
}
Desired
ID  date   TYPE
NY  q1.22  type1
NY  q1.22  nontype1
NY  q1.22  Customer
NY  q1.22  Customer
CA  q1.22  type1
CA  q1.22  nontype1
Doing
# Define the row indices for both ranges
start_row, end_row = 0, 3  # Rows 1 to 4 (0-based index)
# Define the column indices for the first range (A to C)
start_col_range1, end_col_range1 = 0, 2  # Columns A to C (0-based index)
# Define the column indices for the second range (E to F)
start_col_range2, end_col_range2 = 4, 5  # Columns E to F (0-based index)
# Create an empty list to store the captured data
captured_data = []
# Loop through rows and columns within the first range (A to C)
for row in range(start_row, end_row + 1):
    row_label = df.iloc[row, 0]  # Assuming the ID column is in the first column
    for col in range(start_col_range1, end_col_range1 + 1):
        col_label = df.columns[col]
        value = df.iloc[row, col]
        captured_data.append({'ID': row_label, 'date': df.iloc[0, 0], 'TYPE': col_label})
# Loop through rows and columns within the second range (E to F)
for row in range(start_row, end_row + 1):
    row_label = df.iloc[row, 0]  # Assuming the ID column is in the first column
    for col in range(start_col_range2, end_col_range2 + 1):
        col_label = df.columns[col]
        value = df.iloc[row, col]
        captured_data.append({'ID': row_label, 'date': df.iloc[0, 0], 'TYPE': col_label})
# Convert the captured data into a DataFrame
output_df = pd.DataFrame(captured_data)
However, this is the output:
        ID  date        TYPE
0       id    id  Unnamed: 0
1       id    id       q1.22
2       NY    id  Unnamed: 0
3       NY    id       q1.22
4       CA    id  Unnamed: 0
5       CA    id       q1.22
6   TOTALS    id  Unnamed: 0
7   TOTALS    id       q1.22
8       id    id  Unnamed: 3
9       id    id  Unnamed: 4
10      NY    id  Unnamed: 3
11      NY    id  Unnamed: 4
12      CA    id  Unnamed: 3
13      CA    id  Unnamed: 4
14  TOTALS    id  Unnamed: 3
15  TOTALS    id  Unnamed: 4
Any suggestions are appreciated.
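For reference, here is a minimal sketch of the kind of loop-based approach the Logic section seems to call for, run against the sample `data` dict above rather than the real workbook. It rests on several assumptions that are not confirmed anywhere in the question: the quarter label is a column label of the form `q<N>.<YY>` and applies to every column to its right until the next such label; the first data row holds the real column names (`id`, `type1`, ...); and `OFFICE` and `TOTALS` are excluded because they do not appear in the Desired table. The names `expand_counts`, `QUARTER_RE`, `WANTED_TYPES`, and `skip_ids` are made up for illustration. Note that in the attempt above, `df.iloc[0, 0]` is the `id` cell, which would explain why the date column shows `id`; the sketch reads the quarter from the column labels instead.

```python
import re
import pandas as pd

# Sample data from the question. With a real workbook this frame would
# come from pd.read_excel(...) instead (file and sheet names not shown here).
data = {
    '0': ['id', 'NY', 'CA', 'TOTALS'],
    'q1.22': ['type1', '1', '1', '2'],
    '0_2': ['OFFICE', '3', '33', '36'],
    '0_3': ['nontype1', '1', '1', '2'],
    '0_4': ['Customer', '2', '0', '1'],
}
df = pd.DataFrame(data)

# Assumption: quarter labels look like q1.22, q2.22, ...
QUARTER_RE = re.compile(r'^q[1-4]\.\d{2}$')
# Assumption: only these column names are wanted (OFFICE is left out).
WANTED_TYPES = ['type1', 'nontype1', 'Customer']


def expand_counts(df, wanted_types, skip_ids=('TOTALS',)):
    """Repeat each wanted column name once per unit of its count."""
    names = df.iloc[0]  # first data row holds the real column names

    # Pick columns by name, not by position, so added columns do not
    # break the selection. Track the most recent quarter label seen.
    quarter = None
    selected = []
    for col in df.columns[1:]:
        if QUARTER_RE.match(str(col)):
            quarter = str(col)
        if quarter is not None and names[col] in wanted_types:
            selected.append((col, names[col], quarter))

    records = []
    for _, row in df.iloc[1:].iterrows():
        row_id = row.iloc[0]  # assumption: the ID is in the first column
        if row_id in skip_ids:
            continue
        for col, type_name, qtr in selected:
            # int() assumes every count cell holds a whole number.
            for _ in range(int(row[col])):
                records.append({'ID': row_id, 'date': qtr, 'TYPE': type_name})

    return pd.DataFrame(records, columns=['ID', 'date', 'TYPE'])


out = expand_counts(df, WANTED_TYPES)
print(out)
```

On the sample data this should give the six rows of the Desired table. Because columns are selected by name and rows are filtered by ID, extra rows or columns in the sheet would be picked up or ignored without changing hard-coded index ranges. Blank or non-numeric count cells would need extra handling that is not shown here.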