I have an excel file in the following format:

enter image description here

I want to read it using Python, concatenate the tables (the number of tables could change) into a single one, and add a column with the road name next to each table's rows.

So it would look like:

enter image description here

I read in the excel file

import pandas as pd
df = pd.read_excel(input_fp, dtype='str').dropna(how='all')

And the dataframe looks like:

enter image description here

I think splitting the dataframe at the columns that contain only NaN values, or at the columns that have a header, should work, but I'm unsure how to do this.

Any suggestions would be appreciated.

Test data:

{'Unnamed: 0': {0: 'Start Time', 1: '06:01:00', 2: '06:31:00', 3: '07:31:00', 4: '08:31:00'}, 'Unnamed: 1': {0: 'End Time', 1: '06:30:00', 2: '07:30:00', 3: '08:30:00', 4: '09:30:00'}, 'Unnamed: 2': {0: 'Number of Cars', 1: '5343', 2: '2545', 3: '2434', 4: '3424'}, 'Unnamed: 3': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Unnamed: 4': {0: 'Start Time', 1: '06:01:00', 2: '06:31:00', 3: '07:31:00', 4: '08:31:00'}, 'Unnamed: 5': {0: 'End Time', 1: '06:30:00', 2: '07:30:00', 3: '08:30:00', 4: '09:30:00'}, 'Unnamed: 6': {0: 'Number of Cars', 1: '5343', 2: '2545', 3: '2434', 4: '3424'}, 'Unnamed: 7': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Unnamed: 8': {0: 'Start Time', 1: '06:01:00', 2: '06:31:00', 3: '07:31:00', 4: '08:31:00'}, 'Unnamed: 9': {0: 'End Time', 1: '06:30:00', 2: '07:30:00', 3: '08:30:00', 4: '09:30:00'}, 'Unnamed: 10': {0: 'Number of Cars', 1: '5343', 2: '2545', 3: '2434', 4: '3424'}}
  • Please read the linked post (why-should-i-not-upload-images-of-code-data-errors) and provide your df.head().to_dict() as text. Commented Oct 25, 2023 at 2:01
  • First, use the following code to specify the header when creating df: df = pd.read_excel(input_fp, header=1, dtype='str').dropna(how='all'). Then, if you provide df.head().to_dict() as text, we can tell you how to reshape it. Commented Oct 25, 2023 at 2:08
  • @z star To repeat: create df with the following code: df = pd.read_excel(input_fp, header=1, dtype='str').dropna(how='all'). The header=1 argument is important. Please set the header and provide df.head().to_dict(). Commented Oct 25, 2023 at 2:32

2 Answers

2

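You can pass both header rows (the road names and the column names) through the header parameter of read_excel, then stack the road level into its own column: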

df = (pd.read_excel('data.xlsx', header=[0, 2])
        .dropna(how='all', axis=1).rename_axis(columns=['Road', None])
        .stack('Road').reset_index('Road').reset_index(drop=True))

Output:

>>> df
     Road Start Time  End Time  Number of Cars
0  Road A   06:01:00  06:30:00            5343
1  Road B   06:01:00  06:30:00            5343
2  Road C   06:01:00  06:30:00            5343
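
For readers without the workbook, the reshaping step can be tried on a hand-built frame. This is only a minimal sketch: it assumes that reading with a two-row header produces MultiIndex columns whose top level is the road name, and the sample values and the names `wide` and `tidy` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for what read_excel(..., header=[0, 2]) might return
# after the blank separator columns have been dropped.
roads = ["Road A", "Road B"]
fields = ["Start Time", "End Time", "Number of Cars"]
columns = pd.MultiIndex.from_product([roads, fields], names=["Road", None])
wide = pd.DataFrame(
    [
        ["06:01:00", "06:30:00", 5343] * 2,
        ["06:31:00", "07:30:00", 2545] * 2,
    ],
    columns=columns,
)

# Move the 'Road' column level into the row index, then turn it into a column.
tidy = wide.stack("Road").reset_index("Road").reset_index(drop=True)

# Column order after stack can differ between pandas versions, so fix it explicitly.
tidy = tidy[["Road", *fields]]
print(tidy)
```

Each original row yields one output row per road, so two rows and two roads give four rows here.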

1

If you don't use header=1 as @PandaKim mentioned in the comments, you first need to promote the first row to the header like this:

df.columns = df.iloc[0]
df.drop(0,axis=0, inplace=True)

Then one way of getting your output is to loop through the columns, split wherever the column label is NaN, and concat all the pieces:

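# positions of the blank separator columns (their labels are NaN, not str)
cols_nans = [i for i, x in enumerate(df.columns) if not isinstance(x, str)]
cols_nans.append(len(df.columns))  # sentinel so the last table is included

# collect one sub-frame per table
dfs = []
i = 0
for idx, j in enumerate(cols_nans):
    df_sub = df.iloc[:, i:j].copy()  # copy to avoid SettingWithCopyWarning
    df_sub['Road'] = 'Road' + chr(idx + 65)  # A, B, C, ...; only works for up to 26 roads
    i = j + 1  # skip the separator column
    dfs.append(df_sub)
print(pd.concat(dfs))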

Output:

0 Start Time  End Time Number of Cars   Road
1   06:01:00  06:30:00           5343  RoadA
2   06:31:00  07:30:00           2545  RoadA
3   07:31:00  08:30:00           2434  RoadA
4   08:31:00  09:30:00           3424  RoadA
1   06:01:00  06:30:00           5343  RoadB
2   06:31:00  07:30:00           2545  RoadB
3   07:31:00  08:30:00           2434  RoadB
4   08:31:00  09:30:00           3424  RoadB
1   06:01:00  06:30:00           5343  RoadC
2   06:31:00  07:30:00           2545  RoadC
3   07:31:00  08:30:00           2434  RoadC
4   08:31:00  09:30:00           3424  RoadC
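
The same idea can be written without tracking column positions by hand. The following is a sketch, not a drop-in replacement: the function name `split_on_blank_columns` and the `road_names` argument are hypothetical, and since the test data in the question does not contain the road names, they are passed in explicitly. It assumes the tables are separated by columns that are entirely NaN and that the first row of each table holds its column names.

```python
import numpy as np
import pandas as pd


def split_on_blank_columns(df, road_names):
    """Stack side-by-side tables that are separated by all-NaN columns."""
    blank = df.isna().all(axis=0)      # True for the separator columns
    block_id = blank.cumsum()[~blank]  # one id per table, separators removed
    groups = [cols.index for _, cols in block_id.groupby(block_id)]
    if len(groups) != len(road_names):
        raise ValueError("expected one road name per table")

    blocks = []
    for name, cols in zip(road_names, groups):
        block = df[cols].iloc[1:].copy()          # data rows of this table
        block.columns = df[cols].iloc[0].tolist() # first row holds the header
        block.insert(0, "Road", name)
        blocks.append(block)
    return pd.concat(blocks, ignore_index=True)


# Small frame shaped like the question's test data: three tables, two gaps.
header = ["Start Time", "End Time", "Number of Cars"]
rows = [["06:01:00", "06:30:00", "5343"], ["06:31:00", "07:30:00", "2545"]]
table = pd.DataFrame([header] + rows)
gap = pd.DataFrame({0: [np.nan] * len(table)})
wide = pd.concat([table, gap, table, gap, table], axis=1, ignore_index=True)

result = split_on_blank_columns(wide, ["Road A", "Road B", "Road C"])
print(result)
```

Because the table boundaries come from the cumulative count of blank columns, the number of tables does not need to be known in advance; only the list of road names has to match it.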
