Splitting Pandas dataframe by columns and concatenate to create single dataframe

Question

I have an excel file in the following format:

I want to read it using Python and concatenate the tables (the number of tables could change) into a single one, adding a column with the road name next to each table.

So it would look like:

I read in the excel file

import pandas as pd
df = pd.read_excel(input_fp, dtype='str').dropna(how='all')

And the dataframe looks like:

I'm thinking that splitting the dataframe at the columns that are all NaN, or at the columns that have a header, should work, but I'm unsure how to do this.

Any suggestions would be appreciated.

Test data:

{'Unnamed: 0': {0: 'Start Time', 1: '06:01:00', 2: '06:31:00', 3: '07:31:00', 4: '08:31:00'}, 'Unnamed: 1': {0: 'End Time', 1: '06:30:00', 2: '07:30:00', 3: '08:30:00', 4: '09:30:00'}, 'Unnamed: 2': {0: 'Number of Cars', 1: '5343', 2: '2545', 3: '2434', 4: '3424'}, 'Unnamed: 3': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Unnamed: 4': {0: 'Start Time', 1: '06:01:00', 2: '06:31:00', 3: '07:31:00', 4: '08:31:00'}, 'Unnamed: 5': {0: 'End Time', 1: '06:30:00', 2: '07:30:00', 3: '08:30:00', 4: '09:30:00'}, 'Unnamed: 6': {0: 'Number of Cars', 1: '5343', 2: '2545', 3: '2434', 4: '3424'}, 'Unnamed: 7': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, 'Unnamed: 8': {0: 'Start Time', 1: '06:01:00', 2: '06:31:00', 3: '07:31:00', 4: '08:31:00'}, 'Unnamed: 9': {0: 'End Time', 1: '06:30:00', 2: '07:30:00', 3: '08:30:00', 4: '09:30:00'}, 'Unnamed: 10': {0: 'Number of Cars', 1: '5343', 2: '2545', 3: '2434', 4: '3424'}}
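
The dictionary above contains bare nan values, so it cannot be pasted into pd.DataFrame without first bringing a nan into scope (for example from numpy). As a minimal sketch for reproducing the problem, the same frame can also be rebuilt programmatically; the names header, rows, table and data are only illustrative, and the values are the ones from the test data.

```python
import numpy as np
import pandas as pd

# One table's content; row 0 of the raw sheet holds the header text.
header = ["Start Time", "End Time", "Number of Cars"]
rows = [
    ["06:01:00", "06:30:00", "5343"],
    ["06:31:00", "07:30:00", "2545"],
    ["07:31:00", "08:30:00", "2434"],
    ["08:31:00", "09:30:00", "3424"],
]
table = [header] + rows

# Three identical tables side by side, separated by an all-NaN spacer column.
data = {}
col = 0
for t in range(3):
    for k in range(len(header)):
        data[f"Unnamed: {col}"] = [r[k] for r in table]
        col += 1
    if t < 2:  # no spacer after the last table
        data[f"Unnamed: {col}"] = [np.nan] * len(table)
        col += 1

df = pd.DataFrame(data)
print(df.shape)
```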

Please read the linked post (why-should-i-not-upload-images-of-code-data-errors) and provide your df.head().to_dict() as text.
– Panda Kim, Commented Oct 25, 2023 at 2:01
First, specify the header when creating df: df = pd.read_excel(input_fp, header=1, dtype='str').dropna(how='all') Then, if you provide df.head().to_dict() as text, we can tell you how to reshape it.
– Panda Kim, Commented Oct 25, 2023 at 2:08
@z star To repeat: create df with the following code: df = pd.read_excel(input_fp, header=1, dtype='str').dropna(how='all') The header=1 argument is important. Please set the header and provide df.head().to_dict().
– Panda Kim, Commented Oct 25, 2023 at 2:32

Corralien · Accepted Answer · 2023-10-25 03:16:25Z


You can tell read_excel which rows form the header through its header parameter:

import pandas as pd

# header=[0, 2] builds two-level column labels from sheet rows 0 and 2
out = (pd.read_excel('data.xlsx', header=[0, 2])
         .dropna(how='all', axis=1).rename_axis(columns=['Road', None])
         .stack('Road').reset_index('Road').reset_index(drop=True))

Output:

>>> out
     Road Start Time  End Time  Number of Cars
0  Road A   06:01:00  06:30:00            5343
1  Road B   06:01:00  06:30:00            5343
2  Road C   06:01:00  06:30:00            5343
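
Since the Excel file itself is not available here, the reshaping step can be tried on an in-memory frame. This is only a sketch under assumptions: the names wide and long and the two-road sample are made up for illustration, and the frame is built with two-level columns (road, field) to mimic what a two-row header would produce. Depending on the pandas version, stack may emit a FutureWarning and may order the remaining columns differently.

```python
import pandas as pd

# Two-level column labels: outer level is the road, inner level is the field.
cols = pd.MultiIndex.from_product(
    [["Road A", "Road B"], ["Start Time", "End Time", "Number of Cars"]],
    names=["Road", None],
)
wide = pd.DataFrame(
    [
        ["06:01:00", "06:30:00", "5343"] * 2,
        ["06:31:00", "07:30:00", "2545"] * 2,
    ],
    columns=cols,
)

# Move the 'Road' level from the columns into the rows, then flatten the index.
long = wide.stack("Road").reset_index("Road").reset_index(drop=True)
print(long)
```

Each original row yields one output row per road, so two rows and two roads give four rows.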


Suraj Shourie · Accepted Answer · 2023-10-25 02:42:42Z

If you don't use header=1 as @PandaKim mentioned in the comments, you first need to move the first row into the header like this:

df.columns = df.iloc[0]
df.drop(0,axis=0, inplace=True)

Then one way of getting your output is to loop through the columns, split wherever the column name is NaN, and concatenate all the pieces:

# positions of the separator columns (their header is NaN, not a string)
cols_nans = [i for i, x in enumerate(df.columns) if not isinstance(x, str)]
cols_nans.append(len(df.columns))  # sentinel so the last table is included

# init
dfs = []
i = 0
for idx, j in enumerate(cols_nans):
    df_sub = df.iloc[:, i:j].copy()  # split; copy avoids SettingWithCopyWarning
    df_sub['Road'] = 'Road' + chr(idx + 65)  # letters A-Z only, so at most 26 roads
    i = j + 1
    dfs.append(df_sub)
print(pd.concat(dfs))

Output:

0 Start Time  End Time Number of Cars   Road
1   06:01:00  06:30:00           5343  RoadA
2   06:31:00  07:30:00           2545  RoadA
3   07:31:00  08:30:00           2434  RoadA
4   08:31:00  09:30:00           3424  RoadA
1   06:01:00  06:30:00           5343  RoadB
2   06:31:00  07:30:00           2545  RoadB
3   07:31:00  08:30:00           2434  RoadB
4   08:31:00  09:30:00           3424  RoadB
1   06:01:00  06:30:00           5343  RoadC
2   06:31:00  07:30:00           2545  RoadC
3   07:31:00  08:30:00           2434  RoadC
4   08:31:00  09:30:00           3424  RoadC
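
A variant of the same idea that avoids the letter limit, offered as a sketch rather than a drop-in replacement: it works on the raw frame (header text still in row 0), finds the all-NaN separator columns with isna().all(), and numbers the blocks. The function name split_on_blank_columns and the "Road 1", "Road 2" labels are assumptions for illustration, since the real road names are not present in the test data.

```python
import numpy as np
import pandas as pd

def split_on_blank_columns(raw: pd.DataFrame) -> pd.DataFrame:
    """Stack side-by-side tables that are separated by all-NaN columns.

    Assumes row 0 of `raw` holds each table's header text.
    """
    is_blank = raw.isna().all(axis=0).to_numpy()
    # The block id increases by one at every blank separator column.
    block_ids = np.cumsum(is_blank)
    pieces = []
    for n, block_id in enumerate(np.unique(block_ids[~is_blank]), start=1):
        keep = (block_ids == block_id) & ~is_blank
        block = raw.loc[:, keep]
        body = block.iloc[1:].copy()           # data rows
        body.columns = block.iloc[0].tolist()  # row 0 becomes the header
        body["Road"] = f"Road {n}"             # placeholder label
        pieces.append(body)
    return pd.concat(pieces, ignore_index=True)

# Small two-table sample in the same layout as the question's data.
raw = pd.DataFrame([
    ["Start Time", "Number of Cars", np.nan, "Start Time", "Number of Cars"],
    ["06:01:00", "5343", np.nan, "06:01:00", "5343"],
    ["06:31:00", "2545", np.nan, "06:31:00", "2545"],
])
out = split_on_blank_columns(raw)
print(out)
```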
