
I am reading a CSV with pandas read_csv, and the file uses multiple rows for the headers. I can read the file, skip the rows that come before the headers, and call reset_index so the first column is not used as the index, since I don't want my data to be the index. The trick is that after reading in the DataFrame I need to perform two tasks. First, the top header row doesn't repeat its names: a cell is left blank when it belongs under the last label filled in to its left, so those blanks need to be filled in. Second, I would like to "pivot" this top header into a column, leaving only the second header row.

An example of the input:

   a        b      
  c1 c2 c3 c1 c2 c3
1  0  0  0  0  0  0
2  0  0  0  0  0  0
3  0  0  0  0  0  0
4  0  0  0  0  0  0
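A minimal sketch of how a file like this is parsed, assuming the first column holds row labels (hence index_col=0) and that one hypothetical preamble line precedes the headers. With header=[0, 1], pandas builds MultiIndex columns, and each blank top-level cell gets a placeholder label of the form "Unnamed: <position>_level_0" instead of the label to its left. The CSV text is made up to mirror the example above.

```python
import io

import pandas as pd

# Hypothetical file contents: one preamble line, two header rows, then data.
# The top header has "a" and "b" followed by blank cells, as in the question.
csv_text = """exported report
,a,,,b,,
,c1,c2,c3,c1,c2,c3
1,0,0,0,0,0,0
2,0,0,0,0,0,0
3,0,0,0,0,0,0
4,0,0,0,0,0,0
"""

# skiprows=1 drops the preamble; header=[0, 1] reads the next two rows as a
# two-level column header; index_col=0 uses the first column as row labels.
raw = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=[0, 1], index_col=0)

# The blank top-level cells show up as "Unnamed: ..." placeholders
print(list(raw.columns.get_level_values(0)))
```

Filling in those placeholders is the part the stack-based reshape does not handle on its own.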

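What I am trying to output: the top header moved into a column, leaving only the second header row. (The expected output was posted as an image, which is not available.)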


Answer


Use stack(level=0) to move the top column level into the row index, then reset_index to turn that level back into a regular column:

df.stack(level=0).reset_index(level=-1).rename({'level_1': 'cNew'}, axis=1)

Minimal Code Sample

import pandas as pd

idx = pd.MultiIndex.from_product([['a', 'b'], ['c1', 'c2', 'c3']])
df = pd.DataFrame(0, index=range(1, 5), columns=idx)
df

   a        b      
  c1 c2 c3 c1 c2 c3
1  0  0  0  0  0  0
2  0  0  0  0  0  0
3  0  0  0  0  0  0
4  0  0  0  0  0  0

df.stack(level=0).reset_index(level=-1).rename({'level_1': 'cNew'}, axis=1)

  cNew  c1  c2  c3
1    a   0   0   0
1    b   0   0   0
2    a   0   0   0
2    b   0   0   0
3    a   0   0   0
3    b   0   0   0
4    a   0   0   0
4    b   0   0   0
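To show where the "level_1" name in the rename comes from, here is the same one-liner split into steps. This is a sketch that rebuilds the sample frame so it runs on its own; the intermediate names stacked and moved are illustrative only.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["a", "b"], ["c1", "c2", "c3"]])
df = pd.DataFrame(0, index=range(1, 5), columns=idx)

# Step 1: move the top column level ("a"/"b") into the row index,
# so each row label now pairs with a group label
stacked = df.stack(level=0)

# Step 2: the new index level has no name, so reset_index calls the
# resulting column "level_1" (its position in the row index)
moved = stacked.reset_index(level=-1)

# Step 3: give that column a meaningful name
result = moved.rename({"level_1": "cNew"}, axis=1)
print(result)
```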

3 Comments

Thanks, that worked well for the reshaping, but is there also a way to infer the top-level headers? In your example, all three a columns and all three b columns are already labeled. My source only has the first label, then blanks until the next header.
@labowski2944 please provide a reproducible sample of your data. I don't want to keep guessing what your input looks like, because that wastes both our time.
The example is in the exact format I am describing. There is only a single cell that says "a", followed by two blank cells, then "b" and two more blank cells. In your solution, which works fine, the data actually has a, a, a, b, b, b, so it doesn't represent the starting data with the missing labels in the top header.
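One way to infer the missing top-level labels, sketched under the assumption that the file is read with header=[0, 1] and index_col=0, so blanks arrive as "Unnamed: ..." placeholders. The idea is to mask those placeholders, forward-fill each label rightward until the next one, rebuild the column MultiIndex with named levels, and then stack as in the answer. The level names group and field and the CSV contents are hypothetical.

```python
import io

import pandas as pd

# Hypothetical input: "a" and "b" appear once, followed by blank cells
csv_text = """,a,,,b,,
,c1,c2,c3,c1,c2,c3
1,1,2,3,4,5,6
2,7,8,9,10,11,12
"""

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

# Turn the "Unnamed: ..." placeholders into NaN, then carry each label
# forward to the right until the next real label: a, a, a, b, b, b
top = pd.Series(df.columns.get_level_values(0))
top = top.mask(top.str.startswith("Unnamed:")).ffill()

# Rebuild the columns with named levels (names chosen for illustration)
df.columns = pd.MultiIndex.from_arrays(
    [top, df.columns.get_level_values(1)], names=["group", "field"]
)

# Move the now-complete top level into a regular "group" column
long_df = df.stack(level="group").reset_index(level="group")
print(long_df)
```

Because the levels are named, reset_index yields a column called group directly, so the rename of level_1 is not needed. This sketch would also treat any real column label that happens to start with "Unnamed:" as blank.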
