How to use a column header as a row label in a dataframe in Python [duplicate]

Question

I have a poorly structured dataframe that was generated by reading tables in directly from a pdf.

I am trying to manipulate some of the data before putting it into visualization tools.

A key transformation I am trying to make is to extract a column header and use it as a row label. Here is an example of the kind of dataframe I am working with:

import pandas as pd

data = {'Col1': ['AL', 'nan', 'nan', 'nan', 'WY', 'nan', 'nan', 'nan'],
        'Col2': ['nan', 1, 2, 3, 'nan', 1, 2, 3]}

df = pd.DataFrame(data)

The resulting dataframe looks a bit like this:

    Col1    Col2
0   AL  nan
1   nan 1
2   nan 2
3   nan 3
4   WY  nan
5   nan 1
6   nan 2
7   nan 3

The entries in Col1 are mostly nan, except for those in row 0 (AL) and row 4 (WY). These were effectively subheaders in the PDF table.

I am trying to write code that takes the last valid value in Col1 (e.g., AL) and fills the rows below it until it encounters the next valid value (e.g., WY).

Correct output would look something like this:

    Col1    Col2
0   AL  nan
1   AL  1
2   AL  2
3   AL  3
4   WY  nan
5   WY  1
6   WY  2
7   WY  3

I am somewhat at a loss for how to proceed here and welcome any advice on where to start.

Does this answer your question? How to replace NaNs by preceding or next values in pandas DataFrame? Do this after you replace the string 'nan' with pd.NA.
– pho, Commented Dec 8, 2022 at 18:17

CodeKorn · Accepted Answer · 2022-12-08 18:10:17Z

2

Forward-fill the column:

df['Col1'] = df['Col1'].ffill()
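
Note that a forward fill only treats real missing values as gaps, so the literal string 'nan' in the question's sample would be left untouched. A minimal sketch, assuming the placeholders are exactly the string 'nan' and that only Col1 needs filling:

```python
import pandas as pd

# Sample frame from the question; gaps arrive as the literal string 'nan'.
data = {'Col1': ['AL', 'nan', 'nan', 'nan', 'WY', 'nan', 'nan', 'nan'],
        'Col2': ['nan', 1, 2, 3, 'nan', 1, 2, 3]}
df = pd.DataFrame(data)

# Blank out the placeholder strings so pandas sees them as missing values.
col1 = df['Col1'].mask(df['Col1'] == 'nan')

# Copy each subheader down until the next valid value appears.
df['Col1'] = col1.ffill()

print(df)
```

Col2 is left as it was here. The same mask step could be applied to it if the 'nan' strings there should also become real missing values.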


3 Comments

luxcem Over a year ago

You probably need to replace the string "nan" with np.nan first.

pho Over a year ago

If the answer is "use this function that's already part of your library exactly the way the docs show you how to", then the question is generally a duplicate. In such situations, please look for the duplicate and flag/vote to close instead of adding an answer.

Shazriki Over a year ago

This works great. I actually replaced the 'nan' with None and it worked great. Thank you!
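
The None variant mentioned in the comment above might look roughly like this. It is only a sketch, assuming the gaps can be written as None when the frame is built, which the thread does not show:

```python
import pandas as pd

# Same sample data, but with None marking the gaps instead of the string 'nan'.
data = {'Col1': ['AL', None, None, None, 'WY', None, None, None],
        'Col2': [None, 1, 2, 3, None, 1, 2, 3]}
df = pd.DataFrame(data)

# None counts as a missing value, so the forward fill works directly.
df['Col1'] = df['Col1'].ffill()

print(df)
```

With None in place, Col2 also ends up with real missing values on the subheader rows rather than placeholder strings.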
