How to Change the Structure of Pandas Dataframe in Python?

Question

I have a pandas DataFrame in the format shown under Current DataFrame below, and I want to restructure it to look like the Desired DataFrame further down. The column headers are longitudes and the index (the first column) holds the latitudes.

Current DataFrame:

       E0    E1    E2    E3    E4
LAT                              
89   0.01  0.01  0.02  0.01  0.00
88   0.01  0.00  0.00  0.01  0.00
87   0.00  0.02  0.01  0.02  0.01
86   0.02  0.00  0.03  0.02  0.00
85   0.00  0.00  0.00  0.01  0.03

Code to build it:

import pandas as pd

df = pd.DataFrame({
    'LAT': [89, 88, 87, 86, 85],
    'E0': [0.01, 0.01, 0.0, 0.02, 0.0],
    'E1': [0.01, 0.0, 0.02, 0.0, 0.0],
    'E2': [0.02, 0.0, 0.01, 0.03, 0.0],
    'E3': [0.01, 0.01, 0.02, 0.02, 0.01],
    'E4': [0.0, 0.0, 0.01, 0.0, 0.03]
}).set_index('LAT')

Desired DataFrame:

LAT  LON     R
 89    0  0.01 
 89    1  0.01
 89    2  0.02
 89    3  0.01
 89    4  0.00
 88    0  0.01
 88    1  0.00
 88    2  0.00
 88    3  0.01
 88    4  0.00
 87    0  0.00
 87    1  0.02
 87    2  0.01
 87    3  0.02
 87    4  0.01
 86    0  0.02
 86    1  0.00
 86    2  0.03
 86    3  0.02
 86    4  0.00
 85    0  0.00
 85    1  0.00
 85    2  0.00
 85    3  0.01
 85    4  0.03

Henry Ecker · Accepted Answer · 2021-06-13 01:37:07Z

2

Try with stack + str.extract:

new_df = (
    df.stack()
        .reset_index(name='R')
        .rename(columns={'level_1': 'LON'})
)
new_df['LON'] = new_df['LON'].str.extract(r'(\d+$)').astype(int)

Or with pd.wide_to_long + reindex:

new_df = df.reset_index()
new_df = (
    pd.wide_to_long(new_df, stubnames='E', i='LAT', j='LON')
        .reindex(new_df['LAT'], level=0)
        .rename(columns={'E': 'R'})
        .reset_index()
)

new_df:

    LAT  LON     R
0    89    0  0.01
1    89    1  0.01
2    89    2  0.02
3    89    3  0.01
4    89    4  0.00
5    88    0  0.01
6    88    1  0.00
7    88    2  0.00
8    88    3  0.01
9    88    4  0.00
10   87    0  0.00
11   87    1  0.02
12   87    2  0.01
13   87    3  0.02
14   87    4  0.01
15   86    0  0.02
16   86    1  0.00
17   86    2  0.03
18   86    3  0.02
19   86    4  0.00
20   85    0  0.00
21   85    1  0.00
22   85    2  0.00
23   85    3  0.01
24   85    4  0.03

edited Jun 13, 2021 at 1:37

answered Jun 12, 2021 at 23:36


3 Comments

Sjcarlso Over a year ago

I'm getting the following error, and I'm not sure why, since 'LAT' is in the column title:

KeyError                                  Traceback (most recent call last)
<ipython-input-16-7cd13af1c681> in <module>
      1 new_df = (
----> 2     df.set_index('LAT')
      4         .reset_index(name='R')
      5         .rename(columns={'level_1': 'LON'})

KeyError: "None of ['LAT'] are in the columns"

Sjcarlso Over a year ago

Index(['E0', 'E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 'E9',
       ...
       'E350', 'E351', 'E352', 'E353', 'E354', 'E355', 'E356', 'E357', 'E358',
       'E359'],
      dtype='object', length=360)

Sjcarlso Over a year ago

Yep you are correct. When I do print(df.index.name) I get LAT.
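
The error in the comments above comes from LAT being the name of the index rather than a column, so df.set_index('LAT') has nothing to find. A minimal sketch of a defensive check, assuming a small two-row frame shaped like the one in the question; the helper name ensure_lat_index is hypothetical:

```python
import pandas as pd


def ensure_lat_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a frame indexed by LAT, whether LAT starts as a column or as the index."""
    if "LAT" in frame.columns:
        return frame.set_index("LAT")
    if frame.index.name == "LAT":
        return frame  # already indexed by LAT, nothing to do
    raise KeyError("LAT is neither a column nor the index name")


# Two-row stand-in for the question's frame, with LAT as the index.
df = pd.DataFrame(
    {"E0": [0.01, 0.01], "E1": [0.01, 0.00]},
    index=pd.Index([89, 88], name="LAT"),
)

print("LAT" in df.columns)  # False: LAT is not a column here
print(df.index.name)        # LAT

checked = ensure_lat_index(df)                  # LAT already the index
as_column = ensure_lat_index(df.reset_index())  # LAT starts as a column
```

Either way the result is indexed by LAT, so the stack-based answer can be applied without calling set_index first.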

sammywemmy · Accepted Answer · 2021-06-13 00:25:09Z

You could solve it with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor  # importing janitor registers pivot_longer on DataFrame

df.pivot_longer(index=None,
                names_to='LON',
                values_to='R',
                names_pattern=r"E(\d+)",  # capture every digit after the leading 'E'
                sort_by_appearance=True,
                ignore_index=False).reset_index()

    LAT LON     R
0    89   0  0.01
1    89   1  0.01
2    89   2  0.02
3    89   3  0.01
4    89   4  0.00
5    88   0  0.01
6    88   1  0.00
7    88   2  0.00
8    88   3  0.01
9    88   4  0.00
10   87   0  0.00
11   87   1  0.02
12   87   2  0.01
13   87   3  0.02
14   87   4  0.01
15   86   0  0.02
16   86   1  0.00
17   86   2  0.03
18   86   3  0.02
19   86   4  0.00
20   85   0  0.00
21   85   1  0.00
22   85   2  0.00
23   85   3  0.01
24   85   4  0.03

Here we are only interested in the numbers at the end of the column names; names_pattern takes a regular expression whose capture group extracts them.
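
To illustrate what the regular expression is doing, here is a small pandas-only sketch (no pyjanitor) that pulls the trailing digits out of column names. The names E10 and E359 are assumed here to show the multi-digit case, and the pattern (\d+)$ is one reasonable choice, not the only one:

```python
import pandas as pd

columns = pd.Index(["E0", "E4", "E10", "E359"])

# Capture one or more digits anchored at the end of each name.
lon = columns.str.extract(r"(\d+)$", expand=False).astype(int)
print(list(lon))

# A pattern whose group captures a single character, such as ".(.)",
# keeps only the first digit after the leading letter.
one_char = columns.str.extract(r".(.)", expand=False)
print(list(one_char))
```

With single-digit names such as E0 to E4 both patterns agree, but they diverge as soon as the longitude has more than one digit.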

You can avoid pyjanitor altogether by using melt and rename:

(df.rename(columns=lambda col: col[1:])  # strip the leading 'E'; also works for names like 'E359'
   .melt(var_name='LON', value_name='R', ignore_index=False)
 )

    LON     R
LAT          
89    0  0.01
88    0  0.01
87    0  0.00
86    0  0.02
85    0  0.00
89    1  0.01
88    1  0.00
87    1  0.02
86    1  0.00
85    1  0.00
89    2  0.02
88    2  0.00
87    2  0.01
86    2  0.03
85    2  0.00
89    3  0.01
88    3  0.01
87    3  0.02
86    3  0.02
85    3  0.01
89    4  0.00
88    4  0.00
87    4  0.01
86    4  0.00
85    4  0.03
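
The melt output above keeps LAT as the index and groups the rows by LON, which differs from the desired layout. A minimal sketch of one way to get LAT-major order with a default index, assuming the frame from the question; the position map and the stable sort are choices made for this sketch, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'LAT': [89, 88, 87, 86, 85],
    'E0': [0.01, 0.01, 0.0, 0.02, 0.0],
    'E1': [0.01, 0.0, 0.02, 0.0, 0.0],
    'E2': [0.02, 0.0, 0.01, 0.03, 0.0],
    'E3': [0.01, 0.01, 0.02, 0.02, 0.01],
    'E4': [0.0, 0.0, 0.01, 0.0, 0.03]
}).set_index('LAT')

# Position of each latitude in the original frame: {89: 0, 88: 1, ...}
position = {lat: i for i, lat in enumerate(df.index)}

new_df = (
    df.reset_index()
      .melt(id_vars='LAT', var_name='LON', value_name='R')
      # 'E12' -> 12: drop the leading letter and convert to int
      .assign(LON=lambda d: d['LON'].str[1:].astype(int))
      # A stable sort keeps the LON order produced by melt within each LAT.
      .sort_values('LAT', key=lambda s: s.map(position), kind='stable')
      .reset_index(drop=True)
)
print(new_df.head(6))
```

Sorting by the original position rather than by the LAT value means the row order of the source frame is preserved even if the latitudes are not monotonic.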

Karthik S · Accepted Answer · 2021-06-13 03:36:33Z

1

Another approach, using pd.wide_to_long:

pd.wide_to_long(df.reset_index(), ['E'], i='LAT', j='LON').reset_index().sort_values(by=['LAT', 'LON'])

    LAT  LON     E
4    85    0  0.00
9    85    1  0.00
14   85    2  0.00
19   85    3  0.01
24   85    4  0.03
3    86    0  0.02
8    86    1  0.00
13   86    2  0.03
18   86    3  0.02
23   86    4  0.00
2    87    0  0.00
7    87    1  0.02
12   87    2  0.01
17   87    3  0.02
22   87    4  0.01
1    88    0  0.01
6    88    1  0.00
11   88    2  0.00
16   88    3  0.01
21   88    4  0.00
0    89    0  0.01
5    89    1  0.01
10   89    2  0.02
15   89    3  0.01
20   89    4  0.00
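
The result above is sorted with LAT ascending and the value column is still called E. A short sketch, assuming the frame from the question, that renames the column to R and sorts LAT in descending order to match the desired layout. This relies on the original latitudes being in descending order, as they are here:

```python
import pandas as pd

df = pd.DataFrame({
    'LAT': [89, 88, 87, 86, 85],
    'E0': [0.01, 0.01, 0.0, 0.02, 0.0],
    'E1': [0.01, 0.0, 0.02, 0.0, 0.0],
    'E2': [0.02, 0.0, 0.01, 0.03, 0.0],
    'E3': [0.01, 0.01, 0.02, 0.02, 0.01],
    'E4': [0.0, 0.0, 0.01, 0.0, 0.03]
}).set_index('LAT')

new_df = (
    pd.wide_to_long(df.reset_index(), stubnames='E', i='LAT', j='LON')
      .rename(columns={'E': 'R'})
      .reset_index()
      # LAT descending, LON ascending within each LAT
      .sort_values(by=['LAT', 'LON'], ascending=[False, True])
      .reset_index(drop=True)
)
print(new_df.head())
```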

answered Jun 13, 2021 at 3:36


Aaron Wright · Accepted Answer · 2021-06-12 23:46:31Z

0

Quick and dirty.

Combine each LAT with its LON into a single key (here written as LAT.LON) and store the values in a list of tuple pairs.

[  
   (89.0, 0.01),
   (89.1, 0.01),
   (89.2, 0.02)
]

I'm sure someone can show a way to organize it exactly as you want, but as far as I know you need a unique ID for each data point in most query structures.

OR:

If you aren't putting this back into a db, then maybe you can use a dict something like this:

{ '89' : { '0' : 0.01,
           '1' : 0.01,
           '2' : 0.02 .....
         }
}

You can then get the data like this, with data being the dict above:

dpoint = data['89']['0']

assert dpoint == 0.01  # passes

dpoint = data['89']['2']

assert dpoint == 0.02  # passes
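
A minimal sketch of building such a nested dict directly from the frame in the question, assuming string keys as in the example above; data is the name used in the lookups above:

```python
import pandas as pd

df = pd.DataFrame({
    'LAT': [89, 88, 87, 86, 85],
    'E0': [0.01, 0.01, 0.0, 0.02, 0.0],
    'E1': [0.01, 0.0, 0.02, 0.0, 0.0],
    'E2': [0.02, 0.0, 0.01, 0.03, 0.0],
    'E3': [0.01, 0.01, 0.02, 0.02, 0.01],
    'E4': [0.0, 0.0, 0.01, 0.0, 0.03]
}).set_index('LAT')

# Outer keys are latitudes, inner keys are longitudes (the column name
# without its leading 'E'), both stored as strings.
data = {
    str(lat): {col[1:]: float(value) for col, value in row.items()}
    for lat, row in df.iterrows()
}

print(data['89']['0'])
print(data['85'])
```

This gives constant-time lookups by latitude and longitude, at the cost of losing the vectorised operations a DataFrame provides.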

answered Jun 12, 2021 at 23:46

