2

I have a current Pandas DataFrame in the format below (see Current DataFrame) but I want to change the structure of it to look like the Desired DataFrame below. The top row of titles is longitudes and the first column of titles is latitudes.

Current DataFrame:

       E0    E1    E2    E3    E4
LAT                              
89   0.01  0.01  0.02  0.01  0.00
88   0.01  0.00  0.00  0.01  0.00
87   0.00  0.02  0.01  0.02  0.01
86   0.02  0.00  0.03  0.02  0.00
85   0.00  0.00  0.00  0.01  0.03

Code to build it:

df = pd.DataFrame({
    'LAT': [89, 88, 87, 86, 85],
    'E0': [0.01, 0.01, 0.0, 0.02, 0.0],
    'E1': [0.01, 0.0, 0.02, 0.0, 0.0],
    'E2': [0.02, 0.0, 0.01, 0.03, 0.0],
    'E3': [0.01, 0.01, 0.02, 0.02, 0.01],
    'E4': [0.0, 0.0, 0.01, 0.0, 0.03]
}).set_index('LAT')

Desired DataFrame:

LAT  LON     R
 89    0  0.01 
 89    1  0.01
 89    2  0.02
 89    3  0.01
 89    4  0.00
 88    0  0.01
 88    1  0.00
 88    2  0.00
 88    3  0.01
 88    4  0.00
 87    0  0.00
 87    1  0.02
 87    2  0.01
 87    3  0.02
 87    4  0.01
 86    0  0.02
 86    1  0.00
 86    2  0.03
 86    3  0.02
 86    4  0.00
 85    0  0.00
 85    1  0.00
 85    2  0.00
 85    3  0.01
 85    4  0.03

4 Answers 4

2

Try with stack + str.extract:

new_df = (
    df.stack()
        .reset_index(name='R')
        .rename(columns={'level_1': 'LON'})
)
new_df['LON'] = new_df['LON'].str.extract(r'(\d+$)').astype(int)

Or with pd.wide_to_long + reindex:

new_df = df.reset_index()
new_df = (
    pd.wide_to_long(new_df, stubnames='E', i='LAT', j='LON')
        .reindex(new_df['LAT'], level=0)
        .rename(columns={'E': 'R'})
        .reset_index()
)

new_df:

    LAT  LON     R
0    89    0  0.01
1    89    1  0.01
2    89    2  0.02
3    89    3  0.01
4    89    4  0.00
5    88    0  0.01
6    88    1  0.00
7    88    2  0.00
8    88    3  0.01
9    88    4  0.00
10   87    0  0.00
11   87    1  0.02
12   87    2  0.01
13   87    3  0.02
14   87    4  0.01
15   86    0  0.02
16   86    1  0.00
17   86    2  0.03
18   86    3  0.02
19   86    4  0.00
20   85    0  0.00
21   85    1  0.00
22   85    2  0.00
23   85    3  0.01
24   85    4  0.03
Sign up to request clarification or add additional context in comments.

3 Comments

I'm getting the following error and I'm not sure why since 'LAT' is in the column title. KeyError Traceback (most recent call last) <ipython-input-16-7cd13af1c681> in <module> 1 new_df = ( ----> 2 df.set_index('LAT') 4 .reset_index(name='R') 5 .rename(columns={'level_1': 'LON'}) KeyError: "None of ['LAT'] are in the columns"
Index(['E0', 'E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 'E9', ... 'E350', 'E351', 'E352', 'E353', 'E354', 'E355', 'E356', 'E357', 'E358', 'E359'], dtype='object', length=360)
Yep you are correct. When I do print(df.index.name) I get LAT.
1

You could solve it with pivot_longer from pyjanitor:

# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(index = None,
                names_to = 'LON', 
                values_to = "R", 
                names_pattern = r".(.)", 
                sort_by_appearance = True,
                ignore_index = False).reset_index()

    LAT LON     R
0    89   0  0.01
1    89   1  0.01
2    89   2  0.02
3    89   3  0.01
4    89   4  0.00
5    88   0  0.01
6    88   1  0.00
7    88   2  0.00
8    88   3  0.01
9    88   4  0.00
10   87   0  0.00
11   87   1  0.02
12   87   2  0.01
13   87   3  0.02
14   87   4  0.01
15   86   0  0.02
16   86   1  0.00
17   86   2  0.03
18   86   3  0.02
19   86   4  0.00
20   85   0  0.00
21   85   1  0.00
22   85   2  0.00
23   85   3  0.01
24   85   4  0.03

Here we are only interested in the numbers that are at the end of the columns - we get this by passing a regular expression to names_pattern.

You can avoid pyjanitor altogether by using melt and rename:

(df.rename(columns=lambda col: col[-1])
   .melt(var_name='LON', value_name='R', ignore_index=False)
 )

    LON     R
LAT          
89    0  0.01
88    0  0.01
87    0  0.00
86    0  0.02
85    0  0.00
89    1  0.01
88    1  0.00
87    1  0.02
86    1  0.00
85    1  0.00
89    2  0.02
88    2  0.00
87    2  0.01
86    2  0.03
85    2  0.00
89    3  0.01
88    3  0.01
87    3  0.02
86    3  0.02
85    3  0.01
89    4  0.00
88    4  0.00
87    4  0.01
86    4  0.00
85    4  0.03

1 Comment

oops. my bad. thanks for the update @HenryEcker
1

Another approach, does this work:

pd.wide_to_long(df.reset_index(), ['E'], i = 'LAT', j = 'LON').reset_index().sort_values(by = ['LAT','LON'])
LAT LON E
4   85  0   0.00
9   85  1   0.00
14  85  2   0.00
19  85  3   0.01
24  85  4   0.03
3   86  0   0.02
8   86  1   0.00
13  86  2   0.03
18  86  3   0.02
23  86  4   0.00
2   87  0   0.00
7   87  1   0.02
12  87  2   0.01
17  87  3   0.02
22  87  4   0.01
1   88  0   0.01
6   88  1   0.00
11  88  2   0.00
16  88  3   0.01
21  88  4   0.00
0   89  0   0.01
5   89  1   0.01
10  89  2   0.02
15  89  3   0.01
20  89  4   0.00

Comments

0

Quick and dirty.

Pad your LAT with you LON in a list of tuple pairs.

[  
   (89.0, 0.01),
   (89.1, 0.01),
   (89.2, 0.02)
]

Im sure someone can break down a way to organize it like you want... but from what I know you need a unique ID data point for most data in a query structure.

OR:

If you aren't putting this back into a db, then maybe you can use a dict something like this:

{ '89' : { '0' : 0.01,
           '1' : 0.01,
           '2' : 0.02 .....
         }
}

You can then get the data with

dpoint = data['89']['0']

assert dpoint == 0.01
\\ True

dpoint = data['89']['2']

assert dpoint == 0.02
\\ True

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.