I have a pandas DataFrame whose index holds the values 1, 2, 'NaN', 'NaN', 'NaN', 'NaN' and whose data column holds [10, 20, 30, 40, 50, 60]. I would like to build a NumPy array of shape (3, 2) from it. The first entry along the first dimension should hold the first two values of the DataFrame, the second entry should hold the 3rd and 4th values, and so on.

The new array should look like this:

[image: desired output array]

I tried the following code, but it fails with "KeyError: 0":

import pandas as pd
import numpy as np
d = {'col1': [1, 2, 'NaN', 'NaN', 'NaN', 'NaN'], 'col2': [10, 20, 30, 40, 50, 60]}

df1 = pd.DataFrame(data=d)
df1 = df1.set_index('col1')

firstDimensionOfTheArray = 3
secondDimensionOfTheArray = 2

array = np.zeros((firstDimensionOfTheArray, secondDimensionOfTheArray))

for i in range(firstDimensionOfTheArray):
    for j in range(secondDimensionOfTheArray):
        array[i, j] = df1['col2'][i * secondDimensionOfTheArray + j]

Do you have any idea how to do that?

2 Answers

To build the numpy array, use to_numpy and reshape:

df1['col2'].to_numpy().reshape((2,3), order='F')

output:

array([[10, 30, 50],
       [20, 40, 60]])
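As a minimal sketch of what order='F' does here, the snippet below rebuilds the DataFrame from the question and compares the column-major reshape with a row-major reshape followed by a transpose. The names values, col_major, row_major_t and pairs are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame(
    {'col1': [1, 2, 'NaN', 'NaN', 'NaN', 'NaN'],
     'col2': [10, 20, 30, 40, 50, 60]}
).set_index('col1')

values = df1['col2'].to_numpy()

# Column-major (Fortran-order) fill: consecutive values run down each column.
col_major = values.reshape((2, 3), order='F')

# Default row-major fill into (3, 2), then transpose: same layout.
row_major_t = values.reshape((3, 2)).T

# The (3, 2) layout described in the question text: one pair per row.
pairs = values.reshape((3, 2))

print(col_major)
print(np.array_equal(col_major, row_major_t))
print(pairs)
```

If the (3, 2) shape from the question text is what is actually wanted, the plain values.reshape((3, 2)) call is enough and no order argument is needed.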

Now, to make a new dataframe, wrap the above in a DataFrame constructor:

import string
pd.DataFrame(df1['col2'].to_numpy().reshape((2,3), order='F'),
             # the two lines below are only needed if you want
             # the same indexes as in your image
             index=list(df1.index[:2]),
             columns=list(string.ascii_uppercase[:3])
            )

output:

    A   B   C
1  10  30  50
2  20  40  60

As described in the SO answer, you can fix your own code by adding iloc, which selects by position rather than by index label:

array[i, j] = df1['col2'].iloc[i * secondDimensionOfTheArray + j]

Alternatively, use iat instead of iloc. For your example, this produces:

[[10. 20.]
 [30. 40.]
 [50. 60.]]

Then you can get the desired result by just transposing the array:

array = array.T

output:

[[10. 30. 50.]
 [20. 40. 60.]]
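Putting the pieces together, here is a self-contained sketch of the corrected loop. It assumes the same data as in the question. The shorter names rows, cols and col2 are introduced only for this example. The try/except block shows why the original code failed: plain square brackets on the Series look up the index label 0, which does not exist in this index, whereas iloc looks up position 0.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame(
    {'col1': [1, 2, 'NaN', 'NaN', 'NaN', 'NaN'],
     'col2': [10, 20, 30, 40, 50, 60]}
).set_index('col1')

rows, cols = 3, 2
col2 = df1['col2']

# Label-based lookup: there is no index label 0, so this is expected to fail.
try:
    col2[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
print("label lookup failed:", label_lookup_failed)

# Position-based lookup with iloc fills the array as intended.
array = np.zeros((rows, cols))
for i in range(rows):
    for j in range(cols):
        array[i, j] = col2.iloc[i * cols + j]

# Transpose to get the (2, 3) layout shown in the other answer.
result = array.T
print(result)
```

Because np.zeros creates a float array by default, the values come out as floats, which matches the printed output above.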
