I have a two-dimensional NumPy array:

arr = np.array([[1,2,3],[4,5,6],[7,8,9]])

How would I convert this into a pandas DataFrame with one row per element, holding the x coordinate, the y coordinate, and the array value at that index, like this:

x   y    val
0   0    1
0   1    4
0   2    7
1   0    2
1   1    5
1   2    8
...

2 Answers

With stack and reset index:

df = pd.DataFrame(arr).stack().rename_axis(['y', 'x']).reset_index(name='val')
df

Out: 
   y  x  val
0  0  0    1
1  0  1    2
2  0  2    3
3  1  0    4
4  1  1    5
5  1  2    6
6  2  0    7
7  2  1    8
8  2  2    9

If ordering is important:

df.sort_values(['x', 'y'])[['x', 'y', 'val']].reset_index(drop=True)
Out: 
   x  y  val
0  0  0    1
1  0  1    4
2  0  2    7
3  1  0    2
4  1  1    5
5  1  2    8
6  2  0    3
7  2  1    6
8  2  2    9

1 Comment

thanks - this was very helpful! any chance you could generalize this for N dimensions?
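
One possible way to generalize to N dimensions is sketched below. It is not from either answer: it builds one index column per axis with `np.indices` and flattens everything in C order. The function name `array_to_long_frame`, its parameters, and the default column names `axis0`, `axis1`, ... are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

def array_to_long_frame(arr, names=None, value_name="val"):
    """Return one row per element: one index column per axis, plus the value.

    Hypothetical helper for illustration; not part of NumPy or pandas.
    """
    arr = np.asarray(arr)
    if names is None:
        # Assumed default column names: axis0, axis1, ...
        names = [f"axis{i}" for i in range(arr.ndim)]
    # np.indices returns one index grid per axis, each shaped like arr.
    grids = np.indices(arr.shape)
    # Flatten every grid and the values in the same (C) order so rows line up.
    data = {name: grid.ravel() for name, grid in zip(names, grids)}
    data[value_name] = arr.ravel()
    return pd.DataFrame(data)

# Usage on a 3-D array: each row is (i, j, k, cube[i, j, k]).
cube = np.arange(24).reshape(2, 3, 4)
long_df = array_to_long_frame(cube, names=["i", "j", "k"])
```

Here the first axis varies slowest, so for the 2-D case this corresponds to the row-major ordering rather than the column-major ordering shown in the question.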
Here's a NumPy method -

>>> arr
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> shp = arr.shape
>>> r,c = np.indices(shp)
>>> pd.DataFrame(np.c_[r.ravel(), c.ravel(), arr.ravel('F')],
...              columns=['x', 'y', 'val'])

   x  y  val
0  0  0    1
1  0  1    4
2  0  2    7
3  1  0    2
4  1  1    5
5  1  2    8
6  2  0    3
7  2  1    6
8  2  2    9
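
One caveat worth keeping in mind with the `np.c_` approach: it concatenates all three columns into a single array, so they share one dtype. With the integer array above that is harmless, but with float values the index columns become floats too. The sketch below uses a float array (an assumption, not the question's data) to show this, and builds the frame from a dict as one possible way to keep a separate dtype per column.

```python
import numpy as np
import pandas as pd

# Assumed float data, used only to make the dtype effect visible.
arr = np.array([[1.5, 2.5, 3.5],
                [4.5, 5.5, 6.5],
                [7.5, 8.5, 9.5]])

r, c = np.indices(arr.shape)

# np.c_ stacks everything into one float array, so x and y become floats.
stacked = pd.DataFrame(np.c_[r.ravel(), c.ravel(), arr.ravel('F')],
                       columns=['x', 'y', 'val'])

# Building from a dict keeps one dtype per column (integer x and y).
per_column = pd.DataFrame({'x': r.ravel(),
                           'y': c.ravel(),
                           'val': arr.ravel('F')})

print(stacked.dtypes)
print(per_column.dtypes)
```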

4 Comments

arr.shape is (3, 3), so your x0 row is [1, 2, 3]. But your code has the first column instead of the row (from your output, x0 = [1, 4, 7]). I know the problem statement shows [1, 4, 7] in the desired output, but those are not the actual correct coordinates (@ayhan got it right).
@WaveRider Sorry, I am not really sure what you are trying to say. Could you try again? The output is the same as ayhan's and the same as the expected output in the question.
Sure, please look at the original array definition: arr = np.array([[1,2,3],[4,5,6],[7,8,9]]). The first row in that array is [1,2,3], and the corresponding x,y coordinate pairs are then x0,y0 = 1, x0,y1 = 2, and x0,y2 = 3. After the transformation you imposed, those mappings change to x0,y0 = 1, x0,y1 = 4, and x0,y2 = 7, thus losing the integrity of the original coordinate mappings (the problem statement describes properly mapping x,y coordinates, but the desired output shown there has the incorrect mapping just described). @ayhan showed both
@WaveRider I am going by the expected output there, which seems to assume that Y runs along the rows of the input array (i.e. Y is 0 for the first row, 1 for the second row, and so on) and X along the columns. So for the first row Y won't change and will be 0. For arr = np.array([[1,2,3],[4,5,6],[7,8,9]]), we should then have: x0,y0 = 1, x1,y0 = 2, and x2,y0 = 3. If you want the swapped-axes output that ayhan showed as an alternative, simply drop the Fortran ordering for arr in my solution, i.e. do: pd.DataFrame(np.c_[r.ravel(), c.ravel(), arr.ravel()... Hope that clarifies.
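
The two conventions debated in these comments can be checked directly. The sketch below builds both frames and verifies each against plain indexing into `arr`; the variable names `col_as_x` and `row_as_x` are only illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
r, c = np.indices(arr.shape)

# Fortran-order values: x is the column index, y is the row index.
col_as_x = pd.DataFrame(np.c_[r.ravel(), c.ravel(), arr.ravel('F')],
                        columns=['x', 'y', 'val'])

# C-order values: x is the row index, y is the column index.
row_as_x = pd.DataFrame(np.c_[r.ravel(), c.ravel(), arr.ravel()],
                        columns=['x', 'y', 'val'])

# Check every row of each frame against direct indexing into arr.
assert all(arr[y, x] == v for x, y, v in col_as_x.itertuples(index=False))
assert all(arr[x, y] == v for x, y, v in row_as_x.itertuples(index=False))
```

Both frames are internally consistent; they differ only in which array axis is labelled `x`.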
