Split a numpy array by a key array

Question

I have a NumPy array that looks like this:

+----+-------+----------------+
| id | class |  probability   |
+----+-------+----------------+
| 0  |   0   | 0.371301944865 |
| 0  |   1   | 0.317619162391 |
| 0  |   -1  | 0.311078922721 |
| 1  |   0   | 0.401434454687 |
| 1  |   1   | 0.316000976419 |
| 1  |   -1  | 0.282564557522 |
| 2  |   1   | 0.361490456577 |
| 2  |   0   | 0.324832048066 |
| 2  |   -1  | 0.313677512904 |
| .  |   .   | .              |
| .  |   .   | .              |
| .  |   .   | .              |
+----+-------+----------------+

or more formally:

x = numpy.array([[  0.00000000e+00,   0.00000000e+00,   3.71301945e-01],
       [  0.00000000e+00,   1.00000000e+00,   3.17619162e-01],
       [  0.00000000e+00,  -1.00000000e+00,   3.11078923e-01],
       [  1.00000000e+00,   0.00000000e+00,   4.01434455e-01],
       [  1.00000000e+00,   1.00000000e+00,   3.16000976e-01],
       [  1.00000000e+00,  -1.00000000e+00,   2.82564558e-01],
       [  2.00000000e+00,   1.00000000e+00,   3.61490457e-01],
       [  2.00000000e+00,   0.00000000e+00,   3.24832048e-01],
       [  2.00000000e+00,  -1.00000000e+00,   3.13677513e-01]])

As you can see, for each id I have three classes, each with its own probability. I would like to transform this into a four-column array like this:

id/class         -1                0                1
0                0.311078922721    0.371301944865   0.317619162391
1                0.282564557522    0.401434454687   0.316000976419
.                .                 .                .
.                .                 .                .
.                .                 .                .

Is there a quick/clean way to do this?!

HYRY · Accepted Answer · 2016-01-18 13:54:39Z

Here is a solution using pandas:

import pandas as pd
import numpy as np

x = np.array([[  0.00000000e+00,   0.00000000e+00,   3.71301945e-01],
       [  0.00000000e+00,   1.00000000e+00,   3.17619162e-01],
       [  0.00000000e+00,  -1.00000000e+00,   3.11078923e-01],
       [  1.00000000e+00,   0.00000000e+00,   4.01434455e-01],
       [  1.00000000e+00,   1.00000000e+00,   3.16000976e-01],
       [  1.00000000e+00,  -1.00000000e+00,   2.82564558e-01],
       [  2.00000000e+00,   1.00000000e+00,   3.61490457e-01],
       [  2.00000000e+00,   0.00000000e+00,   3.24832048e-01],
       [  2.00000000e+00,  -1.00000000e+00,   3.13677513e-01]])

df = pd.DataFrame(x, columns=["id", "class", "p"])
df.pivot(index="id", columns="class", values="p")

output:

class        -1         0         1
id                                 
0      0.311079  0.371302  0.317619
1      0.282565  0.401434  0.316001
2      0.313678  0.324832  0.361490
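The question asks for a four-column array rather than a DataFrame. As a minimal sketch, assuming every (id, class) pair occurs exactly once, the pivoted frame can be turned back into a plain NumPy array; the names `wide` and `result` are only illustrative:

```python
import numpy as np
import pandas as pd

x = np.array([
    [0.0,  0.0, 0.371301945],
    [0.0,  1.0, 0.317619162],
    [0.0, -1.0, 0.311078923],
    [1.0,  0.0, 0.401434455],
    [1.0,  1.0, 0.316000976],
    [1.0, -1.0, 0.282564558],
    [2.0,  1.0, 0.361490457],
    [2.0,  0.0, 0.324832048],
    [2.0, -1.0, 0.313677513],
])

df = pd.DataFrame(x, columns=["id", "class", "p"])
wide = df.pivot(index="id", columns="class", values="p")

# reset_index() moves the "id" index back into a regular first column;
# to_numpy() then returns a float array with columns id, -1, 0, 1.
result = wide.reset_index().to_numpy()
print(result)
```

If an (id, class) pair could appear more than once, `pivot` raises an error, so the data would need to be deduplicated or aggregated first.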

B. M. · Accepted Answer · 2016-01-18 14:44:14Z


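Concatenate the id column with the reshaped data: np.hstack((a[:,0][::3][:,None],a[:,2].reshape(-1,3))). This assumes the rows are sorted by id and then by class, with exactly three classes per id.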

For example:

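import numpy as np

a = np.array([[i // 3, i % 3 - 1, np.random.random()] for i in range(15)])
# a = a[np.lexsort((a[:, 1], a[:, 0]))]  # if not sorted: order by id, then by class
print(a)
ids = a[::3, 0][:, None]        # one id per group of three rows, as a column
data = a[:, 2].reshape(-1, 3)   # probabilities, one row per id
print(np.hstack((ids, data)))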

gives

[[ 0.         -1.          0.78556868]
 [ 0.          0.          0.29483601]
 [ 0.          1.          0.74003482]
 [ 1.         -1.          0.00673232]
 [ 1.          0.          0.43262104]
 [ 1.          1.          0.92925208]
 [ 2.         -1.          0.26060377]
 [ 2.          0.          0.21186242]
 [ 2.          1.          0.88388227]
 [ 3.         -1.          0.53816376]
 [ 3.          0.          0.82545746]
 [ 3.          1.          0.53964188]
 [ 4.         -1.          0.63082784]
 [ 4.          0.          0.45693351]
 [ 4.          1.          0.38970428]]

[[ 0.          0.78556868  0.29483601  0.74003482]
 [ 1.          0.00673232  0.43262104  0.92925208]
 [ 2.          0.26060377  0.21186242  0.88388227]
 [ 3.          0.53816376  0.82545746  0.53964188]
 [ 4.          0.63082784  0.45693351  0.38970428]]

pandas can give you nice solutions too.
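When the classes are not in the same order for every id, as in the question's data, a NumPy-only alternative to sorting and reshaping is to compute each row's target position and scatter the probabilities into place. The following is a sketch under the assumption that each (id, class) pair appears at most once; the variable names are illustrative, and missing pairs would be left as NaN:

```python
import numpy as np

x = np.array([
    [0.0,  0.0, 0.371301945],
    [0.0,  1.0, 0.317619162],
    [0.0, -1.0, 0.311078923],
    [1.0,  0.0, 0.401434455],
    [1.0,  1.0, 0.316000976],
    [1.0, -1.0, 0.282564558],
    [2.0,  1.0, 0.361490457],
    [2.0,  0.0, 0.324832048],
    [2.0, -1.0, 0.313677513],
])

# np.unique returns the sorted distinct values and, with return_inverse=True,
# the position of each original entry within that sorted list.
ids, row = np.unique(x[:, 0], return_inverse=True)
classes, col = np.unique(x[:, 1], return_inverse=True)

# Start with NaN so that any missing (id, class) pair stays visible.
table = np.full((ids.size, classes.size), np.nan)
table[row, col] = x[:, 2]

# Prepend the id column; the remaining columns follow the order in `classes`.
result = np.column_stack((ids, table))
print(classes)
print(result)
```

This does not depend on row order or on every id having the same number of classes, at the cost of two `np.unique` calls.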


3 Comments

Angelica Over a year ago

Thanks but unfortunately this won't work as the classes are not ordered the same for each id (take a look at the example I provided)!

MB-F Over a year ago

You can easily sort the data with x = x[np.argsort(x[:, 1])] and then x = x[np.argsort(x[:, 0], kind="stable")]; the second sort has to be stable so that the class order within each id is preserved. The data is then sorted by id and class, and you can use reshape. However, I think the pandas solution is cleaner and simpler if you are willing to use it.

B. M. Over a year ago

Edit: thanks to @kazemakase, I added the line for sorting.

Conta · Accepted Answer · 2016-01-18 15:34:28Z


You can also use unstack in pandas.

With the same df that @HYRY used, add:

df.set_index(["id","class"]).unstack("class").reset_index()

result:

      id         p                    
class         -1.0       0.0       1.0
0      0  0.311079  0.371302  0.317619
1      1  0.282565  0.401434  0.316001
2      2  0.313678  0.324832  0.361490
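The result above has two-level column labels, because unstack is applied to a whole DataFrame. As a hedged variation, selecting the "p" column before unstacking should give single-level columns instead; `wide` and `flat` are illustrative names:

```python
import numpy as np
import pandas as pd

x = np.array([
    [0.0,  0.0, 0.371301945],
    [0.0,  1.0, 0.317619162],
    [0.0, -1.0, 0.311078923],
    [1.0,  0.0, 0.401434455],
    [1.0,  1.0, 0.316000976],
    [1.0, -1.0, 0.282564558],
    [2.0,  1.0, 0.361490457],
    [2.0,  0.0, 0.324832048],
    [2.0, -1.0, 0.313677513],
])

df = pd.DataFrame(x, columns=["id", "class", "p"])

# Selecting the Series "p" first means unstack produces one column per class,
# without an extra "p" level on top.
wide = df.set_index(["id", "class"])["p"].unstack("class")

# reset_index() turns "id" back into an ordinary column.
flat = wide.reset_index()
print(flat)
```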

