I have a NumPy array that looks like this:

+----+-------+----------------+
| id | class |  probability   |
+----+-------+----------------+
| 0  |   0   | 0.371301944865 |
| 0  |   1   | 0.317619162391 |
| 0  |   -1  | 0.311078922721 |
| 1  |   0   | 0.401434454687 |
| 1  |   1   | 0.316000976419 |
| 1  |   -1  | 0.282564557522 |
| 2  |   1   | 0.361490456577 |
| 2  |   0   | 0.324832048066 |
| 2  |   -1  | 0.313677512904 |
| .  |   .   | .              |
| .  |   .   | .              |
| .  |   .   | .              |
+----+-------+----------------+

or, as code:

import numpy

x = numpy.array([[  0.00000000e+00,   0.00000000e+00,   3.71301945e-01],
       [  0.00000000e+00,   1.00000000e+00,   3.17619162e-01],
       [  0.00000000e+00,  -1.00000000e+00,   3.11078923e-01],
       [  1.00000000e+00,   0.00000000e+00,   4.01434455e-01],
       [  1.00000000e+00,   1.00000000e+00,   3.16000976e-01],
       [  1.00000000e+00,  -1.00000000e+00,   2.82564558e-01],
       [  2.00000000e+00,   1.00000000e+00,   3.61490457e-01],
       [  2.00000000e+00,   0.00000000e+00,   3.24832048e-01],
       [  2.00000000e+00,  -1.00000000e+00,   3.13677513e-01]])

As you can see, each id has three classes, each with its own probability. I would like to transform this into a four-column array like this:

id/class         -1                0                1
0                0.311078922721    0.371301944865   0.317619162391
1                0.282564557522    0.401434454687   0.316000976419
.                .                 .                .
.                .                 .                .
.                .                 .                .

Is there a quick, clean way to do this?

3 Answers

3

Here is a solution using pandas:

import pandas as pd
import numpy as np

x = np.array([[  0.00000000e+00,   0.00000000e+00,   3.71301945e-01],
       [  0.00000000e+00,   1.00000000e+00,   3.17619162e-01],
       [  0.00000000e+00,  -1.00000000e+00,   3.11078923e-01],
       [  1.00000000e+00,   0.00000000e+00,   4.01434455e-01],
       [  1.00000000e+00,   1.00000000e+00,   3.16000976e-01],
       [  1.00000000e+00,  -1.00000000e+00,   2.82564558e-01],
       [  2.00000000e+00,   1.00000000e+00,   3.61490457e-01],
       [  2.00000000e+00,   0.00000000e+00,   3.24832048e-01],
       [  2.00000000e+00,  -1.00000000e+00,   3.13677513e-01]])

df = pd.DataFrame(x, columns=["id", "class", "p"])
df.pivot(index="id", columns="class", values="p")

output:

class        -1         0         1
id                                 
0      0.311079  0.371302  0.317619
1      0.282565  0.401434  0.316001
2      0.313678  0.324832  0.361490
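
Since the question asks for a four-column array rather than a DataFrame, here is a minimal sketch of one way to get back to NumPy after the pivot. The names `wide` and `result` are illustrative only, and the sketch assumes each (id, class) pair occurs exactly once, which `pivot` requires.

```python
import numpy as np
import pandas as pd

x = np.array([
    [0, 0, 0.371301945],
    [0, 1, 0.317619162],
    [0, -1, 0.311078923],
    [1, 0, 0.401434455],
    [1, 1, 0.316000976],
    [1, -1, 0.282564558],
    [2, 1, 0.361490457],
    [2, 0, 0.324832048],
    [2, -1, 0.313677513],
])

df = pd.DataFrame(x, columns=["id", "class", "p"])
wide = df.pivot(index="id", columns="class", values="p")

# Move the id index back into a regular column, then drop the labels.
# Column order is: id, class -1, class 0, class 1.
result = wide.reset_index().to_numpy()
print(result)
```

The class columns come out in sorted order, so the row order of the input does not matter here.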

1

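If the rows are sorted by id and then by class, and every id has exactly three classes, you can concatenate the id column with the reshaped probabilities: np.hstack((a[:, 0][::3][:, None], a[:, 2].reshape(-1, 3)))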

For example:

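import numpy as np

a = np.array([[i // 3, i % 3 - 1, np.random.random()] for i in range(15)])
# If the rows are not already sorted by id and then by class, sort them first:
# a = a[np.lexsort((a[:, 1], a[:, 0]))]
print(a)
ids = a[::3, 0][:, None]       # one id per group of three rows
data = a[:, 2].reshape(-1, 3)  # one row of three probabilities per id
print(np.hstack((ids, data)))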

gives

[[ 0.         -1.          0.78556868]
 [ 0.          0.          0.29483601]
 [ 0.          1.          0.74003482]
 [ 1.         -1.          0.00673232]
 [ 1.          0.          0.43262104]
 [ 1.          1.          0.92925208]
 [ 2.         -1.          0.26060377]
 [ 2.          0.          0.21186242]
 [ 2.          1.          0.88388227]
 [ 3.         -1.          0.53816376]
 [ 3.          0.          0.82545746]
 [ 3.          1.          0.53964188]
 [ 4.         -1.          0.63082784]
 [ 4.          0.          0.45693351]
 [ 4.          1.          0.38970428]]

[[ 0.          0.78556868  0.29483601  0.74003482]
 [ 1.          0.00673232  0.43262104  0.92925208]
 [ 2.          0.26060377  0.21186242  0.88388227]
 [ 3.          0.53816376  0.82545746  0.53964188]
 [ 4.          0.63082784  0.45693351  0.38970428]]

pandas can give you nice solutions too.
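
To show the reshape approach on the question's data, where the classes appear in a different order for each id, here is a minimal sketch that sorts first. It uses `np.lexsort`, which treats the last key as the primary one; the names `order`, `x_sorted`, `probs`, and `result` are illustrative, and the sketch assumes every id has exactly the same three classes.

```python
import numpy as np

x = np.array([
    [0, 0, 0.371301945],
    [0, 1, 0.317619162],
    [0, -1, 0.311078923],
    [1, 0, 0.401434455],
    [1, 1, 0.316000976],
    [1, -1, 0.282564558],
    [2, 1, 0.361490457],
    [2, 0, 0.324832048],
    [2, -1, 0.313677513],
])

# Sort by id (primary key, listed last) and then by class (secondary key).
order = np.lexsort((x[:, 1], x[:, 0]))
x_sorted = x[order]

ids = x_sorted[::3, 0][:, None]          # one id per group of three rows
probs = x_sorted[:, 2].reshape(-1, 3)    # columns: class -1, 0, 1
result = np.hstack((ids, probs))
print(result)
```

If some ids are missing a class, the reshape silently misaligns the columns, so the pandas approaches are the safer choice in that case.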

3 Comments

Thanks but unfortunately this won't work as the classes are not ordered the same for each id (take a look at the example I provided)!
You can easily sort the data with x = x[np.argsort(x[:, 1])] and then x = x[np.argsort(x[:, 0], kind="stable")]; the second sort must be stable so that it keeps the class order within each id. Then you have the data sorted by id and class and can use reshape. However, I think the pandas solution is cleaner and simpler if you are willing to use it.
Edit: thanks to @kazemakase, I added the line for sorting.
0

You can also use unstack in pandas.

With the same df that @HYRY used, add:

df.set_index(["id","class"]).unstack("class").reset_index()

result:

      id         p                    
class         -1.0       0.0       1.0
0      0  0.311079  0.371302  0.317619
1      1  0.282565  0.401434  0.316001
2      2  0.313678  0.324832  0.361490
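
The result above carries two header levels because the whole DataFrame was unstacked. As a hedged variation, selecting the "p" column before calling unstack should give a single header level; the name `flat` is illustrative only.

```python
import numpy as np
import pandas as pd

x = np.array([
    [0, 0, 0.371301945],
    [0, 1, 0.317619162],
    [0, -1, 0.311078923],
    [1, 0, 0.401434455],
    [1, 1, 0.316000976],
    [1, -1, 0.282564558],
    [2, 1, 0.361490457],
    [2, 0, 0.324832048],
    [2, -1, 0.313677513],
])

df = pd.DataFrame(x, columns=["id", "class", "p"])

# Unstacking the Series df[...]["p"] avoids the extra "p" header level.
flat = df.set_index(["id", "class"])["p"].unstack("class").reset_index()
print(flat)
```

Calling `flat.to_numpy()` then yields the plain four-column array the question asks for.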
