Multiple binary columns to one column

Question

I have a CSV dataset with 21 columns. The first 10 columns are numeric and I don't want to change them. The next 10 columns are binary: each row has exactly one "1" among them and "0" everywhere else. The last column is the given label.

The example data looks like this:

2596,51,3,258,0,510,221,232,148,6279,24(10th column),0,0,0,0,0,1(16th column),0,0,0,0,2(the last column)

Suppose I load the data into a matrix. Can I keep the first 10 columns and the last column unchanged and convert the middle 10 columns into a single column? After the transformation, the value of that column should be the index of the "1" within the binary columns. For the row above, the desired result is:

2596,51,3,258,0,510,221,232,148,6279,24,6(it's 6 because the "1" is on 6th column of the binary data),2 #12 columns in total

Can I achieve this using NumPy, scikit-learn or something else?

Daniel F · Accepted Answer · 2017-05-17 08:07:23Z

2

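This should do it if the data is loaded into a NumPy array (called arr here, because in is a reserved word in Python and cannot be used as a variable name):

out = np.c_[arr[:, :11], np.where(arr[:, 11:-1])[1] + 1, arr[:, -1]]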
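For a quick check, the one-liner can be tried on the two sample rows that appear elsewhere in this thread. This is only a minimal sketch: the array name `arr` and the inline sample data are assumptions for illustration, and it relies on every row having exactly one 1 in the binary block.

```python
import numpy as np

# Two sample rows: 11 numeric columns, 10 one-hot columns, 1 label column.
arr = np.array([
    [2596, 51, 3, 258, 0, 510, 221, 232, 148, 6279, 24, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
])

# np.where returns (row_indices, column_indices) of the nonzero entries;
# with exactly one 1 per row, the column indices line up with the rows.
position = np.where(arr[:, 11:-1])[1] + 1

# np.c_ joins the 2-D block and the two 1-D arrays column-wise.
out = np.c_[arr[:, :11], position, arr[:, -1]]
print(out.shape)             # (2, 13)
print(out[:, 11].tolist())   # [6, 5]
```

If a row had no 1 or more than one 1, `np.where` would return a different number of indices than there are rows and the concatenation would fail or misalign, so the input should be checked first when that is a possibility.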

edited May 17, 2017 at 8:07

answered May 16, 2017 at 11:30


2 Comments

toblKr Over a year ago

I used this on the dataset and got "ValueError: all the input arrays must have same number of dimensions". I then created a small NumPy array, "b=array([[11, 22, 0, 1, 0, 2], [22, 33, 1, 0, 0, 1]])", and ran "out = np.r_[b[:,:2],np.where(b[:,2:-1])[1]+1,b[:,-1]]", which gave the same error. Did I set the parameters incorrectly? Thank you.

Daniel F Over a year ago

Whoops, I should have used np.c_, not np.r_. Fixed.
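The error in the first comment comes from how the two helpers treat a mix of 1-D and 2-D inputs. The sketch below reuses the small array `b` from that comment (the name `position` is added here for readability): `np.r_` concatenates along the first axis and refuses to combine a 2-D block with 1-D arrays, whereas `np.c_` turns each 1-D array into a column before joining.

```python
import numpy as np

b = np.array([[11, 22, 0, 1, 0, 2],
              [22, 33, 1, 0, 0, 1]])

# 1-D array with one entry per row: the 1-based position of the 1.
position = np.where(b[:, 2:-1])[1] + 1

try:
    # np.r_ stacks along axis 0 and cannot mix 2-D and 1-D pieces.
    np.r_[b[:, :2], position, b[:, -1]]
except ValueError as exc:
    print("np.r_ failed:", exc)

# np.c_ promotes each 1-D array to a column before joining.
out = np.c_[b[:, :2], position, b[:, -1]]
print(out)
```

`np.column_stack((b[:, :2], position, b[:, -1]))` should give the same result for readers who prefer a function call over the indexing syntax.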

tuomastik · Accepted Answer · 2017-05-16 11:56:26Z

1

from io import StringIO

import pandas as pd

csv = StringIO("2596,51,3,258,0,510,221,232,148,6279,24,0,0,0,0,0,1,0,0,0,0,2"
               "\n1,2,3,4,5,6,7,8,9,10,11,0,0,0,0,1,0,0,0,0,0,1")

df = pd.read_csv(csv, header=None)

df = pd.concat(objs=[df[df.columns[:11]],
                     df[df.columns[11:-1]].idxmax(axis=1) - 10,
                     df[df.columns[-1]]], axis=1)

print(df)

Output:

     0   1   2    3   4    5    6    7    8     9   10  0   21
0  2596  51   3  258   0  510  221  232  148  6279  24   6   2
1     1   2   3    4   5    6    7    8    9    10  11   5   1
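In the output above the new column is labeled `0`, which duplicates an existing label, because the Series returned by `idxmax` has no name. A small variation, sketched below, gives it an explicit name; `position` is a name chosen here for illustration and is not part of the original answer.

```python
from io import StringIO

import pandas as pd

csv = StringIO("2596,51,3,258,0,510,221,232,148,6279,24,0,0,0,0,0,1,0,0,0,0,2"
               "\n1,2,3,4,5,6,7,8,9,10,11,0,0,0,0,1,0,0,0,0,0,1")
df = pd.read_csv(csv, header=None)

# idxmax returns the column label (11..20) of the 1 in each row;
# subtracting 10 maps it to a 1-based position, and rename labels the column.
position = df.iloc[:, 11:-1].idxmax(axis=1).sub(10).rename("position")

result = pd.concat([df.iloc[:, :11], position, df.iloc[:, -1]], axis=1)
print(result.columns.tolist())
```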

answered May 16, 2017 at 11:56


MaxU - stand with Ukraine · Accepted Answer · 2017-05-16 11:26:45Z

Data:

In [135]: df
Out[135]:
     0   1   2    3   4    5    6    7    8     9  ...  12  13  14  15  16  17  18  19  20  21
0  2596  51   3  258   0  510  221  232  148  6279 ...   0   0   0   0   1   0   0   0   0   2
1  2596  51   3  258   0  510  221  232  148  6279 ...   0   0   0   0   0   0   0   0   1   2

[2 rows x 22 columns]

Solution:

df = pd.read_csv('/path/to/file.csv', header=None)

In [137]: df.iloc[:, :11] \
            .join(df.iloc[:, 11:21].dot(range(1,11)).to_frame(11)) \
            .join(df.iloc[:, -1])
Out[137]:
     0   1   2    3   4    5    6    7    8     9   10  11  21
0  2596  51   3  258   0  510  221  232  148  6279  24   6   2
1  2596  51   3  258   0  510  221  232  148  6279  24  10   2
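The `dot` call works because each row of the binary block contains a single 1: multiplying the row element-wise by the weights 1 to 10 and summing leaves only the weight at the position of that 1. A minimal NumPy sketch of the same arithmetic, with array names chosen for illustration:

```python
import numpy as np

# Binary blocks of the two rows shown in the data above.
onehot = np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
weights = np.arange(1, 11)  # 1, 2, ..., 10

# Each row's dot product picks out the weight where the row holds its 1.
position = onehot @ weights
print(position.tolist())  # [6, 10]
```

Note that an all-zero row would yield 0 and a row with several 1s would yield the sum of their positions, so this approach assumes valid one-hot input.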

Allen Qin · Accepted Answer · 2017-05-16 11:41:59Z

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame({0: {2596: 51},
 1: {2596: 3},
 2: {2596: 258},
 3: {2596: 0},
 4: {2596: 510},
 5: {2596: 221},
 6: {2596: 232},
 7: {2596: 148},
 8: {2596: 6279},
 9: {2596: 24},
 10: {2596: 0},
 11: {2596: 0},
 12: {2596: 0},
 13: {2596: 0},
 14: {2596: 0},
 15: {2596: 1},
 16: {2596: 0},
 17: {2596: 0},
 18: {2596: 0},
 19: {2596: 0},
 20: {2596: 2}})

Solution

# Find the 1-based position of the 1 within the 10 binary columns
# and store it in column 10 (overwriting the first binary column).
df.iloc[:, 10] = np.argmax(df.iloc[:, 10:20].values, axis=1) + 1

# Select the first 10 columns, the position column and the label column.
df.iloc[:, list(range(11)) + [20]]

Out[2167]: 
      0   1    2   3    4    5    6    7     8   9   10  20
2596  51   3  258   0  510  221  232  148  6279  24   6   2
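All of the approaches in this thread assume that every row has exactly one 1 in the binary block. `argmax` in particular returns 0 for an all-zero row, which would then be reported as position 1. If the data might violate the assumption, a check along these lines could be run first; the function name `onehot_to_position` is hypothetical and the sketch is not part of the original answer.

```python
import numpy as np


def onehot_to_position(block):
    """Return the 1-based position of the single 1 in each row of `block`."""
    block = np.asarray(block)
    # Every entry must be 0 or 1, and each row must sum to exactly 1.
    if not np.isin(block, (0, 1)).all() or not (block.sum(axis=1) == 1).all():
        raise ValueError("each row must contain exactly one 1 and otherwise 0s")
    return block.argmax(axis=1) + 1


print(onehot_to_position([[0, 0, 1], [1, 0, 0]]).tolist())  # [3, 1]
```

With the setup above, it could be applied as `onehot_to_position(df.iloc[:, 10:20].values)` before assigning the result to column 10.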
