I have a file with 13 columns and I am looking to perform some grouping tasks. The input looks like so:
A B C D E F G H I J K L M
0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1 1
Excluding column A, the grouping should be done as follows, producing five new columns. Columns J, K, L and M are a special case and will be merged into one.
B,C > new column D,E > new column
B C Result
1 0 1
0 1 1
1 1 1
0 0 0
If either of the two columns contains a "1", or both do, I want to count it as 1. So far I have written this little snippet, but I am not sure how to proceed.
from collections import Counter
with open("datagroup.txt") as inFile:
    # Counts how often each string built from every other column occurs
    print(Counter([" ".join(line.split()[::2]) for line in inFile]))
* Edit *
A B&C D&E F&G H&I J,K,L,M
1 1 0 0 1 1
1 1 0 0 0 1
0 1 0 0 1 0
1 0 0 0 0 1
0 1 0 1 1 1
1 0 0 0 0 1
Basically, what I want to do is exclude the first column and then compare the following columns two at a time, up to (but not including) column J. If either column in a pair has a "1", I want to report that as "1"; if both columns have a "1", I would still report that as "1". For the last four columns, namely J, K, L and M, if I see a "1" in any of the four, it should be reported as "1".
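To make the rule above concrete, here is a minimal sketch in plain Python. The function names `collapse_row` and `collapse_lines` are hypothetical, and the sketch assumes that the first line of the file is the header and that column A is carried over unchanged, as the expected output table suggests.

```python
def collapse_row(fields):
    """Collapse one 13-field row (A..M) into A, B|C, D|E, F|G, H|I, J|K|L|M."""
    bits = [int(f) for f in fields]
    if len(bits) != 13:
        raise ValueError("expected 13 columns, got %d" % len(bits))
    out = [bits[0]]  # assumption: column A is passed through unchanged
    # B..I are indices 1..8; OR them pairwise: (B,C), (D,E), (F,G), (H,I)
    for i in range(1, 9, 2):
        out.append(bits[i] | bits[i + 1])
    # J..M are indices 9..12; report 1 if any of the four is 1
    out.append(int(any(bits[9:13])))
    return out


def collapse_lines(lines):
    """Apply collapse_row to every data line, skipping the header and blank lines."""
    rows = []
    for n, line in enumerate(lines):
        fields = line.split()
        if n == 0 or not fields:  # assumption: line 0 is the "A B C ..." header
            continue
        rows.append(collapse_row(fields))
    return rows


sample = [
    "A B C D E F G H I J K L M",
    "0 0 0 0 0 0 0 0 0 0 0 0 1",
    "0 0 0 0 0 0 0 0 0 0 0 1 0",
    "0 0 0 0 0 0 0 0 0 0 0 1 1",
]
for row in collapse_lines(sample):
    print(" ".join(str(v) for v in row))
```

With a real file, something like `collapse_lines(open("datagroup.txt"))` should work the same way, since a file object iterates over its lines.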
[::2] means every other column, but I can't see anything in your problem statement that has anything to do with every other column. Also, what are you trying to do by counting up strings made up of joining together some of the columns? You want to know how many times "0 0 0 1 0 1 0 1 1 0" occurs?
str(int(x)|int(y)) as the joining method but still I don't get what are you talking about
df['B'] | df['C'], with no loop; hard to get more readable than that. (Pure NumPy would also be a step up here, just not quite as much as Pandas.)
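The `df['B'] | df['C']` idea from the last comment could look roughly like this. It is only a sketch: it assumes pandas is installed, that the data is whitespace-separated with a header row, and it uses a made-up two-row sample and the output column names from the question's edit.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the question's input
sample = """A B C D E F G H I J K L M
0 0 0 0 0 0 0 0 0 0 0 0 1
1 1 0 0 0 0 1 0 0 0 0 0 0
"""

# sep=r"\s+" splits on any run of whitespace
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

result = pd.DataFrame({
    "A": df["A"],                 # assumption: A is carried over unchanged
    "B&C": df["B"] | df["C"],     # element-wise OR of the two 0/1 columns
    "D&E": df["D"] | df["E"],
    "F&G": df["F"] | df["G"],
    "H&I": df["H"] | df["I"],
    # 1 if any of J, K, L, M is non-zero in that row
    "J,K,L,M": df[["J", "K", "L", "M"]].any(axis=1).astype(int),
})

print(result.to_string(index=False))
```

For the real file, passing `"datagroup.txt"` to `pd.read_csv` in place of the `StringIO` object should be enough.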