Python: Grouping columns and counting

Question

I have a file with 13 columns and I am looking to perform some grouping tasks. The input looks like so:

A   B   C   D   E   F   G   H   I   J   K   L   M
0   0   0   0   0   0   0   0   0   0   0   0   1
0   0   0   0   0   0   0   0   0   0   0   1   0
0   0   0   0   0   0   0   0   0   0   0   1   1

Excluding column A, the grouping should be done as follows, producing five new columns. Columns J, K, L and M will be merged into one, as that is a special case.

B,C > new column   D,E > new column

B  C  Result
1  0  1
0  1  1
1  1  1
0  0  0

If either of the two columns has a "1" in it, or both do, I want to count it as 1. So far I have written this little snippet, but I am not sure how to proceed.

from collections import Counter
with open("datagroup.txt") as inFile:
    print(Counter([" ".join(line.split()[::2]) for line in inFile]))

* Edit *

A   B&C   D&E   F&G   H&I   J,K,L,M
1   1     0     0     1     1
1   1     0     0     0     1
0   1     0     0     1     0
1   0     0     0     0     1
0   1     0     1     1     1
1   0     0     0     0     1

Basically, what I want to do is exclude the first column and then compare every two columns after that, up to column J. If either column has a "1" present, I want to report that as "1"; even if both columns have "1", I would still report that as "1". For the last four columns, namely J, K, L and M, if I see a "1" in any of the four, it should be reported as "1".

For one thing, [::2] means every other column, but I can't see anything in your problem statement that has anything to do with every other column. Also, what are you trying to do by counting up strings made up of joining together some of the columns? Do you want to know how many times "0 0 0 1 0 1 0 1 1 0" occurs?
– abarnert, Commented Aug 12, 2014 at 8:27
You can use str(int(x)|int(y)) as the joining method, but I still don't get what you are talking about.
– Arusekk, Commented Aug 12, 2014 at 8:35
I think you should consider pandas for this kind of thing: pandas.pydata.org/pandas-docs/stable
– user3684792, Commented Aug 12, 2014 at 8:47
@user3684792: Great suggestion. In Pandas, each combined column is just, e.g., df['B'] | df['C'], with no loop; hard to get more readable than that. (Pure NumPy would also be a step up here, just not quite as much as Pandas.)
– abarnert, Commented Aug 12, 2014 at 9:34
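
The bitwise-OR idea from the comments can be sketched in plain Python as follows. This is only an illustration: `or_columns` is a hypothetical helper name, and the values are assumed to be the strings "0" and "1".

```python
from functools import reduce
from operator import or_


def or_columns(*values):
    """Return "1" if any of the given "0"/"1" strings is "1", else "0"."""
    # int(x) | int(y) from the comment, generalised to any number of columns.
    return str(reduce(or_, (int(v) for v in values)))


print(or_columns("1", "0"))            # B | C
print(or_columns("0", "0", "0", "0"))  # J | K | L | M
```

The same helper covers both the two-column pairs and the four-column J,K,L,M case, since OR over any number of values is 1 as soon as one of them is 1.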

Arusekk · Accepted Answer · 2014-08-12 09:02:15Z

First, you're obviously going to have to iterate over the rows in some way to do something for each row.

Second, I have no idea what you're trying to do with the [::2], since that will just give you every other column starting with the first (A, C, E, ...), or what the Counter is for in the first place, or why specifically you're trying to count strings made up of a bunch of concatenated columns.

But I think what you want is this:

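with open("datagroup.txt") as inFile:
    for row in inFile:
        columns = row.split()
        outcolumns = []
        outcolumns.append(columns[0]) # A
        # Pairs B&C, D&E, F&G, H&I, then J,K,L,M as a single four-column group.
        # list() is needed because zip() returns an iterator in Python 3.
        groups = list(zip(columns[1:-4:2], columns[2:-4:2])) + [columns[-4:]]
        for group in groups:
            outcolumns.append('1' if '1' in group else '0')
        print(' '.join(outcolumns))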

You can make this a lot more concise with a bit of itertools and comprehensions, but I wanted to keep this verbose and simple so you'd understand it.
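
A more compact variant of the loop above might look like the following. It is a minimal sketch, not a definitive implementation: `merge_groups` and `merge_lines` are hypothetical helper names, and the layout is assumed to be the one from the question (one leading column kept as-is, consecutive pairs, then a final group of four). Unlike the verbose version, it also skips blank lines.

```python
def merge_groups(line, tail=4):
    """Collapse one whitespace-separated row: keep the first column, OR each
    following pair, and OR the last `tail` columns into a single value."""
    columns = line.split()
    body = columns[1:-tail]
    # Consecutive pairs (B,C), (D,E), ... followed by the final group of `tail` columns.
    groups = [body[i:i + 2] for i in range(0, len(body), 2)] + [columns[-tail:]]
    return [columns[0]] + ['1' if '1' in group else '0' for group in groups]


def merge_lines(lines):
    """Yield one merged, space-joined row per non-blank input line."""
    for line in lines:
        if line.strip():
            yield ' '.join(merge_groups(line))


sample = [
    "0 0 0 0 0 0 0 0 0 0 0 0 1",
    "0 1 0 0 0 0 1 0 0 0 0 0 0",
]
for merged in merge_lines(sample):
    print(merged)
```

To run it against the file from the question, pass the open file object instead of the sample list, e.g. `merge_lines(inFile)` inside the `with open("datagroup.txt")` block.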

edited Aug 12, 2014 at 9:02 by Arusekk

answered Aug 12, 2014 at 8:40 by abarnert

abarnert Over a year ago

@Arusekk: I was trying to leave a little bit of work for the OP, but OK, now it's complete. :)
