I have a file with 13 columns and I am looking to perform some grouping tasks. The input looks like so:

A   B   C   D   E   F   G   H   I   J   K   L   M
0   0   0   0   0   0   0   0   0   0   0   0   1
0   0   0   0   0   0   0   0   0   0   0   1   0
0   0   0   0   0   0   0   0   0   0   0   1   1

Excluding column A, the columns should be grouped as follows to produce five new columns. Columns J, K, L and M are a special case and will be merged into a single column.

B,C > new column    D,E > new column

B  C  Result
1  0  1
0  1  1
1  1  1
0  0  0

If either or both of the two columns contain a "1", I want to count it as 1. So far I have written this little snippet, but I am not sure how to proceed.

from collections import Counter
with open("datagroup.txt") as inFile:
    # Counts how often each string of every-other-column values occurs
    print(Counter([" ".join(line.split()[::2]) for line in inFile]))

* Edit *

A   B&C  D&E  F&G  H&I  J,K,L,M
1   1    0    0    1    1
1   1    0    0    0    1
0   1    0    0    1    0
1   0    0    0    0    1
0   1    0    1    1    1
1   0    0    0    0    1

Basically, what I want to do is exclude the first column and then compare every two columns after that, up to column J. If either column in a pair has a "1", I want to report that as "1"; if both columns have a "1", I would still report that as "1". For the last four columns, namely J, K, L and M, if I see a "1" in any of the four, it should be reported as "1".

  • It is a bit unclear what output you expect. Commented Aug 12, 2014 at 8:23
  • For one thing, [::2] means every other column, but I can't see anything in your problem statement that has anything to do with every other column. Also, what are you trying to do by counting up strings made up of joining together some of the columns? You want to know how many times "0 0 0 1 0 1 0 1 1 0" occurs? Commented Aug 12, 2014 at 8:27
  • You can use str(int(x)|int(y)) as the joining method, but I still don't get what you are talking about. Commented Aug 12, 2014 at 8:35
  • I think you should consider pandas for this kind of thing: pandas.pydata.org/pandas-docs/stable Commented Aug 12, 2014 at 8:47
  • @user3684792: Great suggestion. In Pandas, each combined column is just, e.g., df['B'] | df['C'], with no loop; hard to get more readable than that. (Pure NumPy would also be a step up here, just not quite as much as Pandas.) Commented Aug 12, 2014 at 9:34
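
For reference, the str(int(x)|int(y)) idea from the comments can be sketched as below. This is only an illustration: `merge_pair` is a hypothetical helper name that does not appear in the thread, and it assumes every field is the string "0" or "1".

```python
def merge_pair(x, y):
    """Bitwise-OR two '0'/'1' strings and return the result as a string.

    Hypothetical helper; assumes both inputs are exactly '0' or '1'.
    """
    return str(int(x) | int(y))


# Reproduce the B / C / Result truth table from the question.
for b, c in [("1", "0"), ("0", "1"), ("1", "1"), ("0", "0")]:
    print(b, c, merge_pair(b, c))
```

The same function could be applied to each pair of neighbouring columns in a row. The J,K,L,M group would need the OR extended over four values.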

1 Answer

First, you're obviously going to have to iterate over the rows in some way to do something for each row.

Second, I have no idea what you're trying to do with the [::2], since that will just give you the even-indexed columns (A, C, E, ...), or what the Counter is for in the first place, or why specifically you're trying to count strings made up of a bunch of concatenated columns.
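
To make the [::2] point concrete, here is a tiny sketch using the header row from the question as sample data (the variable names are just for illustration):

```python
# A slice with a step of 2 keeps the items at indexes 0, 2, 4, ...
columns = "A B C D E F G H I J K L M".split()
every_other = columns[::2]
print(every_other)  # ['A', 'C', 'E', 'G', 'I', 'K', 'M']
```

So the slice drops B, D, F and so on entirely rather than pairing them with their neighbours.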

But I think what you want is this:

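with open("datagroup.txt") as inFile:
    for row in inFile:
        columns = row.split()
        if not columns:
            continue  # skip blank lines
        outcolumns = [columns[0]]  # A is passed through unchanged
        # Pair up B..I as (B,C), (D,E), (F,G), (H,I); J,K,L,M form one group
        groups = list(zip(columns[1:-4:2], columns[2:-4:2])) + [columns[-4:]]
        for group in groups:
            outcolumns.append('1' if '1' in group else '0')
        print(' '.join(outcolumns))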

You can make this a lot more concise with a bit of itertools and comprehensions, but I wanted to keep this verbose and simple so you'd understand it.
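
As a rough idea of what the concise form might look like, here is a sketch that uses comprehensions only (no itertools). `group_row` is a hypothetical name, and the hard-coded indexes assume exactly 13 whitespace-separated columns per row, as in the question:

```python
def group_row(row):
    """Collapse one 13-column row into A, B|C, D|E, F|G, H|I, J|K|L|M.

    Hypothetical helper; assumes 13 whitespace-separated '0'/'1' fields.
    """
    cols = row.split()
    # Slices [1:3], [3:5], [5:7], [7:9] are the four pairs; [9:] is J..M.
    groups = [cols[i:i + 2] for i in range(1, 9, 2)] + [cols[9:]]
    return ' '.join([cols[0]] + ['1' if '1' in g else '0' for g in groups])


# Made-up sample rows, not taken from the OP's file.
sample = [
    "0 0 0 0 0 0 0 0 0 0 0 0 1",
    "0 1 0 0 0 0 1 0 0 0 0 0 0",
]
for line in sample:
    print(group_row(line))
```

For the two sample rows this prints `0 0 0 0 0 1` and `0 1 0 1 0 0`. To run it over the real file, call `group_row` on each non-blank line of datagroup.txt.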

1 Comment

@Arusekk: I was trying to leave a little bit of work for the OP, but OK, now it's complete. :)
