1

I have two pandas dataframes a_df and b_df. a_df has columns ID, atext, and var1-var25, while b_df has columns ID, atext, and var1-var 25.

I want to add ONLY the corresponding vars from a_df and b_df and leave ID, and atext alone.

The code below adds ALL the corresponding columns. Is there a way to get it to add just the columns of interest?

absum_df=a_df.add(b_df)

What could I do to achieve this?

2 Answers 2

2

Use filter:

absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))

If you want to keep additional columns as-is, use concat after summing:

absum_df = pd.concat([a_df[['ID', 'atext']], absum_df], axis=1)

Alternatively, instead of subselecting columns from a_df, you could instead just drop the columns in absum_df, if you want to add all columns from a_df not in absum_df:

absum_df = pd.concat([a_df.drop(absum_df.columns axis=1), absum_df], axis=1)
Sign up to request clarification or add additional context in comments.

3 Comments

The OP asked to "leave ID, and atext alone". Presumably, that means keep those columns from a_df. Your answer simply gets rid of those columns.
right. it works, but forgets about my ID and atext. thank you for noting that.
see the edit for adding additional columns after the fact
1

You can subset a dataframe to particular columns:

var_columns = ['var-{}'.format(i) for i in range(1,26)]
absum_df=a_df[var_columns].add(b_df[var_columns])

Note that this will result in a dataframe with only the var columns. If you want a dataframe with the non-var columns from a_df, and the var columns being the sum of a_df and b_df, you can do

absum_df = a_df.copy()
absum_df[var_columns] = a_df[var_columns].add(b_df[var_columns])

4 Comments

so this is something always driving me crazy in python - suppose the var columns have names var1, var2, var3, ...,varn. HOW do you tell python this in a shorthand without typing all of them one by one? in the example above with "var_columns" is there a way to use some kind of shorthand instead of typing each column? sometimes there are HUNDREDS of cols labeled sequentially and it seems so inefficient to brute force type them!
In the example from @root, I was able to use the like="var" and it perfectly summed all the corresponding "var" columns. But in the second example, I will have to type all the col names out.
@profhoff I forgot to put the format in my code. See my edited version.
Instead of using a list comprehension with string formatting to generate the columns, you can just do var_columns = a_df.filter(like='var').columns

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.