add selected columns from two pandas dfs

Question

I have two pandas dataframes a_df and b_df. a_df has columns ID, atext, and var1-var25, while b_df has columns ID, atext, and var1-var25.

I want to add ONLY the corresponding var columns from a_df and b_df and leave ID and atext alone.

The code below adds ALL the corresponding columns. Is there a way to get it to add just the columns of interest?

absum_df=a_df.add(b_df)

What could I do to achieve this?
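A minimal sketch of why the plain add misbehaves, using hypothetical sample frames with only three var columns (var1-var3) standing in for var1-var25. DataFrame.add applies + to every column the two frames share, so ID gets summed and, because atext holds strings, its values get concatenated:

```python
import pandas as pd

# Hypothetical sample data: var1-var3 stand in for var1-var25.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['foo', 'bar'],
                     'var1': [1, 2], 'var2': [3, 4], 'var3': [5, 6]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['baz', 'qux'],
                     'var1': [10, 20], 'var2': [30, 40], 'var3': [50, 60]})

# add() aligns on index and columns and adds EVERY matching column:
# ID is summed and the string column atext is concatenated.
absum_df = a_df.add(b_df)
print(absum_df)
```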

root · Accepted Answer · 2018-05-02 22:41:35Z


Use filter:

absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))
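A small runnable sketch of the filter approach, assuming the same kind of hypothetical sample data (three var columns, made-up values). filter(like='var') keeps only the columns whose label contains the substring 'var', so ID and atext never reach add:

```python
import pandas as pd

# Hypothetical sample data: var1-var3 stand in for var1-var25.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['foo', 'bar'],
                     'var1': [1, 2], 'var2': [3, 4], 'var3': [5, 6]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['baz', 'qux'],
                     'var1': [10, 20], 'var2': [30, 40], 'var3': [50, 60]})

# Select only the var columns from each frame, then add them element-wise.
absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))
print(absum_df)
```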

If you want to keep additional columns as-is, use concat after summing:

absum_df = pd.concat([a_df[['ID', 'atext']], absum_df], axis=1)
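Putting both steps together on hypothetical sample data. This assumes a_df and b_df share the same row index, since concat with axis=1 lines rows up by index label:

```python
import pandas as pd

# Hypothetical sample data: var1-var3 stand in for var1-var25.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['foo', 'bar'],
                     'var1': [1, 2], 'var2': [3, 4], 'var3': [5, 6]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['baz', 'qux'],
                     'var1': [10, 20], 'var2': [30, 40], 'var3': [50, 60]})

# Sum only the var columns.
absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))
# Put a_df's ID and atext back in front of the summed columns.
absum_df = pd.concat([a_df[['ID', 'atext']], absum_df], axis=1)
print(absum_df)
```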

Alternatively, if you want to keep every column of a_df that is not in absum_df, you could drop the summed columns from a_df instead of subselecting the ones to keep:

absum_df = pd.concat([a_df.drop(absum_df.columns, axis=1), absum_df], axis=1)
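A sketch of the drop variant on hypothetical data. The extra note column exists only in this example, to show that every non-var column of a_df is carried over without being listed by name:

```python
import pandas as pd

# Hypothetical sample data; 'note' is an illustrative extra non-var column.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['foo', 'bar'],
                     'var1': [1, 2], 'var2': [3, 4], 'var3': [5, 6],
                     'note': ['x', 'y']})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['baz', 'qux'],
                     'var1': [10, 20], 'var2': [30, 40], 'var3': [50, 60]})

# Sum only the var columns.
absum_df = a_df.filter(like='var').add(b_df.filter(like='var'))
# Drop the summed columns from a_df, keep everything else, append the sums.
absum_df = pd.concat([a_df.drop(absum_df.columns, axis=1), absum_df], axis=1)
print(absum_df)
```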

edited May 2, 2018 at 22:41

answered May 2, 2018 at 22:04

root



3 Comments

Acccumulation Over a year ago

The OP asked to "leave ID, and atext alone". Presumably, that means keep those columns from a_df. Your answer simply gets rid of those columns.

profhoff Over a year ago

Right. It works, but it drops my ID and atext. Thank you for noting that.

root Over a year ago

see the edit for adding additional columns after the fact

Acccumulation · Accepted Answer · 2018-05-02 22:44:14Z


You can subset a dataframe to particular columns:

var_columns = ['var{}'.format(i) for i in range(1, 26)]
absum_df=a_df[var_columns].add(b_df[var_columns])
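A runnable sketch of the explicit-list approach, assuming the columns are literally named var1, var2, and so on. The sample frames are hypothetical and use only three var columns, so the range stops at 4 instead of 26:

```python
import pandas as pd

# Hypothetical sample data: var1-var3 stand in for var1-var25.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['foo', 'bar'],
                     'var1': [1, 2], 'var2': [3, 4], 'var3': [5, 6]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['baz', 'qux'],
                     'var1': [10, 20], 'var2': [30, 40], 'var3': [50, 60]})

# Build ['var1', 'var2', 'var3']; use range(1, 26) for var1-var25.
var_columns = ['var{}'.format(i) for i in range(1, 4)]
# Index both frames by that list and add the selected columns.
absum_df = a_df[var_columns].add(b_df[var_columns])
print(absum_df)
```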

Note that this will result in a dataframe with only the var columns. If you want a dataframe that keeps the non-var columns from a_df and holds the sum of a_df and b_df in the var columns, you can do:

absum_df = a_df.copy()
absum_df[var_columns] = a_df[var_columns].add(b_df[var_columns])
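The copy-and-assign version on the same kind of hypothetical data. Assigning into a copy keeps a_df's original column order and leaves a_df itself unchanged:

```python
import pandas as pd

# Hypothetical sample data: var1-var3 stand in for var1-var25.
a_df = pd.DataFrame({'ID': [1, 2], 'atext': ['foo', 'bar'],
                     'var1': [1, 2], 'var2': [3, 4], 'var3': [5, 6]})
b_df = pd.DataFrame({'ID': [1, 2], 'atext': ['baz', 'qux'],
                     'var1': [10, 20], 'var2': [30, 40], 'var3': [50, 60]})

var_columns = ['var{}'.format(i) for i in range(1, 4)]
# Start from a full copy so ID and atext come from a_df untouched.
absum_df = a_df.copy()
# Overwrite only the var columns with the element-wise sums.
absum_df[var_columns] = a_df[var_columns].add(b_df[var_columns])
print(absum_df)
```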

edited May 2, 2018 at 22:44

answered May 2, 2018 at 22:15

Acccumulation


4 Comments

profhoff Over a year ago

So this is something that always drives me crazy in Python: suppose the var columns are named var1, var2, var3, ..., varn. HOW do you tell Python this in shorthand without typing all of them one by one? In the example above with var_columns, is there some kind of shorthand instead of typing each column? Sometimes there are HUNDREDS of columns labeled sequentially, and it seems so inefficient to brute-force type them!

profhoff Over a year ago

In the example from @root, I was able to use like="var" and it perfectly summed all the corresponding "var" columns. But in the second example, I would have to type out all the column names.

Acccumulation Over a year ago

@profhoff I forgot to put the format in my code. See my edited version.

root Over a year ago

Instead of using a list comprehension with string formatting to generate the columns, you can just do var_columns = a_df.filter(like='var').columns
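One caveat on filter(like='var'): it matches any label that contains 'var' anywhere. If a frame might also hold a column such as the hypothetical variance below, filter(regex=...) with an anchored pattern is a stricter option. This sketch assumes the sequential names always look like var followed by digits:

```python
import pandas as pd

# Hypothetical empty frame; only the column labels matter here.
df = pd.DataFrame(columns=['ID', 'atext', 'var1', 'var2', 'var10', 'variance'])

# Substring match: also picks up 'variance'.
like_cols = df.filter(like='var').columns.tolist()
# Anchored regex: only 'var' followed by one or more digits.
regex_cols = df.filter(regex=r'^var\d+$').columns.tolist()

print(like_cols)
print(regex_cols)
```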
