pandas.concat of multiple data frames using only common columns

Question

I have several pandas DataFrame objects: cost1, cost2, cost3, ...

They have different column names (and different numbers of columns), but some columns are common to all of them.
Each data frame has a fairly large number of columns, so picking out the common columns by hand would be painful.

How can I append the rows from all of these data frames into a single data frame while keeping only the columns they have in common?

So far I have:

frames = [cost1, cost2, cost3]

new_combined = pd.concat(frames, ignore_index=True)

This obviously contains columns which are not common across all data frames.

Alok Nayak · Accepted Answer · 2018-09-06 14:12:26Z

26

For future readers: pandas can do this by itself. pd.concat keeps only the columns common to all of the data frames if you pass the join='inner' argument, e.g.

pd.concat(frames, join='inner', ignore_index=True)
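
As a minimal sketch of what join='inner' does, the example below builds three small frames and concatenates them. The column names and values (id, price, region, vendor, currency) are invented for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample frames; only "id" and "price" appear in all three.
cost1 = pd.DataFrame({"id": [1, 2], "price": [10.0, 20.0], "region": ["N", "S"]})
cost2 = pd.DataFrame({"id": [3], "price": [30.0], "vendor": ["acme"]})
cost3 = pd.DataFrame({"price": [40.0], "id": [4], "currency": ["USD"]})

frames = [cost1, cost2, cost3]

# join='inner' keeps the intersection of the column labels;
# ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat(frames, join="inner", ignore_index=True)
print(combined)
```

Columns that are missing from any one frame (region, vendor, currency here) are dropped rather than filled with NaN, which is what the default join='outer' would do.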

1 Comment

Ruslan Over a year ago

Thanks, this was a much better answer for me, and it has the advantage of better preserving the order of the columns.

Ami Tavory · Accepted Answer · 2016-10-04 22:28:34Z

10

You can find the common columns with Python's set.intersection:

common_cols = list(set.intersection(*(set(df.columns) for df in frames)))

To concatenate using only the common columns, you can use

pd.concat([df[common_cols] for df in frames], ignore_index=True)
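
Because a Python set is unordered, the common_cols list built above may come out in any order. If column order matters, one possible variation (a sketch, not part of the original answer) is to filter the first frame's columns against the intersection. The sample frames and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample frames; only "id" and "price" appear in all three.
cost1 = pd.DataFrame({"id": [1, 2], "price": [10.0, 20.0], "region": ["N", "S"]})
cost2 = pd.DataFrame({"id": [3], "price": [30.0], "vendor": ["acme"]})
cost3 = pd.DataFrame({"price": [40.0], "id": [4], "currency": ["USD"]})
frames = [cost1, cost2, cost3]

# Unordered set of the labels shared by every frame.
common = set.intersection(*(set(df.columns) for df in frames))

# Keep the shared labels in the order they appear in the first frame.
common_cols = [col for col in frames[0].columns if col in common]

combined = pd.concat([df[common_cols] for df in frames], ignore_index=True)
print(combined)
```

Selecting df[common_cols] on every frame also puts the columns in the same order before concatenation, so the result follows the first frame's layout.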
