
I have multiple pandas DataFrame objects: cost1, cost2, cost3, ...

  1. They have different column names (and different numbers of columns), but some columns are common to all of them.
  2. Each data frame has a fairly large number of columns, so picking out the common columns by hand would be painful.

How can I append the rows from all of these data frames into a single data frame while keeping only the columns common to all of them?

So far I have:

frames = [cost1, cost2, cost3]

new_combined = pd.concat(frames, ignore_index=True)

The result contains every column from every data frame, including those that are not common to all of them.

2 Answers

26

For future readers: pandas can do this itself. pd.concat keeps only the columns common to all data frames if you pass the join='inner' argument, e.g.

pd.concat(frames, join='inner', ignore_index=True)
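
As a minimal sketch of how this behaves, the example below builds three small data frames and concatenates them with join='inner'. The frames and their column names (id, amount, region, vendor, note) are made up for illustration and only stand in for cost1, cost2, cost3 from the question.

```python
import pandas as pd

# Hypothetical stand-ins for cost1, cost2, cost3; only "id" and "amount"
# appear in all three frames.
cost1 = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0], "region": ["N", "S"]})
cost2 = pd.DataFrame({"id": [3], "amount": [30.0], "vendor": ["acme"]})
cost3 = pd.DataFrame({"amount": [40.0], "id": [4], "note": ["x"]})

frames = [cost1, cost2, cost3]

# join='inner' keeps the intersection of the column labels;
# ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat(frames, join="inner", ignore_index=True)
print(combined)
```

The result should have four rows and only the id and amount columns; region, vendor and note are dropped because they do not appear in every frame.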

1 Comment

Thanks, this was a much better answer for me, and it has the advantage of better preserving the order of the columns.
10

You can find the common columns with Python's set.intersection:

common_cols = list(set.intersection(*(set(df.columns) for df in frames)))

To concatenate using only the common columns, you can use

pd.concat([df[common_cols] for df in frames], ignore_index=True)
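
One caveat: a Python set has no defined order, so common_cols built this way may come back in any order. A possible variant, assuming you want the columns in the order they appear in the first data frame, filters that frame's columns against the intersection. The sample frames and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for cost1, cost2, cost3.
cost1 = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0], "region": ["N", "S"]})
cost2 = pd.DataFrame({"id": [3], "amount": [30.0], "vendor": ["acme"]})
cost3 = pd.DataFrame({"amount": [40.0], "id": [4], "note": ["x"]})
frames = [cost1, cost2, cost3]

# Columns present in every frame (unordered).
common = set.intersection(*(set(df.columns) for df in frames))

# Re-impose the column order of the first frame.
common_cols = [col for col in frames[0].columns if col in common]

combined = pd.concat([df[common_cols] for df in frames], ignore_index=True)
print(combined)
```

Selecting df[common_cols] on each frame also reorders its columns, so every piece lines up before concatenation.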
