
I have a list of 4 dataframes, each containing only one column ('CustomerID'). I would like to merge them (inner join) within a loop.

This is what I've tried so far:

for i in all_df:
    merged = all_df[0].merge(all_df[1], on='CustomerID')
    del df[0]

What I'm trying to do here is merge the first dataframe (index 0) with the second (index 1), then delete the first dataframe so that the dataframe at index 1 becomes the dataframe at index 0, which would let me iterate.

I know this doesn't work: from the second iteration onward, what I should be merging is the dataframe stored in the new variable "merged" with the dataframe at index 1.
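A minimal sketch of the pop-and-merge idea described above, with the fix the previous paragraph points to: the running result `merged` is carried from one iteration to the next instead of restarting from `all_df[0]`. The sample frames are hypothetical stand-ins for the monthly snapshots, and the loop pops from a copy (`remaining`, also a hypothetical name) so the original list is not modified while iterating over it.

```python
import pandas as pd

# Hypothetical one-column snapshots standing in for the real all_df.
all_df = [
    pd.DataFrame({'CustomerID': [1, 2, 3, 4]}),
    pd.DataFrame({'CustomerID': [2, 3, 4, 5]}),
    pd.DataFrame({'CustomerID': [3, 4, 5]}),
    pd.DataFrame({'CustomerID': [3, 4, 6]}),
]

# Pop from a shallow copy so all_df itself is left untouched.
remaining = list(all_df)

# The first dataframe seeds the running result.
merged = remaining.pop(0)

# Each pass inner-merges the running result with the next dataframe,
# so the join from the previous iteration is reused.
while remaining:
    merged = merged.merge(remaining.pop(0), on='CustomerID')

print(merged)
```

Popping from the front of a Python list costs O(n), which does not matter for four dataframes; slicing the list or using `functools.reduce` expresses the same accumulation more directly.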

The 4 dataframes are snapshots of a client database at different points in time (March 2019, April 2019, May 2019, etc.). The point is to analyse client lifetime (how long did customers remain clients? After how many days did they leave? etc.).

Could you please help me with that?

  • What is your expected output? Do you want an inner or outer merge? Commented Dec 10, 2019 at 17:01
  • What do you mean by "merge" if you only have one column per DF? Commented Dec 10, 2019 at 17:02
  • I think your logic is flawed: even if you eliminate the first element, in the second iteration you merge the second dataframe with the third, so the join obtained in the first iteration is never used. Commented Dec 10, 2019 at 17:05
  • @AndyL. I've edited my question Commented Dec 10, 2019 at 17:08
  • I am still not sure whether merge is the right solution for your issue. Merging 1-column dataframes sounds odd to me, but I can't answer definitively without knowing your sample data and expected output. As for your question: when you want to join/merge multiple dataframes, use either merge with functools.reduce or df.join. Commented Dec 10, 2019 at 17:19
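On what "merge" means for one-column frames: with `how='inner'` (the default for `DataFrame.merge`), merging two frames on their only column keeps exactly the IDs present in both, i.e. a set intersection. The one difference is duplicates: a key repeated in either frame produces one row per matching pair. The frames `a`, `b` and `c` below are hypothetical sample data.

```python
import pandas as pd

# Hypothetical snapshots with unique IDs.
a = pd.DataFrame({'CustomerID': [1, 2, 3, 4]})
b = pd.DataFrame({'CustomerID': [2, 3, 4, 5]})

# Inner merge (how='inner' is the default) on the only column.
merged = a.merge(b, on='CustomerID')

# The same IDs obtained as a plain set intersection.
common = set(a['CustomerID']) & set(b['CustomerID'])

# With a duplicated key, the merge emits one row per matching pair,
# so ID 2 appears twice here while a set would keep it once.
c = pd.DataFrame({'CustomerID': [2, 2, 3]})
with_dups = a.merge(c, on='CustomerID')

print(sorted(merged['CustomerID'].tolist()), sorted(common))
print(sorted(with_dups['CustomerID'].tolist()))
```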

2 Answers


If you want to merge multiple dataframes, you can use functools.reduce as follows:

import pandas as pd
from functools import reduce
# Inner-merge every dataframe in all_df on CustomerID, left to right
df_merge = reduce(lambda df_x, df_y: pd.merge(df_x, df_y, on='CustomerID'), all_df)
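To make the `reduce` call concrete: it applies the lambda left to right, so for four frames the result equals `merge(merge(merge(df0, df1), df2), df3)`. A small check with hypothetical sample data (the four frames below are made up for illustration):

```python
from functools import reduce

import pandas as pd

# Hypothetical monthly snapshots, one CustomerID column each.
all_df = [
    pd.DataFrame({'CustomerID': [1, 2, 3, 4]}),
    pd.DataFrame({'CustomerID': [2, 3, 4, 5]}),
    pd.DataFrame({'CustomerID': [3, 4, 5]}),
    pd.DataFrame({'CustomerID': [3, 4, 6]}),
]

df_merge = reduce(lambda df_x, df_y: pd.merge(df_x, df_y, on='CustomerID'), all_df)

# The same merges written out by hand, innermost first.
step1 = pd.merge(all_df[0], all_df[1], on='CustomerID')
step2 = pd.merge(step1, all_df[2], on='CustomerID')
nested = pd.merge(step2, all_df[3], on='CustomerID')

print(df_merge.equals(nested))  # True
```

Passing `how='outer'` to `pd.merge` inside the lambda would instead keep every customer that appears in any snapshot.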

Following your approach, this should accomplish what you are trying to do:

# Initialize the final dataframe
result_df = all_df[0]

# Cycle over the list, from the second dataframe onwards
for df in all_df[1:]:
    result_df = result_df.merge(df, on='CustomerID')

1 Comment

This gives me this error: "TypeError: list indices must be integers or slices, not DataFrame". Should we use something like for i in range(all_df[1:])?
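The comment doesn't show the exact code that failed, so this is a guess at the likely cause: in `for df in all_df[1:]`, the loop variable `df` is already a DataFrame, and using it as a list index (e.g. `all_df[df]`) raises exactly this TypeError. `range(all_df[1:])` would not help either, since `range` needs an integer, not a list. A sketch with hypothetical sample data that reproduces the error and shows two working loops:

```python
import pandas as pd

# Hypothetical snapshots standing in for the real all_df.
all_df = [
    pd.DataFrame({'CustomerID': [1, 2, 3, 4]}),
    pd.DataFrame({'CustomerID': [2, 3, 4, 5]}),
    pd.DataFrame({'CustomerID': [3, 4, 5]}),
]

# Indexing the list with a DataFrame reproduces the reported error.
try:
    all_df[all_df[1]]
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Working version: iterate over the DataFrames themselves.
result_df = all_df[0]
for df in all_df[1:]:
    result_df = result_df.merge(df, on='CustomerID')

# If integer positions are wanted, range needs an int such as len(all_df).
result_by_index = all_df[0]
for i in range(1, len(all_df)):
    result_by_index = result_by_index.merge(all_df[i], on='CustomerID')

print(result_df.equals(result_by_index))  # True
```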
