I am trying my hand at spam filters. I tried several methods to label text files as spam. As a result, I have three dataframes. They basically look like this:

import pandas as pd

df_method_1 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '0', '0']})
df_method_2 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '1', '0']})
df_method_3 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '1', '0']})

I am now trying to create a dataframe showing whether a file was labeled as spam and, if so, by which method.

In the best case, I can create a dataframe containing the following information:

df_summary = pd.DataFrame({'file': ['A','B' ,'C'], 'spam': ['All methods', 'Method 2 & Method 3', 'No method']})

Obviously, I am after the information itself. The exact strings don't matter.

I tried pandas.DataFrame.isin() to make it happen, but I failed. Any ideas how to do this?

1 Answer

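How about merge()? Merging the three dataframes on "file" puts each method's label side by side. pandas adds the _x and _y suffixes to the overlapping spam columns, so spam_x comes from method 1, spam_y from method 2, and spam from method 3:

df_method_1.merge(df_method_2, on="file").merge(df_method_3, on="file")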
  file spam_x spam_y spam
0    A      1      1    1
1    B      0      1    1
2    C      0      0    0
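
To get from the side-by-side labels to the summary the question asks for, something along these lines might work. This is only a sketch: it uses pd.concat instead of chained merges so that each column gets a readable name, and the names method_1, method_2, method_3, n_methods and flagged_by are made up for illustration. It also assumes the labels are the strings '1' and '0', as in the question.

```python
import pandas as pd

df_method_1 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '0', '0']})
df_method_2 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '1', '0']})
df_method_3 = pd.DataFrame({'file': ['A', 'B', 'C'], 'spam': ['1', '1', '0']})

# Hypothetical labels, one per method; they become the column names.
frames = {
    'method_1': df_method_1,
    'method_2': df_method_2,
    'method_3': df_method_3,
}

# Index each frame by file and align the spam columns side by side.
labels = pd.concat(
    {name: df.set_index('file')['spam'] for name, df in frames.items()},
    axis=1,
)

# The labels are strings in the question, so compare against '1' to get booleans.
flags = labels.eq('1')

# For each file, collect the names of the methods that flagged it.
flagged_by = [
    [name for name, hit in zip(flags.columns, row) if hit]
    for row in flags.to_numpy()
]

df_summary = pd.DataFrame({
    'file': flags.index.to_list(),
    'n_methods': flags.sum(axis=1).to_list(),
    'flagged_by': flagged_by,
})
print(df_summary)
```

An empty list in flagged_by means no method flagged the file, and n_methods equal to the number of methods means all of them did.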