In df1 I have columns for Line, Generation, ID, and Sex.

For each row in df1, I want to match the remaining columns (the SNP columns) against the rows of df2 and count how many df1 rows match, per Line and Generation.

The desired result would look like:

  • Line A, Generation 2020A, has a total of 1 row for row ['A','A','A','A'] in df2.

  • Line B, Generation 2020B, has a total of 2 rows for row ['A','C','T','G'] in df2.

df1

Line  ID  Sex  Generation  SNP-1  SNP-2  SNP-3  SNP-4
A     1   F    2020A       A      A      A      A
B     2   F    2020B       A      C      T      G
B     3   F    2020B       A      C      T      G

df2

SNP-1  SNP-2  SNP-3  SNP-4
A      A      A      A
A      C      T      G

1 Answer

You can merge df1 with df2 on the SNP columns and then call value_counts on the result to achieve this.

import pandas as pd

# Sample data (ID and Sex omitted for brevity)
df1 = pd.DataFrame([['A', '2020A', 'A', 'A', 'A', 'A'],
                    ['B', '2020B', 'A', 'C', 'T', 'G'],
                    ['B', '2020B', 'A', 'C', 'T', 'G']],
                   columns=['Line', 'Generation', 'SNP-1', 'SNP-2', 'SNP-3', 'SNP-4'])
df2 = pd.DataFrame([['A', 'A', 'A', 'A'], ['A', 'C', 'T', 'G']],
                   columns=['SNP-1', 'SNP-2', 'SNP-3', 'SNP-4'])

# Inner merge keeps only the df1 rows whose SNP values match a row in df2
df_merge = df1.merge(df2, on=['SNP-1', 'SNP-2', 'SNP-3', 'SNP-4'])
print(df_merge)

# Count the matched rows per (Line, Generation)
print('\n', df_merge.value_counts(['Line', 'Generation']))

Output:

  Line Generation SNP-1 SNP-2 SNP-3 SNP-4
0    A      2020A     A     A     A     A
1    B      2020B     A     C     T     G
2    B      2020B     A     C     T     G

 Line  Generation
B     2020B         2
A     2020A         1
dtype: int64
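
If you also want the matched SNP pattern next to each count, as in the question's desired result, one possible variation is to group on the SNP columns as well. This is only a sketch: the names snp_cols, matched, counts and n_rows are my own, and the drop_duplicates call assumes that a row repeated in df2 should not inflate the counts.

```python
import pandas as pd

snp_cols = ["SNP-1", "SNP-2", "SNP-3", "SNP-4"]  # illustrative helper name

# Sample data from the question, including the ID and Sex columns
df1 = pd.DataFrame(
    [
        ["A", 1, "F", "2020A", "A", "A", "A", "A"],
        ["B", 2, "F", "2020B", "A", "C", "T", "G"],
        ["B", 3, "F", "2020B", "A", "C", "T", "G"],
    ],
    columns=["Line", "ID", "Sex", "Generation"] + snp_cols,
)
df2 = pd.DataFrame([["A", "A", "A", "A"], ["A", "C", "T", "G"]], columns=snp_cols)

# Assumption: duplicate rows in df2 should count once, so drop them before merging
matched = df1.merge(df2.drop_duplicates(), on=snp_cols, how="inner")

# One output row per (Line, Generation, SNP pattern) with the number of df1 rows
counts = (
    matched.groupby(["Line", "Generation"] + snp_cols)
    .size()
    .reset_index(name="n_rows")
)
print(counts)
```

With the sample data this should give one row for Line A / 2020A with n_rows 1 and one row for Line B / 2020B with n_rows 2, each alongside its SNP values.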