Python pandas - grouping and plotting

Question

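In df1 I have columns for Line, ID, Sex, and Generation, plus four SNP columns (SNP-1 to SNP-4). df2 contains only the SNP columns.

For each Line and Generation, I want to count how many rows of df1 have SNP values (the remaining columns) that match a row in df2.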

The desired result would look like:

Line A, Generation 2020A, has a total of 1 row matching ['A','A','A','A'] in df2.
Line B, Generation 2020B, has a total of 2 rows matching ['A','C','T','G'] in df2.

df1

Line	ID	Sex	Generation	SNP-1	SNP-2	SNP-3	SNP-4
A	1	F	2020A	A	A	A	A
B	2	F	2020B	A	C	T	G
B	3	F	2020B	A	C	T	G

df2

SNP-1	SNP-2	SNP-3	SNP-4
A	A	A	A
A	C	T	G
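To reproduce the two tables above exactly, including the ID and Sex columns that the answer below leaves out, one option is to parse the tab-separated text with pandas.read_csv. This is a minimal sketch that assumes the columns are separated by single tab characters as shown; the names df1_text and df2_text are only illustrative.

```python
import io

import pandas as pd

# Tab-separated copies of the tables from the question (assumes single-tab delimiters)
df1_text = (
    "Line\tID\tSex\tGeneration\tSNP-1\tSNP-2\tSNP-3\tSNP-4\n"
    "A\t1\tF\t2020A\tA\tA\tA\tA\n"
    "B\t2\tF\t2020B\tA\tC\tT\tG\n"
    "B\t3\tF\t2020B\tA\tC\tT\tG\n"
)
df2_text = (
    "SNP-1\tSNP-2\tSNP-3\tSNP-4\n"
    "A\tA\tA\tA\n"
    "A\tC\tT\tG\n"
)

# sep="\t" makes read_csv split on tabs instead of commas
df1 = pd.read_csv(io.StringIO(df1_text), sep="\t")
df2 = pd.read_csv(io.StringIO(df2_text), sep="\t")

print(df1)
print(df2)
```

With the data loaded this way, ID is parsed as integers and the remaining columns as strings.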

Manjunath K Mayya · Accepted Answer · 2022-04-16 05:14:32Z

You can use merge to inner-join df1 with df2 on the SNP columns, then use value_counts to count the matched rows per Line and Generation.

import pandas as pd

snp_cols = ['SNP-1', 'SNP-2', 'SNP-3', 'SNP-4']
df1 = pd.DataFrame([['A', '2020A', 'A', 'A', 'A', 'A'],
                    ['B', '2020B', 'A', 'C', 'T', 'G'],
                    ['B', '2020B', 'A', 'C', 'T', 'G']],
                   columns=['Line', 'Generation'] + snp_cols)
df2 = pd.DataFrame([['A', 'A', 'A', 'A'], ['A', 'C', 'T', 'G']], columns=snp_cols)

# Inner join: keep only df1 rows whose SNP values match a row in df2
df_merge = df1.merge(df2, on=snp_cols)
print(df_merge)

# Count matched rows per (Line, Generation)
print('\n', df_merge.value_counts(['Line', 'Generation']))

Output:

  Line Generation SNP-1 SNP-2 SNP-3 SNP-4
0    A      2020A     A     A     A     A
1    B      2020B     A     C     T     G
2    B      2020B     A     C     T     G

 Line  Generation
B     2020B         2
A     2020A         1
dtype: int64
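If you also want the matched SNP pattern in the result, so that it reads like the desired output in the question, one option is to include the SNP columns in the grouping and count with groupby(...).size(). This sketch rebuilds the answer's df1 and df2; the "count" column name and the sentence template are assumptions chosen to mirror the question's wording. df2 is deduplicated first because an inner merge emits one row per matching pair, so a pattern listed twice in df2 would double the counts.

```python
import pandas as pd

snp_cols = ["SNP-1", "SNP-2", "SNP-3", "SNP-4"]
df1 = pd.DataFrame(
    [["A", "2020A", "A", "A", "A", "A"],
     ["B", "2020B", "A", "C", "T", "G"],
     ["B", "2020B", "A", "C", "T", "G"]],
    columns=["Line", "Generation"] + snp_cols,
)
df2 = pd.DataFrame([["A", "A", "A", "A"], ["A", "C", "T", "G"]], columns=snp_cols)

# Drop repeated patterns in df2 so each df1 row can match at most once
df_merge = df1.merge(df2.drop_duplicates(), on=snp_cols)

# One row per (Line, Generation, SNP pattern) with the number of matched df1 rows
counts = (
    df_merge.groupby(["Line", "Generation"] + snp_cols)
    .size()
    .reset_index(name="count")
)
print(counts)

# Build one sentence per group, in the style of the question
sentences = []
for rec in counts.to_dict("records"):
    pattern = [rec[c] for c in snp_cols]
    sentences.append(
        f"Line {rec['Line']}, Generation {rec['Generation']}, has a total of "
        f"{rec['count']} row(s) matching {pattern} in df2."
    )
print("\n".join(sentences))
```

Unlike value_counts, which sorts by count in descending order, groupby(...).size() returns the groups sorted by their keys, so Line A comes first here.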
