Grouping by each value in a column of a dataframe in python

Question

I have a dataframe with 7 columns, as follows:

Bank Name | Number | Firstname | Lastname | ID | Date1    | Date2
B1        | 1      | ABC       | EFG      | 12 | Somedate | Somedate
B2        | 2      | ABC       | EFG      | 12 | Somedate | Somedate
B1        | 1      | DEF       | EFG      | 12 | Somedate | Somedate
B3        | 3      | ABC       | GHI      | 13 | Somedate | Somedate
B4        | 4      | XYZ       | FHJ      | 13 | Somedate | Somedate
B5        | 5      | XYZ       | DFI      | 13 | Somedate | Somedate

I want to create a tuple with 4 elements for each ID, where the elements correspond to (Bank Name, Number, Firstname, Lastname) and each element's value is the count of distinct values in that column for that ID. For example, for ID = 12 the tuple should be (2, 2, 2, 1), and for ID = 13 it should be (3, 3, 2, 3).

I'm able to get all rows for a particular ID by doing the following:

print(df.loc[df['ID'] == '12'])

But I do not know how to do this for each value in the ID column (much like the GROUP BY clause in SQL), and also get the counts instead of the actual values in the rows.

Please help.

Zero · Accepted Answer · 2017-08-10 18:23:52Z

2

Using nunique and apply, you could do:

In [117]: cols = ['BankName', 'Number', 'Firstname', 'Lastname']  # use 'Bank Name' if your column name has the space, as in the question

In [126]: df.groupby('ID')[cols].nunique().apply(tuple, axis=1)
Out[126]:
ID
12    (2, 2, 2, 1)
13    (3, 3, 2, 3)
dtype: object

or,

In [127]: df.groupby('ID').apply(lambda x: tuple(x[c].nunique() for c in cols))
Out[127]:
ID
12    (2, 2, 2, 1)
13    (3, 3, 2, 3)
dtype: object

Or, if you want a dataframe instead of tuples:

In [122]: df.groupby('ID').agg({c: 'nunique' for c in cols})
Out[122]:
    Lastname  Number  Firstname  BankName
ID
12         1       2          2         2
13         3       3          2         3

or,

In [123]: df.groupby('ID')[cols].nunique()
Out[123]:
    BankName  Number  Firstname  Lastname
ID
12         2       2          2         1
13         3       3          2         3
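For anyone who wants to run the above end to end, here is a minimal sketch that rebuilds the sample rows from the question and produces the tuples. The frame is typed in by hand, the two date columns are left out because they play no part in the counts, and the column is assumed to be named 'Bank Name' as in the question rather than 'BankName'.

```python
import pandas as pd

# Sample rows copied from the question (date columns omitted).
df = pd.DataFrame({
    "Bank Name": ["B1", "B2", "B1", "B3", "B4", "B5"],
    "Number": [1, 2, 1, 3, 4, 5],
    "Firstname": ["ABC", "ABC", "DEF", "ABC", "XYZ", "XYZ"],
    "Lastname": ["EFG", "EFG", "EFG", "GHI", "FHJ", "DFI"],
    "ID": [12, 12, 12, 13, 13, 13],
})

cols = ["Bank Name", "Number", "Firstname", "Lastname"]

# One row per ID, one column per entry in cols, holding distinct-value counts.
counts = df.groupby("ID")[cols].nunique()

# Collapse each row of counts into a tuple, indexed by ID.
tuples = counts.apply(tuple, axis=1)
```

Here tuples.loc[12] compares equal to (2, 2, 2, 1) and tuples.loc[13] to (3, 3, 2, 3), matching the output shown above.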


4 Comments

akrama81 Over a year ago

Thanks a lot! Used the second one. The first one gives AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique' error.

Zero Over a year ago

What's your pd.__version__?

akrama81 Over a year ago

Version '0.19.2'
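The error suggests that DataFrameGroupBy.nunique is simply missing in that release; as far as I can tell it was only added in pandas 0.20.0, but treat that version number as an assumption. A rough workaround that should not depend on it is to call nunique on one column at a time and glue the results together. The helper name nunique_per_group and the small sample frame below are made up for illustration.

```python
import pandas as pd

def nunique_per_group(df, key, cols):
    """Count distinct values of each column in cols within each key group."""
    grouped = df.groupby(key)
    # grouped[c] is a SeriesGroupBy, so this only needs SeriesGroupBy.nunique
    # and never touches DataFrameGroupBy.nunique.
    return pd.concat([grouped[c].nunique() for c in cols], axis=1)

# Hypothetical sample data mirroring the question (date columns omitted).
df = pd.DataFrame({
    "Bank Name": ["B1", "B2", "B1", "B3", "B4", "B5"],
    "Number": [1, 2, 1, 3, 4, 5],
    "Firstname": ["ABC", "ABC", "DEF", "ABC", "XYZ", "XYZ"],
    "Lastname": ["EFG", "EFG", "EFG", "GHI", "FHJ", "DFI"],
    "ID": [12, 12, 12, 13, 13, 13],
})

cols = ["Bank Name", "Number", "Firstname", "Lastname"]
counts = nunique_per_group(df, "ID", cols)
```

The result has the same shape as df.groupby('ID')[cols].nunique() on newer versions, so .apply(tuple, axis=1) can be chained onto it in the same way.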

akrama81 Over a year ago

Instead of just having the count for 'Number' in the second element of the tuple, is it possible to have something like this for ID 12: (2, {1:2, 2:1}, 2, 1), where the keys in the inner dictionary are the 'Number' values and each value is the number of times that 'Number' occurs for that particular ID?

jezrael · Accepted Answer · 2017-08-10 18:25:37Z

1

Use groupby with apply and a lambda function that calls nunique:

cols = ['Bank Name', 'Number', 'Firstname', 'Lastname']
# assign to a new name so that df itself is still available afterwards
s = df.groupby('ID')[cols].apply(lambda x: tuple(x.nunique()))
print(s)
ID
12    (2, 2, 2, 1)
13    (3, 3, 2, 3)
dtype: object

And, if necessary, convert the result to a dict:

d = df.groupby('ID')[cols].apply(lambda x: tuple(x.nunique())).to_dict()
print (d)
{12: (2, 2, 2, 1), 13: (3, 3, 2, 3)}


1 Comment

akrama81 Over a year ago

After converting to dict, instead of just having the count for 'Number' in the second element of the tuple, is it possible to have something like this for ID 12: (2, {1:2, 2:1}, 2, 1), where the keys in the inner dictionary are the 'Number' values and each value is the number of times that 'Number' occurs for that particular ID?
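This follow-up is not answered in the thread. One possible sketch is to iterate over the groups and build each tuple by hand, using value_counts on 'Number' and nunique on the other three columns. The function name summarize and the sample frame are assumptions for illustration, not something taken from the answers above.

```python
import pandas as pd

# Hypothetical sample data mirroring the question (date columns omitted).
df = pd.DataFrame({
    "Bank Name": ["B1", "B2", "B1", "B3", "B4", "B5"],
    "Number": [1, 2, 1, 3, 4, 5],
    "Firstname": ["ABC", "ABC", "DEF", "ABC", "XYZ", "XYZ"],
    "Lastname": ["EFG", "EFG", "EFG", "GHI", "FHJ", "DFI"],
    "ID": [12, 12, 12, 13, 13, 13],
})

def summarize(group):
    """Build one tuple for a single ID's rows."""
    return (
        group["Bank Name"].nunique(),
        # Map each distinct Number to how often it occurs in this group.
        group["Number"].value_counts().to_dict(),
        group["Firstname"].nunique(),
        group["Lastname"].nunique(),
    )

# Iterating over a groupby yields (key, sub-frame) pairs.
result = {id_: summarize(group) for id_, group in df.groupby("ID")}
```

With the sample rows, result[12] compares equal to (2, {1: 2, 2: 1}, 2, 1), which is the shape asked for in the comment.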

Agnaldo Luiz Cunha · Accepted Answer · 2017-08-10 18:41:12Z

0

I think you need this:

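# select several columns with a list (double brackets); bare tuple-style selection is rejected by newer pandas
group = df.groupby('ID')[['Bank Name', 'Number', 'Firstname', 'Lastname']].nunique()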
group['tuples'] = group.apply(lambda x: tuple(x), axis=1)
group.loc[:,'tuples']

The output will be:

ID
12    (2, 2, 2, 1)
13    (3, 3, 2, 3)
Name: tuples, dtype: object
