1

I have a dataframe with 7 columns, as follows:

Bank Name | Number | Firstname | Lastname | ID | Date1    | Date2
B1        | 1      | ABC       | EFG      | 12 | Somedate | Somedate
B2        | 2      | ABC       | EFG      | 12 | Somedate | Somedate
B1        | 1      | DEF       | EFG      | 12 | Somedate | Somedate
B3        | 3      | ABC       | GHI      | 13 | Somedate | Somedate
B4        | 4      | XYZ       | FHJ      | 13 | Somedate | Somedate
B5        | 5      | XYZ       | DFI      | 13 | Somedate | Somedate

I want to create a 4-element tuple for each ID, where the elements correspond to (Bank Name, Number, Firstname, Lastname) and each value is the count of distinct values in that column for that ID. For example, for ID = 12 the tuple should be (2, 2, 2, 1), and for ID = 13 it should be (3, 3, 2, 3).

I'm able to get all rows for a particular ID by doing the following:

print(df.loc[df['ID'] == '12'])

But I do not know how to do this for each value in the ID column (much like the GROUP BY clause in SQL) and get the counts instead of the actual values in the rows.

Please help.

3 Answers

2

Using apply, you could do:

In [117]: cols = ['BankName', 'Number', 'Firstname', 'Lastname']

In [126]: df.groupby('ID')[cols].nunique().apply(tuple, axis=1)
Out[126]:
ID
12    (2, 2, 2, 1)
13    (3, 3, 2, 3)
dtype: object

or,

In [127]: df.groupby('ID').apply(lambda x: tuple(x[c].nunique() for c in cols))
Out[127]:
ID
12    (2, 2, 2, 1)
13    (3, 3, 2, 3)
dtype: object

Or, if you want a DataFrame instead of tuples:

In [122]: df.groupby('ID').agg({c: 'nunique' for c in cols})
Out[122]:
    Lastname  Number  Firstname  BankName
ID
12         1       2          2         2
13         3       3          2         3

or,

In [123]: df.groupby('ID')[cols].nunique()
Out[123]:
    BankName  Number  Firstname  Lastname
ID
12         2       2          2         1
13         3       3          2         3
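
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question (the two date columns are left out because they do not affect the counts) and assumes the column is named 'Bank Name' with a space, as in the question, and that ID is stored as an integer. It also assumes a pandas version where DataFrameGroupBy.nunique is available.

```python
import pandas as pd

# Sample data copied from the question; Date1/Date2 omitted (not needed for the counts).
df = pd.DataFrame({
    'Bank Name': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Number': [1, 2, 1, 3, 4, 5],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],  # assumption: integer IDs
})

cols = ['Bank Name', 'Number', 'Firstname', 'Lastname']

# One row per ID, one column per entry in cols, each cell a distinct-value count.
counts = df.groupby('ID')[cols].nunique()

# Collapse each row of counts into a tuple, keeping the order of cols.
tuples = counts.apply(tuple, axis=1)
print(tuples)
```

Selecting the columns with a list (`[cols]`) before calling nunique keeps the tuple elements in the order given in cols.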

4 Comments

Thanks a lot! Used the second one. The first one gives AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique' error.
What's your pd.__version__?
Version '0.19.2'
Instead of just having the count for 'Number' as the second element of the tuple, is it possible to have something like this for ID 12: (2, {1: 2, 2: 1}, 2, 1), where the keys of the inner dictionary are the 'Number' values and each value is the number of times that 'Number' occurs for that particular ID?
1

Use groupby with apply and lambda function with nunique:

cols = ['Bank Name', 'Number', 'Firstname', 'Lastname']
# Store the result under a new name so the original df stays available
s = df.groupby('ID')[cols].apply(lambda x: tuple(x.nunique()))
print(s)
ID
12    (2, 2, 2, 1)
13    (3, 3, 2, 3)
dtype: object

If necessary, convert the result to a dict:

d = df.groupby('ID')[cols].apply(lambda x: tuple(x.nunique())).to_dict()
print(d)
{12: (2, 2, 2, 1), 13: (3, 3, 2, 3)}

1 Comment

After converting to dict, instead of just having the count for 'Number' as the second element of the tuple, is it possible to have something like this for ID 12: (2, {1: 2, 2: 1}, 2, 1), where the keys of the inner dictionary are the 'Number' values and each value is the number of times that 'Number' occurs for that particular ID?
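
One possible way to get the shape asked for in this comment is sketched below. It is not from any of the answers: it assumes the sample data from the question with integer IDs, and the helper name summarize is made up for illustration. The idea is to loop over the groups and build each tuple by hand, using value_counts().to_dict() for the 'Number' column and nunique() for the others.

```python
import pandas as pd

# Sample data copied from the question; Date1/Date2 omitted.
df = pd.DataFrame({
    'Bank Name': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Number': [1, 2, 1, 3, 4, 5],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],  # assumption: integer IDs
})


def summarize(group):
    """Hypothetical helper: build the mixed tuple for one ID's rows."""
    return (
        group['Bank Name'].nunique(),
        # Map each distinct Number to how often it occurs in this group.
        group['Number'].value_counts().to_dict(),
        group['Firstname'].nunique(),
        group['Lastname'].nunique(),
    )


# Iterating over a groupby yields (key, sub-DataFrame) pairs.
result = {id_: summarize(group) for id_, group in df.groupby('ID')}
print(result)
```

A plain dict comprehension is used here rather than groupby().apply so that the tuples containing dictionaries are stored as-is, without pandas trying to interpret the returned values.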
0

I think you need this:

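# Select several columns with a list; a bare tuple of names is not accepted by recent pandas versions
group = df.groupby('ID')[['Bank Name', 'Number', 'Firstname', 'Lastname']].nunique()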
group['tuples'] = group.apply(lambda x: tuple(x), axis=1)
group.loc[:,'tuples']

The output will be:

ID
12    (2, 2, 2, 1)
13    (3, 3, 2, 3)
Name: tuples, dtype: object
