Finding most frequent value from dataframe rows in Pandas

Question

In a data frame, I want to create another column that outputs the most frequent value across the other columns of each row.

A      B      C    D
foo    bar    baz  foo
egg    bacon  egg  egg
bacon  egg    foo  baz

The "E" column should contain the most frequent value of each row, like:

E
foo
egg

How can I do it in Python?

Mode with axis=1?

– Quang Hoang, Commented Dec 4, 2020 at 15:27

Didn't understand the output.

– user8376557, Commented Dec 4, 2020 at 15:43

Oddaspa · Accepted Answer · 2020-12-05 17:14:32Z

3

Recreating your problem with:

import pandas as pd

df = pd.DataFrame(
    {
        'A': ['foo', 'egg', 'bacon'],
        'B': ['bar', 'bacon', 'egg'],
        'C': ['baz', 'egg', 'foo'],
        'D': ['foo', 'egg', 'baz'],
    }
)

And solving it with:

df['E'] = df.mode(axis=1)[0]

Output:

    A      B       C       D       E
0   foo    bar     baz     foo     foo
1   egg    bacon   egg     egg     egg
2   bacon  egg     foo     baz     bacon

What happens if there is no single most frequent element?

df.mode(axis=1)
    0      1       2       3
0   foo    NaN     NaN     NaN
1   egg    NaN     NaN     NaN
2   bacon  baz     egg     foo

As you can see, when several values tie for most frequent, mode returns all of the tied values, one per column. If, in the last row, I replace foo with egg in column C and baz with bacon in column D, we get the following result:

    0      1
0   foo    NaN
1   egg    NaN
2   bacon  egg

Now the result for the last row contains only two elements, which means that the tie is between bacon and egg.
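
To see how many values are tied in each row, one option is to count the non-NaN entries of the mode output. This is only a minimal sketch that reuses the original example frame; the name n_modes is introduced here for illustration and is not part of the answer's code.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'A': ['foo', 'egg', 'bacon'],
        'B': ['bar', 'bacon', 'egg'],
        'C': ['baz', 'egg', 'foo'],
        'D': ['foo', 'egg', 'baz'],
    }
)

# Each row of the mode output holds the tied values followed by NaN padding,
# so the count of non-NaN entries is the number of tied values in that row.
n_modes = df.mode(axis=1).notna().sum(axis=1)  # hypothetical name
print(n_modes.tolist())  # [1, 1, 4]
```

A count of 1 means the row has a single most frequent value; anything larger signals a tie.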

How do I detect ties?

Let us work with the dataset not containing the column D.

df
    A      B       C
0   foo    bar     baz
1   egg    bacon   egg
2   bacon  egg     foo

df_m = df.mode(axis=1)
df_m
    0      1    2
0   bar    baz  foo
1   egg    NaN  NaN
2   bacon  egg  foo

df['D'] = df_m[0]
    A      B       C    D
0   foo    bar     baz  bar
1   egg    bacon   egg  egg
2   bacon  egg     foo  bacon

We can use the notna() method that pandas provides to build a mask marking the rows that still contain a non-NaN value, i.e. the rows that are in a tie.

First, we must drop the first column which always has a value.

df_m = df_m.drop(columns=0)

Then we transpose the dataframe with .T and check, for each original row, whether any of the remaining values is not NaN.

df_mask = df_m.T.notna().any()
df_mask
0     True
1    False
2     True
dtype: bool
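
As a side note, the transpose is not strictly required: the same reduction can be done across columns with any(axis=1). The sketch below rebuilds df_m from the three-column frame and checks that both forms agree; mask_t and mask_axis are names made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'A': ['foo', 'egg', 'bacon'],
        'B': ['bar', 'bacon', 'egg'],
        'C': ['baz', 'egg', 'foo'],
    }
)

# Row-wise modes with the always-filled first column removed, as in the answer.
df_m = df.mode(axis=1).drop(columns=0)

mask_t = df_m.T.notna().any()          # transpose, then reduce down each column
mask_axis = df_m.notna().any(axis=1)   # reduce across columns directly

# Both masks flag exactly the same rows.
print(mask_t.tolist() == mask_axis.tolist())
```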

Now we have a pandas series of booleans. We can use this mask to overwrite the column from before.

df.loc[df_mask, 'D'] = df.loc[df_mask, 'A']
    A      B       C    D
0   foo    bar     baz  foo
1   egg    bacon   egg  egg
2   bacon  egg     foo  bacon
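
The steps above can be collected into one small helper. This is a sketch under the assumption that any tie should fall back to the value in the first column, as requested in the comments; row_mode_or_first is a hypothetical name, not a pandas function.

```python
import pandas as pd

def row_mode_or_first(frame: pd.DataFrame) -> pd.Series:
    """Most frequent value per row; on a tie, the value from the first column."""
    modes = frame.mode(axis=1)
    # A row is tied when any mode column past the first holds a value.
    tied = modes.iloc[:, 1:].notna().any(axis=1)
    # Keep the single mode where there is one, otherwise use the first column.
    return modes[0].mask(tied, frame.iloc[:, 0])

df = pd.DataFrame(
    {
        'A': ['foo', 'egg', 'bacon'],
        'B': ['bar', 'bacon', 'egg'],
        'C': ['baz', 'egg', 'foo'],
    }
)

df['D'] = row_mode_or_first(df)
print(df['D'].tolist())  # ['foo', 'egg', 'bacon']
```

Using Series.mask with a boolean Series avoids chained indexing, so the assignment does not depend on whether an intermediate selection is a view or a copy.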

edited Dec 5, 2020 at 17:14

answered Dec 4, 2020 at 15:49


5 Comments

user8376557 Over a year ago

If no frequent value found, it must return the value from the first column's row? In this case, it returns an alphabetically lower value.

Oddaspa Over a year ago

No, mode works by returning a set of values, i.e. if there is a tie for most frequent it will return all of the tied values. This can be seen in the raw output of mode (updated example). I implemented it by simply taking the first element, since you have not specified otherwise.

user8376557 Over a year ago

I have used this formula. Column A has the value "Foo", column B has the value "Bar", and column C has the value "Baz". In this case, the above formula will output "Bar", but it should output "Foo". If every value is unique, the output should be the value from the first column.

Oddaspa Over a year ago

From the documentation this is not an option. To achieve this you need to check for NA's in the output data frame.

Oddaspa Over a year ago

I updated my answer to fit your problem :)
