How to find the values that appear most often

Question

I have a dataframe in which every row contains the number 6, which is my main number. I would like
to find the values that appear most often alongside 6, keeping only values that occur at least twice. Here is my dataframe:


import pandas as pd

df = pd.DataFrame([[1,2,4,5,6,8],
            [5,6,20,22,23,34],
            [6,12,13,34,45,46],
            [4,6,10,29,32,33],
            [1,6,13,23,33,35],
            [1,2,5,6,9,10],
            [1,2,3,5,6,8]],
            columns = ['Num1','Num2','Num3','Num4','Num5','Num6'])

I tried the code below, which is close to what I'm looking for. However, I would like the result to include the number 6, the values, and each value's count.

df1 = df.stack().value_counts()
df2 = df1[df1 >= 2]

I would like my results to look like this, or similar, if possible:

import pandas as pd

result = pd.DataFrame([[6,1,4],
                [6,5,4],
                [6,2,3],
                [6,4,2],
                [6,8,2],
                [6,10,2],
                [6,13,2],
                [6,23,2],
                [6,33,2],
                [6,34,2]],
                columns = ['Num1','Values','Count'])

If you want an extra 6 in every row, add it later: result['Num1'] = 6
– furas, Commented Aug 10 at 10:27
But first you may have to convert the Series to a DataFrame with df3 = pd.DataFrame(df2).reset_index(), and then set df3['Num1'] = 6. You may also need to rename the other columns and change the column order, so it takes a few steps to create the expected result.
– furas, Commented Aug 10 at 10:32

mozway · Accepted Answer · 2025-08-10 19:49:59Z

You had a good start. In my opinion, the easiest approach is to stack + value_counts, then filter the values with boolean indexing, and finally build a new DataFrame manually, optionally sorting it with sort_values:

my_value = 6

s = df.stack().value_counts(sort=False)
counts = s[s>=2].drop(my_value)

out = (pd
 .DataFrame({'Num1': my_value, 'Values': counts.index, 'Count': counts.values})
 .sort_values(by=['Count', 'Values'], ascending=[False, True], ignore_index=True)
)

Output:

   Num1  Values  Count
0     6       1      4
1     6       5      4
2     6       2      3
3     6       4      2
4     6       8      2
5     6      10      2
6     6      13      2
7     6      23      2
8     6      33      2
9     6      34      2

If you don't care about the order of the values, only of the counts, you can simplify to:

my_value = 6
s = df.stack().value_counts()
counts = s[s>=2].drop(my_value)

out = pd.DataFrame(
    {'Num1': my_value, 'Values': counts.index, 'Count': counts.values}
)

Output:

   Num1  Values  Count
0     6       1      4
1     6       5      4
2     6       2      3
3     6       4      2
4     6       8      2
5     6      23      2
6     6      13      2
7     6      34      2
8     6      33      2
9     6      10      2
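
The snippets above count values over the whole frame, which works because every row contains 6. As a rough sketch of the more general case (an assumption, not part of the question), the rows could be filtered first so that only rows containing the target are counted. The helper name co_occurring_counts and its min_count parameter are made up for illustration, and the last row of the sample data is an extra row without a 6 added for the demo:

```python
import pandas as pd


def co_occurring_counts(df, target, min_count=2):
    """Hypothetical helper: count values that share a row with `target`."""
    # Keep only the rows that contain the target at least once.
    rows = df[df.eq(target).any(axis=1)]
    # Flatten the remaining rows and count every value.
    s = rows.stack().value_counts(sort=False)
    # Keep frequent values; drop the target itself if it is among them.
    counts = s[s >= min_count].drop(target, errors="ignore")
    return (
        pd.DataFrame({"Num1": target, "Values": counts.index, "Count": counts.to_numpy()})
        .sort_values(by=["Count", "Values"], ascending=[False, True], ignore_index=True)
    )


df = pd.DataFrame(
    [[1, 2, 4, 5, 6, 8],
     [5, 6, 20, 22, 23, 34],
     [6, 12, 13, 34, 45, 46],
     [4, 6, 10, 29, 32, 33],
     [1, 6, 13, 23, 33, 35],
     [1, 2, 5, 6, 9, 10],
     [1, 2, 3, 5, 6, 8],
     [1, 2, 3, 4, 5, 7]],  # extra row without a 6 (assumption for the demo); it is ignored
    columns=["Num1", "Num2", "Num3", "Num4", "Num5", "Num6"],
)

out = co_occurring_counts(df, 6)
print(out)
```

With the original seven rows only, this should give the same table as the first snippet, since the row filter then keeps everything.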

Aadvik · Accepted Answer · 2025-08-11 16:44:20Z

Another solution:

import pandas as pd

df = pd.DataFrame([[1,2,4,5,6,8],
            [5,6,20,22,23,34],
            [6,12,13,34,45,46],
            [4,6,10,29,32,33],
            [1,6,13,23,33,35],
            [1,2,5,6,9,10],
            [1,2,3,5,6,8]],
            columns = ['Num1','Num2','Num3','Num4','Num5','Num6'])

df1 = df.melt()['value'].value_counts()

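# Keep values seen at least twice; [1:] skips the first (most frequent) entry, which is 6 here
result = df1[df1.values >= 2][1:]

# Set final result: the index holds the values, the Series values hold the counts
final_result = pd.DataFrame({"Num1": df1.index[0], "Values": result.index, "Count": result.values})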

Explanation:

Use .melt to combine all df columns into one so that value_counts can be applied.
Use .value_counts to find how many times each value occurs in the data frame; by default .value_counts sorts the resulting Series by count in descending order.
Exclude the most frequent number from df1 and keep only the rows where the count is greater than or equal to 2.
Build the final result, where Num1 is the first index label of df1 (the most frequent number), and the index and values of result become Values and Count.

And voila:

    Num1  Values  Count
1      6       1      4
2      6       5      4
3      6       2      3
4      6       4      2
5      6      10      2
6      6      13      2
7      6      34      2
8      6      23      2
9      6      33      2
10     6       8      2
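
The [1:] slice and df1.index[0] above work because 6 occurs in every row and therefore tops the counts. A minimal sketch of a variant that names the target explicitly and removes it by label, so it does not depend on 6 being the most frequent value (the names target and frequent are my own, not from the answer):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 4, 5, 6, 8],
     [5, 6, 20, 22, 23, 34],
     [6, 12, 13, 34, 45, 46],
     [4, 6, 10, 29, 32, 33],
     [1, 6, 13, 23, 33, 35],
     [1, 2, 5, 6, 9, 10],
     [1, 2, 3, 5, 6, 8]],
    columns=["Num1", "Num2", "Num3", "Num4", "Num5", "Num6"],
)

target = 6  # assumed to be known up front rather than inferred from the counts

# One long column of all values, counted and sorted by count (descending by default).
counts = df.melt()["value"].value_counts()

# Remove the target by label instead of by position, then keep counts of 2 or more.
frequent = counts.drop(target)
frequent = frequent[frequent >= 2]

final_result = pd.DataFrame(
    {"Num1": target, "Values": frequent.index, "Count": frequent.to_numpy()}
)
print(final_result)
```

The order of values with equal counts is left to value_counts here; add a sort_values call if a specific tie order matters.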

furas · Accepted Answer · 2025-08-10 11:09:18Z

It takes a few steps to create it:

df2 is a Series, so you have to convert it to a DataFrame
in this DataFrame the values are in the index, so you have to convert the index to a column
you have to rename the columns
you have to use the first value to create the column Num1
you have to remove/drop the first row
you have to sort by Count (descending) and Values (ascending)
finally, you have to change the order of the columns and reset the index (dropping the old index)

# convert Series to DataFrame
result = pd.DataFrame(df2)

# convert index to column
result = result.reset_index()

# rename columns
result = result.rename(columns={'index': 'Values', 'count': 'Count'})

# add first value as column `Num1`
result['Num1'] = result.loc[0, 'Values']

# drop first row with `[6, 6, 7]`
result = result.drop(0)

# sort by Count and Values
result = result.sort_values(by=["Count", "Values"], ascending=[False, True])

# change order of columns and reset index
result = result[['Num1', 'Values', 'Count']].reset_index(drop=True)

print(result)

Some commands can be grouped on one line.
You may also execute some commands in a different order.
And some commands can use inplace=True instead of assigning back to result.
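
As a sketch of what grouping the steps could look like, here they are written as one method chain. This is only one possible arrangement, not the only way to do it. It names the columns through rename_axis and reset_index(name=...) instead of rename, so it does not rely on the 'index' and 'count' column names, which as far as I know depend on the pandas version:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 4, 5, 6, 8],
     [5, 6, 20, 22, 23, 34],
     [6, 12, 13, 34, 45, 46],
     [4, 6, 10, 29, 32, 33],
     [1, 6, 13, 23, 33, 35],
     [1, 2, 5, 6, 9, 10],
     [1, 2, 3, 5, 6, 8]],
    columns=["Num1", "Num2", "Num3", "Num4", "Num5", "Num6"],
)

counts = df.stack().value_counts()

result = (
    counts[counts >= 2]
    .rename_axis("Values")                      # name the index so it becomes the 'Values' column
    .reset_index(name="Count")                  # Series -> DataFrame with 'Values' and 'Count'
    .assign(Num1=lambda d: d.loc[0, "Values"])  # first row holds the most frequent value (6 here)
    .drop(index=0)                              # remove the row for 6 itself
    .sort_values(by=["Count", "Values"], ascending=[False, True])
    .loc[:, ["Num1", "Values", "Count"]]        # reorder columns
    .reset_index(drop=True)
)
print(result)
```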

Full working code:

import pandas as pd

df = pd.DataFrame([[1,2,4,5,6,8],
            [5,6,20,22,23,34],
            [6,12,13,34,45,46],
            [4,6,10,29,32,33],
            [1,6,13,23,33,35],
            [1,2,5,6,9,10],
            [1,2,3,5,6,8]],
            columns = ['Num1','Num2','Num3','Num4','Num5','Num6'])

df1 = df.stack().value_counts()
print(df1)

df2 = df1[df1 >= 2]
print(df2)

# convert Series to DataFrame
result = pd.DataFrame(df2)
#print(result)

# convert index to column
result = result.reset_index() #names='Values')
#print(result)

result = result.rename(columns={'index': 'Values', 'count': 'Count'})
#print(result)

# add first value as column `Num1`
result['Num1'] = result.loc[0, 'Values']
#print(result)

# drop first row
result = result.drop(0)
#print(result)

# sort by Count and Values
result = result.sort_values(by=["Count", "Values"], ascending=[False, True])# .reset_index(drop=True)
#print(result)

# change order of columns and reset index
result = result[['Num1', 'Values', 'Count']].reset_index(drop=True)
print(result)

Result:

   Num1  Values  Count
0     6       1      4
1     6       5      4
2     6       2      3
3     6       4      2
4     6       8      2
5     6      10      2
6     6      13      2
7     6      23      2
8     6      33      2
9     6      34      2

ouroboros1 · Accepted Answer · 2025-08-10 13:13:48Z

Here's one option:

import numpy as np
import pandas as pd

num = 6

vals, counts = np.unique(df.to_numpy().ravel(), return_counts=True)
m = (vals != num) & (counts > 1)

v = vals[m]
c = counts[m]

order = np.lexsort((v, -c))

out = pd.DataFrame({
    'Num1': num,
    'Values': v[order],
    'Count': c[order]
})

Result

out

   Num1  Values  Count
0     6       1      4
1     6       5      4
2     6       2      3
3     6       4      2
4     6       8      2
5     6      10      2
6     6      13      2
7     6      23      2
8     6      33      2
9     6      34      2

# Equality check with OP's `result`
out.equals(result)
# True

Explanation

Use np.unique with return_counts=True on df.to_numpy + ndarray.ravel to get unique values and counts.
Use a mask (m) to filter out num and counts < 2 and apply to both arrays.
Sort with np.lexsort by "Count" (- for desc) and "Values".
Create out with pd.DataFrame.

If ordering isn't needed, you can skip lexsort and create out directly from v and c.
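
To illustrate the last two points, here is a small sketch of the unsorted variant next to the lexsort order. The name unsorted_out is mine. Note that np.unique returns the unique values in ascending order, so skipping lexsort still leaves the rows ordered by "Values", just not by "Count":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 4, 5, 6, 8],
     [5, 6, 20, 22, 23, 34],
     [6, 12, 13, 34, 45, 46],
     [4, 6, 10, 29, 32, 33],
     [1, 6, 13, 23, 33, 35],
     [1, 2, 5, 6, 9, 10],
     [1, 2, 3, 5, 6, 8]],
    columns=["Num1", "Num2", "Num3", "Num4", "Num5", "Num6"],
)
num = 6

vals, counts = np.unique(df.to_numpy().ravel(), return_counts=True)
m = (vals != num) & (counts > 1)

# Without lexsort: rows follow np.unique's ascending order of the values.
unsorted_out = pd.DataFrame({"Num1": num, "Values": vals[m], "Count": counts[m]})
print(unsorted_out)

# np.lexsort uses the LAST key as the primary sort key, so (values, -counts)
# means: sort by count descending first, then by value ascending for ties.
order = np.lexsort((vals[m], -counts[m]))
print(order)
```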
