I have a dataframe in which every row contains the number 6. I want to use 6 as my main number and find the values that appear most often alongside it, keeping only the values that occur at least 2 times. Here is my dataframe:


import pandas as pd

df = pd.DataFrame([[1,2,4,5,6,8],
            [5,6,20,22,23,34],
            [6,12,13,34,45,46],
            [4,6,10,29,32,33],
            [1,6,13,23,33,35],
            [1,2,5,6,9,10],
            [1,2,3,5,6,8]],
            columns = ['Num1','Num2','Num3','Num4','Num5','Num6'])

I have tried the code below, which is close to what I'm looking for. However, I would like the result to include the number 6, the values, and each value's count.

df1 = df.stack().value_counts()
df2 = df1[df1 >= 2]

I would like my results to look like this, or something similar:

import pandas as pd

result = pd.DataFrame([[6,1,4],
                [6,5,4],
                [6,2,3],
                [6,4,2],
                [6,8,2],
                [6,10,2],
                [6,13,2],
                [6,23,2],
                [6,33,2],
                [6,34,2]],
                columns = ['Num1','Values','Count'])

  • If you want an extra 6 in every row, add it afterwards: result['Num1'] = 6 Commented Aug 10 at 10:27
  • But first you may have to convert the Series to a DataFrame with df3 = pd.DataFrame(df2).reset_index(), and then set df3['Num1'] = 6. You may also need to rename the other columns and change the column order, so it takes a few steps to create the expected result. Commented Aug 10 at 10:32

4 Answers

You had a good start. In my opinion, the easiest approach is to stack + value_counts, filter the counts with boolean indexing, then build a new DataFrame manually and optionally sort it with sort_values:

my_value = 6

s = df.stack().value_counts(sort=False)
counts = s[s>=2].drop(my_value)

out = (pd
 .DataFrame({'Num1': my_value, 'Values': counts.index, 'Count': counts.values})
 .sort_values(by=['Count', 'Values'], ascending=[False, True], ignore_index=True)
)

Output:

   Num1  Values  Count
0     6       1      4
1     6       5      4
2     6       2      3
3     6       4      2
4     6       8      2
5     6      10      2
6     6      13      2
7     6      23      2
8     6      33      2
9     6      34      2

If you don't care about the order of the values, only of the counts, you can simplify to:

my_value = 6
s = df.stack().value_counts()
counts = s[s>=2].drop(my_value)

out = pd.DataFrame(
    {'Num1': my_value, 'Values': counts.index, 'Count': counts.values}
)

Output:

   Num1  Values  Count
0     6       1      4
1     6       5      4
2     6       2      3
3     6       4      2
4     6       8      2
5     6      23      2
6     6      13      2
7     6      34      2
8     6      33      2
9     6      10      2
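
In the question every row happens to contain 6, so counting over the whole frame is enough. As a rough sketch of the more general case, assuming some rows might not contain the main number, the rows can be filtered first. The helper name `companions` and its `min_count` parameter are made up for illustration and are not part of pandas:

```python
import pandas as pd

def companions(df, target, min_count=2):
    """Count values that share a row with `target` (hypothetical helper)."""
    # Keep only the rows that contain the target at least once.
    rows = df[df.eq(target).any(axis=1)]
    # Flatten to one Series, count every value, and drop the target itself.
    counts = rows.stack().value_counts().drop(target, errors="ignore")
    counts = counts[counts >= min_count]
    out = pd.DataFrame(
        {"Num1": target, "Values": counts.index, "Count": counts.to_numpy()}
    )
    # Highest count first; ties broken by the smaller value.
    return out.sort_values(
        ["Count", "Values"], ascending=[False, True], ignore_index=True
    )

df = pd.DataFrame([[1, 2, 4, 5, 6, 8],
                   [5, 6, 20, 22, 23, 34],
                   [6, 12, 13, 34, 45, 46],
                   [4, 6, 10, 29, 32, 33],
                   [1, 6, 13, 23, 33, 35],
                   [1, 2, 5, 6, 9, 10],
                   [1, 2, 3, 5, 6, 8]],
                  columns=["Num1", "Num2", "Num3", "Num4", "Num5", "Num6"])

print(companions(df, 6))
```

With the question's data, `companions(df, 6)` should reproduce the sorted output above, because the row filter keeps every row. For another target, such as `companions(df, 1)`, only the four rows containing 1 are counted.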

Another solution:

import pandas as pd

df = pd.DataFrame([[1,2,4,5,6,8],
            [5,6,20,22,23,34],
            [6,12,13,34,45,46],
            [4,6,10,29,32,33],
            [1,6,13,23,33,35],
            [1,2,5,6,9,10],
            [1,2,3,5,6,8]],
            columns = ['Num1','Num2','Num3','Num4','Num5','Num6'])

df1 = df.melt()['value'].value_counts()

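# Keep values seen at least twice; [1:] skips the first (most frequent) entry, which is 6 here
result = df1[df1.values >= 2][1:]

# Set final result: the index holds the values, the values hold the counts
final_result = pd.DataFrame({"Num1": df1.index[0], "Values": result.index, "Count": result.values})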

Explanation:

  1. Use .melt to combine all df columns to one so value_counts can be applied

  2. Use .value_counts to find out how many times each value occurs in your data frame. .value_counts automatically sorts the new series by count, in descending order.

  3. Exclude the most frequent number from df1 and keep only the rows where the count is greater than or equal to 2

  4. Set final result where Num1 is the first index label of df1 (the most frequent number), and essentially rename the index and values of result to Values and Count
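
Step 4 depends on which side of the counted Series holds what, so here is a tiny illustration on made-up data (not the question's frame): after value_counts, the distinct values sit in the index and the counts sit in the values.

```python
import pandas as pd

# Toy data: 6 occurs three times, 1 twice, 5 once.
s = pd.Series([6, 6, 6, 1, 1, 5]).value_counts()

# The distinct values live in the index ...
print(s.index.tolist())  # [6, 1, 5]
# ... and how often each one occurs lives in the values.
print(s.tolist())        # [3, 2, 1]
```

So the Values column should be built from the index, and the Count column from the values.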

And voila:

    Num1  Values  Count
1      6       1      4
2      6       5      4
3      6       2      3
4      6       4      2
5      6      10      2
6      6      13      2
7      6      34      2
8      6      23      2
9      6      33      2
10     6       8      2

It takes a few steps to create it:

  • df2 is a Series, so you have to convert it to a DataFrame
  • in this DataFrame the values are stored as the index, so you have to convert the index to a column
  • you have to rename the columns
  • you have to use the first value to create the column Num1
  • you have to remove/drop the first row
  • you have to sort by Count (descending) and Values (ascending)
  • finally you have to change the order of the columns and reset the index (dropping the old index)

# convert Series to DataFrame
result = pd.DataFrame(df2)

# convert index to a column
result = result.reset_index()

# rename columns (the counts column is named 'count' in pandas 2.x; older versions name it 0)
result = result.rename(columns={'index': 'Values', 'count': 'Count'})

# add first value as column `Num1`
result['Num1'] = result.loc[0, 'Values']

# drop first row with `[6, 6, 7]`
result = result.drop(0)

# sort by Count and Values
result = result.sort_values(by=["Count", "Values"], ascending=[False, True])

# change order of columns and reset index
result = result[['Num1', 'Values', 'Count']].reset_index(drop=True)

print(result)

You can group some of these commands into one line.
You may also execute some commands in a different order.
And some commands accept inplace=True, so you don't have to assign back to result.
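
As one possible way to group the steps, here is a sketch of the same pipeline written as a single method chain. It assumes the main number is known up front (the variable `target` is introduced here for illustration) instead of reading it from the first row, and it names the columns via rename_axis and reset_index(name=...) so it does not depend on the default column names:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4, 5, 6, 8],
                   [5, 6, 20, 22, 23, 34],
                   [6, 12, 13, 34, 45, 46],
                   [4, 6, 10, 29, 32, 33],
                   [1, 6, 13, 23, 33, 35],
                   [1, 2, 5, 6, 9, 10],
                   [1, 2, 3, 5, 6, 8]],
                  columns=["Num1", "Num2", "Num3", "Num4", "Num5", "Num6"])

target = 6  # assumption: the main number is passed in, not inferred

result = (
    df.stack()
      .value_counts()
      .loc[lambda s: s >= 2]           # keep values seen at least twice
      .drop(target)                    # remove the main number's own count
      .rename_axis("Values")           # name the index before turning it into a column
      .reset_index(name="Count")       # Series -> DataFrame with a Count column
      .assign(Num1=target)             # add the constant Num1 column
      .sort_values(["Count", "Values"], ascending=[False, True])
      .loc[:, ["Num1", "Values", "Count"]]
      .reset_index(drop=True)
)

print(result)
```

Whether the chain or the step-by-step version reads better is a matter of taste; the step-by-step form is easier to debug with intermediate prints.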


Full working code:

import pandas as pd

df = pd.DataFrame([[1,2,4,5,6,8],
            [5,6,20,22,23,34],
            [6,12,13,34,45,46],
            [4,6,10,29,32,33],
            [1,6,13,23,33,35],
            [1,2,5,6,9,10],
            [1,2,3,5,6,8]],
            columns = ['Num1','Num2','Num3','Num4','Num5','Num6'])

df1 = df.stack().value_counts()
print(df1)

df2 = df1[df1 >= 2]
print(df2)

# convert Series to DataFrame
result = pd.DataFrame(df2)
#print(result)

# convert index to a column
result = result.reset_index()  # alternatively: reset_index(names='Values')
#print(result)

result = result.rename(columns={'index': 'Values', 'count': 'Count'})
#print(result)

# add first value as column `Num1`
result['Num1'] = result.loc[0, 'Values']
#print(result)

# drop first row
result = result.drop(0)
#print(result)

# sort by Count and Values
result = result.sort_values(by=["Count", "Values"], ascending=[False, True])
#print(result)

# change order of columns and reset index
result = result[['Num1', 'Values', 'Count']].reset_index(drop=True)
print(result)

Result:

   Num1  Values  Count
0     6       1      4
1     6       5      4
2     6       2      3
3     6       4      2
4     6       8      2
5     6      10      2
6     6      13      2
7     6      23      2
8     6      33      2
9     6      34      2

Here's one option:

import numpy as np
import pandas as pd

num = 6

vals, counts = np.unique(df.to_numpy().ravel(), return_counts=True)
m = (vals != num) & (counts > 1)

v = vals[m]
c = counts[m]

order = np.lexsort((v, -c))

out = pd.DataFrame({
    'Num1': num,
    'Values': v[order],
    'Count': c[order]
})

Result

out

   Num1  Values  Count
0     6       1      4
1     6       5      4
2     6       2      3
3     6       4      2
4     6       8      2
5     6      10      2
6     6      13      2
7     6      23      2
8     6      33      2
9     6      34      2

# Equality check with OP's `result`
out.equals(result)
# True

Explanation

If ordering isn't needed, you can skip lexsort and create out directly from v and c.
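
To make the role of lexsort concrete, here is a small sketch on made-up arrays (not the question's data). np.lexsort treats the last key in the tuple as the primary sort key, so passing (v, -c) sorts by count descending and uses the value only to break ties:

```python
import numpy as np

# Toy values and their counts, deliberately out of order.
v = np.array([10, 4, 2, 1])
c = np.array([2, 2, 3, 4])

# Last key (-c) is primary: larger counts come first.
# First key (v) breaks ties: among equal counts, smaller values come first.
order = np.lexsort((v, -c))

print(v[order].tolist())  # [1, 2, 4, 10]
print(c[order].tolist())  # [4, 3, 2, 2]
```

Without the lexsort step, the rows simply stay in the order np.unique returns them, which is ascending by value.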
