
Intention: To group numbers by their Hamming weight (the number of 1s in the binary representation) using pandas. Here I count the 1s in each number's binary form and write the count to df.

Effort so far:

import pandas as pd

def ones(num):
    # Hamming weight: count of 1-bits in the binary representation
    return bin(num).count('1')

num = list(range(1, 8))
C = pd.Index(["num"])
df = pd.DataFrame(num, columns=C)
df['count'] = df.apply(lambda row: ones(row['num']), axis=1)
print(df)

output:

   num  count
0    1      1
1    2      1
2    3      2
3    4      1
4    5      2
5    6      2
6    7      3


Intended output (columns are the Hamming weight; each column lists the numbers with that many 1-bits):
  1 2 3
0 1 3 7
1 2 5
2 4 6

Help!


3 Answers


You can use pivot_table, though you'll need to define the index as the cumcount of the grouped count column; pivot_table can't figure that out all on its own :)

(df.pivot_table(index=df.groupby('count').cumcount(), 
                columns='count', 
                values='num'))

count    1    2    3
0      1.0  3.0  7.0
1      2.0  5.0  NaN
2      4.0  6.0  NaN

There is also a fill_value parameter, though I wouldn't recommend using it here, since you'll end up with mixed types. From here NumPy looks like a good option: you can easily obtain an array from the result with new_df.to_numpy().
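
For instance, a minimal sketch of that last step (new_df is an assumed name for the pivoted frame from above):

new_df = df.pivot_table(index=df.groupby('count').cumcount(),
                        columns='count',
                        values='num')

arr = new_df.to_numpy()  # float array, NaN where a weight group has fewer members
# arr is roughly:
# [[ 1.  3.  7.]
#  [ 2.  5. nan]
#  [ 4.  6. nan]]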


Also, focusing on the logic in ones, we can vectorise this with (based on this answer):

import numpy as np

m = df.num.to_numpy().itemsize  # bytes per element (8 for int64), so only the low 8 bits are tested -- enough for values below 256
df['count'] = (df.num.to_numpy()[:, None] & (1 << np.arange(m)) > 0).view('i1').sum(1)
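
As a quick sanity check (a sketch using the small df from the question), the vectorised result can be compared against the bin-string version:

expected = df['num'].map(lambda x: bin(x).count('1'))
assert (df['count'].to_numpy() == expected.to_numpy()).all()  # both give the same Hamming weights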

Here's a check on both approaches' performance:

df_large = pd.DataFrame({'num':np.random.randint(0,10,(10_000))})

def vect(df):
    # vectorised popcount over the whole column
    m = df.num.to_numpy().itemsize
    return (df.num.to_numpy()[:, None] & (1 << np.arange(m)) > 0).view('i1').sum(1)

%timeit vect(df_large)
# 340 µs ± 5.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df_large.apply(lambda row : ones(row['num']), axis = 1)
# 103 ms ± 2.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)



I suggest a different output:

df.groupby("count").agg(list)

which will give you

             num
count           
1      [1, 2, 4]
2      [3, 5, 6]
3            [7]

It's the same information in a slightly different format. In your original pivoted format, the rows are meaningless and you have an undetermined number of columns; it is more common to have an undetermined number of rows instead. I think you'll find this format easier to work with going forward.
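
For example (a small sketch using the df from the question), the grouped result can be indexed directly by Hamming weight:

grouped = df.groupby("count").agg(list)

grouped.loc[2, "num"]   # -> [3, 5, 6], all numbers with two 1-bits
grouped.index.max()     # -> 3, the largest Hamming weight present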

Or consider just creating a dictionary, as a DataFrame adds a lot of overhead here for no benefit:

df.groupby("count").agg(list).to_dict()["num"]

which gives you

{
    1: [1, 2, 4], 
    2: [3, 5, 6], 
    3: [7],
}
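
Lookup is then plain Python (a sketch; groups is an assumed name for the dict above):

groups = df.groupby("count").agg(list).to_dict()["num"]

groups[2]          # [3, 5, 6]
groups.get(4, [])  # [] -- no numbers in range(1, 8) have four 1-bits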

3 Comments

OP wants to group the numbers that have the same number of 1s in their binary representation. I don't think a pivot is the best data structure for the output; this is an alternative they might not have thought of.
A df of lists is never a good idea if it can be avoided. Performance drops on large DataFrames even with the simplest operations.
tbh it should probably be a dictionary. A df doesn't make a lot of sense either way.

Here's one approach

df.groupby('count')['num'].agg(list).apply(pd.Series).T
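
For the example data this should produce the same wide layout as the pivot_table answer above (a sketch; the frame ends up as floats because the shorter groups are padded with NaN):

out = df.groupby('count')['num'].agg(list).apply(pd.Series).T
# out is roughly:
# count    1    2    3
# 0      1.0  3.0  7.0
# 1      2.0  5.0  NaN
# 2      4.0  6.0  NaN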

