I have a numpy array like this:

a = [['I05', 'U13', 4],
     ['I12', 'U13', 5],
     ['I22', 'U13', 3],
     ['I03', 'U15', 5],
     ['I14', 'U23', 5],
     ['I12', 'U23', 2],
     ['I15', 'U43', 5]]

Here there are three entries for U13 and two entries for U23. I need to keep the rows whose second-column value occurs more than once and remove the rest.

I want a result like this after removing:

a = [['I05', 'U13', 4],
     ['I12', 'U13', 5],
     ['I22', 'U13', 3],
     ['I14', 'U23', 5],
     ['I12', 'U23', 2]]

How can I do this efficiently?

The arrays are already sorted on the second column (the 'UXX' values).

2 Answers

This method should achieve the desired output:

import numpy as np
from collections import Counter

# `a` is the list from the question; mixed str/int rows become an array of strings
a = np.array(a)

# count the occurrences of each value in the 2nd column
d = Counter(a[:, 1])

# indices of the rows whose 2nd-column value occurs more than once
index_keep = [i for i, j in enumerate(a[:, 1]) if d[j] > 1]
print(a[index_keep])

[['I05' 'U13' '4']
 ['I12' 'U13' '5']
 ['I22' 'U13' '3']
 ['I14' 'U23' '5']
 ['I12' 'U23' '2']]
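
The same counting idea can also be sketched without a Python-level loop. The block below is a minimal sketch, not a benchmarked solution: the names `keys`, `mask_any_order`, and `mask_sorted` are illustrative, and the array is built with `dtype=object` (an assumption) so the integers are not converted to strings. The first variant uses `np.unique`; the second relies on the rows already being sorted on the second column, as the question states.

```python
import numpy as np

# dtype=object is an assumption here; it keeps the ints as ints
a = np.array([['I05', 'U13', 4],
              ['I12', 'U13', 5],
              ['I22', 'U13', 3],
              ['I03', 'U15', 5],
              ['I14', 'U23', 5],
              ['I12', 'U23', 2],
              ['I15', 'U43', 5]], dtype=object)

# second column as a plain string array so comparisons are vectorized
keys = a[:, 1].astype(str)

# Variant 1: works for any row order.
# inverse[i] is the index of keys[i] among the unique values,
# counts[k] is how often the k-th unique value occurs.
_, inverse, counts = np.unique(keys, return_inverse=True, return_counts=True)
mask_any_order = counts[inverse] > 1

# Variant 2: assumes equal keys are adjacent (rows sorted on the key).
# A row is kept if it matches the row before it or the row after it.
same = keys[1:] == keys[:-1]
mask_sorted = np.zeros(len(keys), dtype=bool)
mask_sorted[1:] |= same
mask_sorted[:-1] |= same

result = a[mask_any_order]
print(result)
```

Both masks select the same rows for the sample data; the second one only makes sense when the input is sorted or otherwise grouped by the key.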


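For mixed types, pandas is a convenient option. You only need to keep the rows whose second-column value is duplicated; keep=False marks every occurrence of a repeated value, so this works whether or not the data is sorted: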

import pandas as pd
import numpy as np

A = np.array([('I05', 'U13', 4),
              ('I12', 'U13', 5),
              ('I22', 'U13', 3),
              ('I03', 'U15', 5),
              ('I14', 'U23', 5),
              ('I12', 'U23', 2),
              ('I15', 'U43', 5)],
            dtype='object, object, i4')

df = pd.DataFrame(A)
B = df[df.duplicated(subset=['f1'], keep=False)].values

print(repr(B))

array([['I05', 'U13', 4],
       ['I12', 'U13', 5],
       ['I22', 'U13', 3],
       ['I14', 'U23', 5],
       ['I12', 'U23', 2]], dtype=object)

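Note that NumPy assigns the field names 'f0', 'f1', 'f2' automatically, and pandas uses them as column names, which is why subset=['f1'] refers to the second column. A is a structured array, not an array of tuples: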

print(repr(A))

array([('I05', 'U13', 4), ('I12', 'U13', 5), ('I22', 'U13', 3),
       ('I03', 'U15', 5), ('I14', 'U23', 5), ('I12', 'U23', 2),
       ('I15', 'U43', 5)], 
      dtype=[('f0', 'O'), ('f1', 'O'), ('f2', '<i4')])
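
If the data is a plain 2-D array (an array of arrays) rather than a structured array, there are no automatic field names, so the columns can be named explicitly. This is a minimal sketch under that assumption; the column names 'item', 'user', and 'rating' are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# plain 2-D object array (an "array of arrays"), not a structured array
a = np.array([['I05', 'U13', 4],
              ['I12', 'U13', 5],
              ['I22', 'U13', 3],
              ['I03', 'U15', 5],
              ['I14', 'U23', 5],
              ['I12', 'U23', 2],
              ['I15', 'U43', 5]], dtype=object)

# hypothetical column names, chosen only for this example
df = pd.DataFrame(a, columns=['item', 'user', 'rating'])

# keep=False flags every row whose 'user' value appears more than once
B = df[df.duplicated(subset=['user'], keep=False)].to_numpy()
print(B)
```

Without the `columns` argument pandas labels the columns 0, 1, 2, in which case `subset=[1]` would select the second column instead.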

3 Comments

What does the 'f1' do? Can you please elaborate? Also, I have an array of arrays, not an array of tuples. Will it work there?
@PreetomSahaArko, see the update: NumPy adds the field names automatically, and these become the pandas column names.
@PreetomSahaArko, Also note that A is not an array of tuples, it's a structured array. You can easily convert to a structured array via the above logic.
