
I have a numpy array named arr with 1154 elements in it.

array([502, 502, 503, ..., 853, 853, 853], dtype=int64)

I have a data frame called df

    team    Count
0   512     11
1   513     21
2   515     18
3   516     8
4   517     4

How do I get the subset of the data frame df that contains only the rows whose team value appears in the array arr?

For example:

team         count
arr1_value1    45
arr1_value2    67

To make the question clearer: I have a NumPy array ['45', '55', '65']

I have a data frame as follows:

team  count
34      156
45      189
53       90
65       99
23       77
55       91

I need a new data frame as follows:

team    count
 45      189
 55       91
 65       99

3 Answers


Your array values look like strings; I don't know whether that is a typo. Assuming they are in fact ints, you can filter your df by calling isin:

In [6]:

a = np.array([45, 55, 65])
df[df.team.isin(a)]
Out[6]:
   team  count
1    45    189
3    65     99
5    55     91
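A self-contained sketch of the same isin filter, assuming the team values really do arrive as strings (as in the question's ['45', '55', '65']) and therefore need converting to int before comparing against the integer team column:

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})

# The question's array holds strings; convert to int so isin compares like with like
arr = np.array(['45', '55', '65']).astype(int)

# Boolean mask: True where df.team appears in arr; rows keep df's original order
subset = df[df['team'].isin(arr)]
print(subset)
```

Without the astype(int) conversion, the integer 45 never equals the string '45', so the mask would select no rows.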

1 Comment

Perfect, same idea as my attempt but a better implementation! +1

You can use the DataFrame.loc indexer.

Using your example (note that team is the index here):

import numpy as np
import pandas as pd

arr = np.array(['45', '55', '65'])
frame = pd.DataFrame([156, 189, 90, 99, 77, 91], index=['34', '45', '53', '65', '23', '55'])
ans = frame.loc[arr]

This sort of indexing is type-sensitive: if frame.index holds ints, make sure your indexing array also holds ints, not strings as in this example.
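To illustrate the type sensitivity, here is a small sketch assuming the frame is instead built with an integer index (the column name count is chosen here for readability). The string labels are cast to the index's dtype before the lookup:

```python
import numpy as np
import pandas as pd

# Same data, but with integer team labels as the index
frame = pd.DataFrame({'count': [156, 189, 90, 99, 77, 91]},
                     index=[34, 45, 53, 65, 23, 55])
frame.index.name = 'team'

arr = np.array(['45', '55', '65'])

# frame.loc[arr] would look up the strings '45', '55', '65' and raise a KeyError,
# so cast the labels to the index's dtype first
ans = frame.loc[arr.astype(frame.index.dtype)]
print(ans)
```

Note that .loc returns the rows in the order of the labels you pass (45, 55, 65), not in the frame's original order.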

1 Comment

If arr contains elements not in frame.index, older pandas versions added them as rows of NaN values that you then had to drop from the ans table; since pandas 1.0, .loc raises a KeyError for a list with missing labels instead.
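Two ways to cope with labels that are missing from the index, sketched with a hypothetical extra label '99' added to arr: filter with a boolean isin mask on the index, or reindex (which inserts NaN rows for missing labels) and then drop those rows:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'count': [156, 189, 90, 99, 77, 91]},
                     index=['34', '45', '53', '65', '23', '55'])

# '99' is a hypothetical label that does not exist in frame.index
arr = np.array(['45', '55', '65', '99'])

# Option 1: boolean mask on the index; silently skips '99' and keeps frame's row order
ans = frame[frame.index.isin(arr)]

# Option 2: reindex follows arr's order and adds a NaN row for '99'; dropna() removes it
ans2 = frame.reindex(arr).dropna()
print(ans)
print(ans2)
```

With option 2, the count column becomes float because the temporary NaN row forces a float dtype.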

I am answering the question asked after "To make this question more clear". As a side note: you could have provided the first four lines of code yourself, so I would not have had to type them, which can also introduce errors or misunderstandings.

The idea is to build a boolean Series that marks the rows to keep and then create a new dataframe by indexing df with that mask. I just started with pandas, so this can probably be done more efficiently.

import numpy as np
import pandas as pd

# starting with the df and teams as string
df = pd.DataFrame(data={'team': [34, 45, 53, 65, 23, 55], 'count': [156, 189, 90, 99, 77, 91]})
teams = np.array(['45', '55', '65'])

# we want the team number as int
teams_int = [int(t) for t in teams]

# mini function to check whether the team should be kept
def filter_teams(x):
    return x in teams_int

# build a boolean mask (True for the teams to keep) and select those rows from the original df
index = df['team'].apply(filter_teams)
df_filtered = df[index]

It returns this dataframe:

   count  team
1    189    45
3     99    65
5     91    55

Note that in this case df_filtered keeps 1, 3, 5 as its index (the index labels of the original dataframe). Your question is unclear about this, since the index is not shown.
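A possible refinement, sketched under the same setup: Series.isin builds the same boolean mask without calling a Python function per row, and reset_index(drop=True) renumbers the result 0, 1, 2 if the original labels 1, 3, 5 are not wanted:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'team': [34, 45, 53, 65, 23, 55], 'count': [156, 189, 90, 99, 77, 91]})
teams = np.array(['45', '55', '65'])
teams_int = [int(t) for t in teams]

# Vectorized mask: same True/False values as df['team'].apply(filter_teams)
mask = df['team'].isin(teams_int)
df_filtered = df[mask]

# Renumber rows 0, 1, 2, ... and discard the original labels 1, 3, 5
df_renumbered = df_filtered.reset_index(drop=True)
print(df_renumbered)
```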
