getting a subset of arrays from a pandas data frame

Question

I have a numpy array named arr with 1154 elements in it.

array([502, 502, 503, ..., 853, 853, 853], dtype=int64)

I have a data frame called df

    team    Count
0   512     11
1   513     21
2   515     18
3   516     8
4   517     4

How do I get the subset of the data frame df that contains only the rows whose team value appears in the array arr?

For example:

team         count
arr1_value1    45
arr1_value2    67

To make this question more clear: I have a numpy array ['45', '55', '65']

I have a data frame as follows:

team  count
34      156
45      189
53       90
65       99
23       77
55       91

I need a new data frame as follows:

team    count
 45      189
 55       91
 65       99

EdChum · Accepted Answer · 2015-03-09 08:44:23Z

5

Your array values look like strings, and I don't know whether that is a typo. Assuming they are in fact ints, you can filter your df by calling isin:

In [6]:

a = np.array([45, 55, 65])
df[df.team.isin(a)]
Out[6]:
   team  count
1    45    189
3    65     99
5    55     91
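If the array really does hold strings, as written in the question, the values would need converting before isin can match an integer column. A minimal sketch, assuming the team column is integer-typed and every string in the array parses as an integer (the name `wanted` is introduced here for illustration only):

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})

# The question's array holds strings; convert it to match the integer column
arr = np.array(['45', '55', '65'])
wanted = arr.astype(int)

# isin builds a boolean mask; indexing with it keeps the matching rows
subset = df[df['team'].isin(wanted)]
print(subset)
```

The bracket form df['team'] is used here because attribute access fails for a column named count, which collides with the DataFrame.count method.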


1 Comment

Nras Over a year ago

Perfect, same idea but a better realization than my attempt! +1

Hezi Resheff · Accepted Answer · 2015-03-09 08:11:41Z

0

You can use the DataFrame.loc indexer.

Using your example (note that team is the index):

import numpy as np
import pandas as pd

arr = np.array(['45', '55', '65'])
frame = pd.DataFrame([156, 189, 90, 99, 77, 91], index=['34', '45', '53', '65', '23', '55'])
ans = frame.loc[arr]

This sort of indexing is type-sensitive, so if frame.index holds ints, make sure your indexing array also holds ints, not strs as in this example.

edited Mar 9, 2015 at 8:11

answered Mar 9, 2015 at 8:05


1 Comment

Hezi Resheff Over a year ago

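If arr contains extra elements not in frame.index, then they will be added with NaN values, which you will then need to drop from the ans table. (This describes pandas as of 2015; since pandas 1.0, .loc raises a KeyError when any label in the list is missing.)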
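One way to tolerate labels that are absent from the index is to avoid passing the raw array to .loc. The sketch below shows two options, assuming the team labels are strings in the index as in the answer's example; the column name count and the extra label '99' are hypothetical, added only for illustration:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'count': [156, 189, 90, 99, 77, 91]},
                     index=['34', '45', '53', '65', '23', '55'])
arr = np.array(['45', '55', '65', '99'])  # '99' is not in frame.index

# Option 1: keep rows whose index label appears in arr (frame order is preserved)
present = frame[frame.index.isin(arr)]

# Option 2: reindex follows arr's order and fills missing labels with NaN,
# which dropna then removes
reindexed = frame.reindex(arr).dropna()

print(present)
print(reindexed)
```

The first option never creates NaN rows, so the column keeps its integer dtype; the second returns rows in the order given by arr but converts the column to float along the way.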

Nras · Accepted Answer · 2015-03-09 08:23:08Z

I am answering the question asked after "To make this question more clear". As a side note: the first 4 lines could have been provided by you, so I would not have to type them myself, which could also introduce errors/misunderstanding.

The idea is to create a boolean Series to use as a mask and then simply create a new dataframe based on that mask. I just started with pandas, so maybe this can be done more efficiently.

import numpy as np
import pandas as pd

# starting with the df and teams as string
df = pd.DataFrame(data={'team': [34, 45, 53, 65, 23, 55], 'count': [156, 189, 90, 99, 77, 91]})
teams = np.array(['45', '55', '65'])

# we want the team number as int
teams_int = [int(t) for t in teams]

# mini function to check whether the team is to be kept
def filter_teams(x):
    return x in teams_int

# build a boolean Series (True where the team is wanted) and use it as a mask
index = df['team'].apply(filter_teams)
df_filtered = df[index]

It returns this dataframe:

count  team
1    189    45
3     99    65
5     91    55

Note that in this case, df_filtered uses 1, 3, 5 as its index (the indices of the original dataframe). Your question is unclear about this, as the index is not shown to us.
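If a fresh 0, 1, 2 index and the team order shown in the question are wanted, the filtered frame can be sorted and renumbered. A small sketch, assuming the original row labels do not need to be kept (the name `df_renumbered` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'team': [34, 45, 53, 65, 23, 55],
                   'count': [156, 189, 90, 99, 77, 91]})
df_filtered = df[df['team'].isin([45, 55, 65])]

# Sort by team, then discard the original labels (1, 3, 5) and renumber from 0
df_renumbered = df_filtered.sort_values('team').reset_index(drop=True)
print(df_renumbered)
```

Passing drop=True prevents the old index from being added as an extra column.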
