selecting by multiple columns

Question

I want to remove certain rows of a pandas DataFrame based on a combination of values in two columns.

Suppose my DataFrame looks like this:

        date  PX_LAST  CONTRACT_VALUE GEN_TICKER
1       19860401  92.6600       231650.00      EDM87
2       19860401  92.5100       231275.00      EDU87
3       19860401  92.3700       230925.00      EDZ87
4       19860401  92.2500       230625.00      EDH88
6       19860402  92.6700       231675.00      EDM87
7       19860402  92.5200       231300.00      EDU87
8       19860402  92.3700       230925.00      EDZ87
9       19860402  92.2400       230600.00      EDH88
11      19860403  92.6200       231550.00      EDM87
12      19860403  92.4700       231175.00      EDU87
13      19860403  92.3200       230800.00      EDZ87
14      19860403  92.1900       230475.00      EDH88
16      19860404  92.6900       231725.00      EDM87
17      19860404  92.5300       231325.00      EDU87
18      19860404  92.3800       230950.00      EDZ87
         ...      ...             ...        ...
241801  20150206  99.7200       249300.00      EDH15
241841  20150209  99.7200       249300.00      EDH15
241881  20150210  99.7200       249300.00      EDH15
241921  20150211  99.7200       249300.00      EDH15
241961  20150212  99.7200       249300.00      EDH15
242001  20150213  99.7200       249300.00      EDH15
242041  20150217  99.7200       249300.00      EDH15
242081  20150218  99.7225       249306.24      EDH15
242121  20150219  99.7225       249306.24      EDH15
242161  20150220  99.7200       249300.00      EDH15
242201  20150223  99.7225       249306.24      EDH15
242241  20150224  99.7325       249331.25      EDH15
242281  20150225  99.7350       249337.50      EDH15
242321  20150226  99.7350       249337.50      EDH15
242361  20150227  99.7350       249337.50      EDH15

[193411 rows x 4 columns]

and let

i = 'EDM87'
j = 19870412

I want to exclude the rows of the DataFrame that have GEN_TICKER == i and date < j.

My code looks like this:

# Python-level loop: keep a row unless both conditions hold
x2 = [not (xi and xj) for xi, xj in zip((fdata['GEN_TICKER'] == i).tolist(),
                                        (fdata['date'] < j).tolist())]
fdata = fdata[x2]

It does the job, but it doesn't seem very efficient. Is there a better way to do this? Alternatively, is there an in-place way to remove the rows (so that I can avoid reassigning the reduced DataFrame to fdata)?
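On the in-place part of the question: one option, not taken from the accepted answer, is to build the vectorized mask once and pass the matching index labels to DataFrame.drop with inplace=True. The small fdata frame below is a hypothetical stand-in for the real data. inplace=True only removes the explicit reassignment; it is not guaranteed to be faster or to use less memory.

```python
import pandas as pd

# Hypothetical miniature version of fdata (only the two relevant columns).
fdata = pd.DataFrame(
    {
        "date": [19860401, 19860401, 19870413, 19870413],
        "GEN_TICKER": ["EDM87", "EDU87", "EDM87", "EDU87"],
    },
    index=[1, 2, 3, 4],
)
i = "EDM87"
j = 19870412

# Vectorized mask: True for the rows that should be removed.
mask = (fdata["GEN_TICKER"] == i) & (fdata["date"] < j)

# drop() with inplace=True removes the rows by index label and modifies
# fdata itself instead of returning a new DataFrame.
fdata.drop(fdata[mask].index, inplace=True)
print(fdata)
```

This assumes the index labels are unique: with duplicate labels, drop removes every row that carries a matched label, not just the masked ones.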

I tried fdata.loc[:,fdata.loc['GEN_TICKER']==i] but I get an error: KeyError: 'the label [GEN_TICKER] is not in the [index]'

I tried fdata.loc[:,(fdata.loc['GEN_TICKER']==i).tolist()] but get the same error. Why do I get this error when GEN_TICKER is a column name?

Other variants with the same error are fdata.loc[fdata.loc['GEN_TICKER']==i] and fdata.loc[fdata.loc['GEN_TICKER']==i,:].
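The KeyError comes from how .loc reads its first argument: a single label in the row position is looked up in the row index, not among the columns, so fdata.loc['GEN_TICKER'] searches for a row labelled 'GEN_TICKER'. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame with an integer index, like the question's.
fdata = pd.DataFrame(
    {"date": [19860401, 19860402], "GEN_TICKER": ["EDM87", "EDU87"]},
    index=[1, 2],
)
i = "EDM87"

# A single label in .loc's first position is a *row* label, so this fails:
# there is no row labelled 'GEN_TICKER'.
raised = False
try:
    fdata.loc["GEN_TICKER"]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Columns are selected with [] (or with .loc[:, label]) ...
col = fdata["GEN_TICKER"]
print(col.equals(fdata.loc[:, "GEN_TICKER"]))

# ... and the boolean Series built from that column goes in the row position.
rows = fdata.loc[fdata["GEN_TICKER"] == i]
print(rows)
```

The working pattern is therefore fdata.loc[fdata['GEN_TICKER'] == i]: the column lookup happens inside the comparison, and only the resulting mask is handed to .loc.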

I tried fdata[fdata['GEN_TICKER']==i & fdata['date']>j] and I get another type of error: TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool] - but individually fdata[fdata['GEN_TICKER']==i] and fdata[fdata['date']>j] both work.
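The TypeError in that last attempt is an operator-precedence problem: in Python, & binds more tightly than comparison operators such as ==, < and >, so fdata['GEN_TICKER']==i & fdata['date']>j is parsed as fdata['GEN_TICKER'] == (i & fdata['date']) > j rather than as two comparisons joined by &. The sketch below shows the same effect with plain integers and then the parenthesized pandas version; the three-row frame is hypothetical.

```python
import pandas as pd

# With plain integers, 3 == 3 & 2 is read as 3 == (3 & 2), i.e. 3 == 2.
print(3 == 3 & 2)            # False
print((3 == 3) & (2 == 2))   # True: each comparison is evaluated first

# Hypothetical three-row frame.
fdata = pd.DataFrame(
    {"date": [19860401, 19870413, 19860401],
     "GEN_TICKER": ["EDM87", "EDM87", "EDU87"]},
    index=[1, 2, 3],
)
i = "EDM87"
j = 19870412

# Wrapping each comparison in parentheses gives two boolean Series,
# which & then combines element-wise.
selected = fdata[(fdata["GEN_TICKER"] == i) & (fdata["date"] > j)]
print(selected)
```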

I'm using Python 3 and Pandas 0.15.

Thanks

DSM · Accepted Answer · 2015-03-14 00:35:49Z

You were very close. Changing j a bit so that we can see the effect even though we're only looking at the first few rows:

>>> i = 'EDM87'
>>> j = 19860403
>>> df[~((df.GEN_TICKER == i) & (df.date < j))]
        date  PX_LAST  CONTRACT_VALUE GEN_TICKER
2   19860401    92.51          231275      EDU87
3   19860401    92.37          230925      EDZ87
4   19860401    92.25          230625      EDH88
7   19860402    92.52          231300      EDU87
8   19860402    92.37          230925      EDZ87
9   19860402    92.24          230600      EDH88
11  19860403    92.62          231550      EDM87
12  19860403    92.47          231175      EDU87
13  19860403    92.32          230800      EDZ87
14  19860403    92.19          230475      EDH88
16  19860404    92.69          231725      EDM87
17  19860404    92.53          231325      EDU87
18  19860404    92.38          230950      EDZ87

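You basically only needed to add parentheses: & binds more tightly than == and <, so each comparison has to be wrapped in its own pair before the two are combined. (I also added the NOT operator, ~, so that we keep the rows which aren't being removed.)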
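For anyone who wants to rerun the session above, here is a self-contained sketch that rebuilds the first rows of the question's data (the CONTRACT_VALUE column is omitted for brevity) and applies the same negated, parenthesized mask.

```python
import pandas as pd

# The first rows of the question's data; CONTRACT_VALUE left out for brevity.
df = pd.DataFrame(
    {
        "date": [19860401, 19860401, 19860401, 19860401,
                 19860402, 19860402, 19860402, 19860402,
                 19860403, 19860403, 19860403, 19860403,
                 19860404, 19860404, 19860404],
        "PX_LAST": [92.66, 92.51, 92.37, 92.25,
                    92.67, 92.52, 92.37, 92.24,
                    92.62, 92.47, 92.32, 92.19,
                    92.69, 92.53, 92.38],
        "GEN_TICKER": ["EDM87", "EDU87", "EDZ87", "EDH88"] * 3
                      + ["EDM87", "EDU87", "EDZ87"],
    },
    index=[1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18],
)

i = "EDM87"
j = 19860403

# Rows to remove: ticker matches AND date is before j.
to_remove = (df["GEN_TICKER"] == i) & (df["date"] < j)

# ~ flips the mask so that everything else is kept.
result = df[~to_remove]
print(result)
```

Only the EDM87 rows dated before 19860403 (index labels 1 and 6) disappear, matching the output shown in the answer.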
