I want to remove certain rows of a pandas dataframe based on a combination of values in two columns.
Suppose my dataframe looks like this:
date PX_LAST CONTRACT_VALUE GEN_TICKER
1 19860401 92.6600 231650.00 EDM87
2 19860401 92.5100 231275.00 EDU87
3 19860401 92.3700 230925.00 EDZ87
4 19860401 92.2500 230625.00 EDH88
6 19860402 92.6700 231675.00 EDM87
7 19860402 92.5200 231300.00 EDU87
8 19860402 92.3700 230925.00 EDZ87
9 19860402 92.2400 230600.00 EDH88
11 19860403 92.6200 231550.00 EDM87
12 19860403 92.4700 231175.00 EDU87
13 19860403 92.3200 230800.00 EDZ87
14 19860403 92.1900 230475.00 EDH88
16 19860404 92.6900 231725.00 EDM87
17 19860404 92.5300 231325.00 EDU87
18 19860404 92.3800 230950.00 EDZ87
... ... ... ...
241801 20150206 99.7200 249300.00 EDH15
241841 20150209 99.7200 249300.00 EDH15
241881 20150210 99.7200 249300.00 EDH15
241921 20150211 99.7200 249300.00 EDH15
241961 20150212 99.7200 249300.00 EDH15
242001 20150213 99.7200 249300.00 EDH15
242041 20150217 99.7200 249300.00 EDH15
242081 20150218 99.7225 249306.24 EDH15
242121 20150219 99.7225 249306.24 EDH15
242161 20150220 99.7200 249300.00 EDH15
242201 20150223 99.7225 249306.24 EDH15
242241 20150224 99.7325 249331.25 EDH15
242281 20150225 99.7350 249337.50 EDH15
242321 20150226 99.7350 249337.50 EDH15
242361 20150227 99.7350 249337.50 EDH15
[193411 rows x 4 columns]
and let
i = 'EDM87'
j = 19870412
I want to exclude the rows of the dataframe that have GEN_TICKER == i and date < j.
My code looks like this:
x2 = [not (xi and xj) for xi, xj in zip((fdata['GEN_TICKER'] == i).tolist(),
                                        (fdata['date'] < j).tolist())]
fdata = fdata[x2]
It does the job, but building Python lists row by row doesn't seem very efficient. Is there a better way to do this? Alternatively, is there an in-place way to remove the rows (so that I can avoid reassigning fdata to the reduced dataframe as above)?
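To make the goal concrete, here is a minimal sketch of what I think a vectorised version would look like. The small frame below is made-up toy data (only the column names match my real dataframe), and the names to_drop, kept and fdata_inplace are just for illustration. I'm assuming boolean Series can be combined with & and negated with ~, and that drop accepts index labels.

```python
import pandas as pd

# Toy data, made up for illustration; only the column names match the real frame.
fdata = pd.DataFrame({
    "date": [19860401, 19860401, 19870413, 19870413],
    "PX_LAST": [92.66, 92.51, 93.10, 93.00],
    "GEN_TICKER": ["EDM87", "EDU87", "EDM87", "EDU87"],
})
i = "EDM87"
j = 19870412

# Rows to exclude: GEN_TICKER == i AND date < j, computed column-wise.
to_drop = (fdata["GEN_TICKER"] == i) & (fdata["date"] < j)

# Option 1: keep the complement of the mask (returns a new frame).
kept = fdata[~to_drop]

# Option 2: drop the matching index labels in place, on a copy of the toy frame.
fdata_inplace = fdata.copy()
fdata_inplace.drop(fdata_inplace[to_drop].index, inplace=True)
```

On the toy data only the first row matches both conditions, so both options should leave the other three rows.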
I tried fdata.loc[:,fdata.loc['GEN_TICKER']==i] but I get an error: KeyError: 'the label [GEN_TICKER] is not in the [index]'
I tried fdata.loc[:,(fdata.loc['GEN_TICKER']==i).tolist()] but get the same error. Why do I get this error when GEN_TICKER is a column name?
Other variants with the same error are fdata.loc[fdata.loc['GEN_TICKER']==i] and fdata.loc[fdata.loc['GEN_TICKER']==i,:]
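My current understanding, which may be incomplete, is that the first position of .loc refers to row labels, so fdata.loc['GEN_TICKER'] searches the index for a row called GEN_TICKER rather than selecting the column. A small sketch on made-up toy data (the names raised, mask and rows are only for illustration):

```python
import pandas as pd

# Toy data, made up for illustration.
fdata = pd.DataFrame({
    "date": [19860401, 19860402],
    "GEN_TICKER": ["EDM87", "EDU87"],
})
i = "EDM87"

# .loc's first position is the ROW label, so this looks for a row
# named 'GEN_TICKER' in the index and fails because there is none.
try:
    fdata.loc["GEN_TICKER"]
    raised = False
except KeyError:
    raised = True

# Selecting the column gives a boolean Series aligned with the rows;
# fdata.loc[:, "GEN_TICKER"] == i would be the .loc spelling of the same thing.
mask = fdata["GEN_TICKER"] == i

# The boolean mask then goes in the row position of .loc.
rows = fdata.loc[mask]
```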
I tried fdata[fdata['GEN_TICKER']==i & fdata['date']>j] and I get another type of error: TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool] - but individually fdata[fdata['GEN_TICKER']==i] and fdata[fdata['date']>j] both work.
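My guess about the TypeError is operator precedence: in Python, & binds more tightly than == and >, so the combined expression is not evaluated as two separate comparisons. The sketch below shows the parse with plain integers first and then the parenthesised form on made-up toy data (the variable names are only for illustration):

```python
import pandas as pd

# Plain-Python illustration of the parse: & is evaluated before == and >,
# and the two comparisons then chain.
unparenthesised = 1 == 1 & 0 > -1      # parsed as 1 == (1 & 0) > -1
parenthesised = (1 == 1) & (0 > -1)    # each comparison evaluated first

# Toy data, made up for illustration.
fdata = pd.DataFrame({
    "date": [19860401, 19860401, 19870413, 19870413],
    "GEN_TICKER": ["EDM87", "EDU87", "EDM87", "EDU87"],
})
i = "EDM87"
j = 19870412

# Wrapping each comparison in parentheses yields two boolean Series
# that & can combine element-wise.
both = fdata[(fdata["GEN_TICKER"] == i) & (fdata["date"] > j)]
```

With the parentheses, each side is a boolean Series before & is applied, which is what the two individually working expressions already produce.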
I'm using Python 3 and Pandas 0.15.
Thanks