I'm looking to select rows where state contains the word 'Traded' and trading_book does not start with the letters 'E', 'L', or 'N'.

import pandas as pd

Test_Data = [('originating_system_id', ['RBCL', 'RBCL', 'RBCL','RBCL']),
             ('rbc_security_type1', ['CORP', 'CORP','CORP','CORP']),
             ('state', ['Traded', 'Traded Away','Traded','Traded Away']),
             ('trading_book', ['LCAAAAA','NUBBBBB','EDFGSFG','PDFEFGR'])
             ]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order
dfTest_Data = pd.DataFrame(dict(Test_Data))
display(dfTest_Data)

originating_system_id   rbc_security_type1     state        trading_book
        RBCL                   CORP            Traded          LCAAAAA
        RBCL                   CORP            Traded Away     NUBBBBB
        RBCL                   CORP            Traded          EDFGSFG
        RBCL                   CORP            Traded Away     PDFEFGR

Desired output:

originating_system_id   rbc_security_type1     state        trading_book
        RBCL                   CORP            Traded Away     PDFEFGR

I thought this would do the trick:

prefixes = ['E','L','N']
df_Traded_Away_User = dfTest_Data[
                                    dfTest_Data[~dfTest_Data['trading_book'].str.startswith(tuple(prefixes))]  &
                                    (dfTest_Data['state'].str.contains('Traded')) 
                                ][['originating_system_id','rbc_security_type1','state','trading_book']]
display(df_Traded_Away_User)

but I'm getting:

ValueError: Must pass DataFrame with boolean values only

1 Answer

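The error comes from the first condition: wrapping it in dfTest_Data[...] selects rows and returns a DataFrame rather than a boolean mask, so the outer indexing receives a non-boolean DataFrame. I suggest creating each boolean mask separately, which is easier to read, and then chaining them with &: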

prefixes = ['E','L','N']

m1 = ~dfTest_Data['trading_book'].str.startswith(tuple(prefixes))
m2 = dfTest_Data['state'].str.contains('Traded')

cols = ['originating_system_id','rbc_security_type1','state','trading_book']
df_Traded_Away_User = dfTest_Data.loc[m1 & m2, cols]
print(df_Traded_Away_User)

  originating_system_id rbc_security_type1        state trading_book
3                  RBCL               CORP  Traded Away      PDFEFGR
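
To see why the original attempt raised the ValueError, it may help to compare the two forms of the first condition side by side. The following is a minimal, self-contained sketch: the short names df, rows and result are chosen here for illustration only, and the frame is built from a plain dict on the assumption of a recent pandas version.

```python
import pandas as pd

# Same sample data as in the question, built from a plain dict.
df = pd.DataFrame({
    'originating_system_id': ['RBCL'] * 4,
    'rbc_security_type1': ['CORP'] * 4,
    'state': ['Traded', 'Traded Away', 'Traded', 'Traded Away'],
    'trading_book': ['LCAAAAA', 'NUBBBBB', 'EDFGSFG', 'PDFEFGR'],
})

prefixes = ('E', 'L', 'N')

# Indexing with the mask returns the matching rows (a DataFrame), not a mask.
rows = df[~df['trading_book'].str.startswith(prefixes)]

# The bare expressions are boolean Series aligned with df's index.
m1 = ~df['trading_book'].str.startswith(prefixes)
m2 = df['state'].str.contains('Traded')

# Two boolean Series combine row by row with & and can be passed to .loc.
result = df.loc[m1 & m2]
print(type(rows).__name__, type(m1).__name__)
print(result)
```

Here rows is already a filtered DataFrame, so combining it with m2 does not give a row mask; m1 and m2 are both Series, which is what & and .loc expect.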

2 Comments

Working. Is using .loc preferable when filtering rows?
@PeterLucas - It depends on what you need. If you want to filter rows and keep all columns, then df_Traded_Away_User = dfTest_Data[m1 & m2] is better. If you also want only some columns, e.g. 2 columns like cols = ['originating_system_id', 'trading_book'], then loc is necessary to select rows and columns together: df_Traded_Away_User = dfTest_Data.loc[m1 & m2, cols].
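
The difference described in the comment above can be sketched on a small frame. This is only an illustration under assumed data: the two-row df and the names mask, all_cols, some_cols and chained are hypothetical and not part of the original question.

```python
import pandas as pd

# Hypothetical two-row frame, used only to contrast the two indexing styles.
df = pd.DataFrame({
    'originating_system_id': ['RBCL', 'RBCL'],
    'state': ['Traded', 'Traded Away'],
    'trading_book': ['LCAAAAA', 'PDFEFGR'],
})
mask = df['state'].str.contains('Away')

# Plain brackets: rows are filtered, every column is kept.
all_cols = df[mask]

# .loc: rows and columns are selected in a single indexing step.
some_cols = df.loc[mask, ['originating_system_id', 'trading_book']]

# Chained brackets pick the same data here, but in two separate steps.
chained = df[mask][['originating_system_id', 'trading_book']]

print(all_cols)
print(some_cols)
```

For reading data, the chained form and .loc give the same result in this sketch; the single .loc call is generally the safer habit if the selection is later used for assignment.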
