Making a column of boolean values based on two conditions in pandas dataframe

Question

I'm trying to make a column of boolean values based on whether one column contains the word 'hazard' and does not contain the word 'roof' (so that I get all non-roof hazards).

I'm using the below code and I'm getting an error:

labels['h_count2'] = labels[(labels['Description'].str.contains('Hazard')) & (labels['Description'].str.contains('Roof'))]

This is the traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'h_count2'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\managers.py in set(self, item, value)
   1052         try:
-> 1053             loc = self.items.get_loc(item)
   1054         except KeyError:

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'h_count2'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-46-51360ea6f27f> in <module>
      1 labels['h_count'] = labels['Description'].str.contains('Roof Hazard')
      2 labels['b_count'] = labels['Description'].str.contains('Brush')
----> 3 labels['h_count2'] = labels[(labels['Description'].str.contains('Hazard')) & (labels['Description'].str.contains('Roof'))]
      4 
      5 def target(row):

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3368         else:
   3369             # set column
-> 3370             self._set_item(key, value)
   3371 
   3372     def _setitem_slice(self, key, value):

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3444         self._ensure_valid_index(value)
   3445         value = self._sanitize_column(key, value)
-> 3446         NDFrame._set_item(self, key, value)
   3447 
   3448         # check if we are modifying a copy

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\generic.py in _set_item(self, key, value)
   3170 
   3171     def _set_item(self, key, value):
-> 3172         self._data.set(key, value)
   3173         self._clear_item_cache()
   3174 

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\managers.py in set(self, item, value)
   1054         except KeyError:
   1055             # This item wasn't present, just insert at end
-> 1056             self.insert(len(self.items), item, value)
   1057             return
   1058 

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\managers.py in insert(self, loc, item, value, allow_duplicates)
   1156 
   1157         block = make_block(values=value, ndim=self.ndim,
-> 1158                            placement=slice(loc, loc + 1))
   1159 
   1160         for blkno, count in _fast_count_smallints(self._blknos[loc:]):

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\blocks.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   3093         values = DatetimeArray._simple_new(values, dtype=dtype)
   3094 
-> 3095     return klass(values, ndim=ndim, placement=placement)
   3096 
   3097 

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
   2629 
   2630         super(ObjectBlock, self).__init__(values, ndim=ndim,
-> 2631                                           placement=placement)
   2632 
   2633     @property

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
     85             raise ValueError(
     86                 'Wrong number of items passed {val}, placement implies '
---> 87                 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
     88 
     89     def _check_ndim(self, values, ndim):

ValueError: Wrong number of items passed 5, placement implies 1

What am I doing wrong?

I want it to contain 'Hazard' but not contain 'Roof'. There are values that are 'Roof Hazard' that I want to leave as they are.
– Jordan, Commented May 31, 2019 at 14:49
labels['h_count2'] = (labels['Description'].str.contains('Hazard')) & (labels['Description'].str.contains('Roof'))
– asongtoruin, Commented May 31, 2019 at 14:50
Hi @asongtoruin. This will return True for values equal to 'Roof Hazard', since that value contains 'Roof'. I want to skip any descriptions that contain the word 'Roof'.
– Jordan, Commented May 31, 2019 at 14:55
Change the data type to string? That worked for me in a quick sample I made.
– Vink, Commented May 31, 2019 at 15:03
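
For context on the ValueError in the traceback: indexing with labels[mask] selects the matching rows and returns a DataFrame, and assigning that multi-column DataFrame to a single column appears to be what pandas rejects. The boolean mask on its own is already a Series with one value per row. A minimal sketch on made-up sample data (the file names and three descriptions are illustrative assumptions, not rows from the original dataset):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
labels = pd.DataFrame({
    "PictureFilename": ["a.jpg", "b.jpg", "c.jpg"],
    "Description": ["Roof", "Hazard", "Roof Hazard"],
})

# A boolean Series: one True/False per row.
mask = labels["Description"].str.contains("Hazard")

# Indexing with the mask filters rows and yields a DataFrame, not a column.
filtered = labels[mask]

print(type(mask).__name__, mask.shape)          # Series (3,)
print(type(filtered).__name__, filtered.shape)  # DataFrame (2, 2)

# Assigning the mask itself creates the boolean column.
labels["has_hazard"] = mask
```

So the outer labels[...] should be dropped when the goal is a new boolean column rather than a filtered table.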

Vink · Accepted Answer · 2019-05-31 15:06:11Z

1

labels:

   A  Description
0  1         Roof
1  2       Hazard
2  3  Roof Hazard

labels['h_count2'] = labels.Description.str.contains('Hazard') & ~labels.Description.str.contains('Roof')

Results in

   A  Description  h_count2
0  1         Roof     False
1  2       Hazard      True
2  3  Roof Hazard     False

answered May 31, 2019 at 15:06


3 Comments

Jordan Over a year ago

Resulted in TypeError: bad operand type for unary ~: 'float'. I don't know why. Every column is an object type.

Vink Over a year ago

Try checking with labels.info().

Jordan Over a year ago

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3922698 entries, 0 to 3922697
Data columns (total 2 columns):
PictureFilename    object
Description        object
dtypes: object(2)
memory usage: 59.9+ MB
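
A likely cause of the TypeError, assuming the Description column contains missing values (an object column can hold both strings and NaN, which labels.info() as shown does not reveal): str.contains can return NaN for those rows, so the result is not a clean boolean Series and ~ fails on the float NaN. Passing na=False treats missing descriptions as non-matches. A minimal sketch on made-up data, with the NaN row added purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN row stands in for a missing description.
labels = pd.DataFrame({"Description": ["Roof", "Hazard", "Roof Hazard", np.nan]})

# na=False makes missing values count as "does not contain",
# so both masks are plain boolean Series that support ~ and &.
has_hazard = labels["Description"].str.contains("Hazard", na=False)
has_roof = labels["Description"].str.contains("Roof", na=False)

labels["h_count2"] = has_hazard & ~has_roof

print(labels["h_count2"].tolist())  # [False, True, False, False]
```

Whether the real data has missing descriptions can be checked with labels['Description'].isna().sum().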

Kaies LAMIRI · Accepted Answer · 2019-05-31 15:14:28Z

1

import pandas as pd

labels = pd.DataFrame({'Description': ['Hazard Roof test', 'test', 'Hazard is not', 'test2']})

labels['h_count2'] = (labels['Description'].str.upper().str.contains('HAZARD')) & ~(labels['Description'].str.upper().str.contains('ROOF'))

    Description        h_count2
0   Hazard Roof test    False
1   test                False
2   Hazard is not       True
3   test2               False
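
As an alternative to upper-casing the column, and assuming a case-insensitive match is what is wanted, str.contains accepts case=False. The sketch below reuses the sample frame from this answer; the lowercase search terms and the na=False argument (which guards against missing values) are additions for illustration:

```python
import pandas as pd

labels = pd.DataFrame(
    {"Description": ["Hazard Roof test", "test", "Hazard is not", "test2"]}
)

desc = labels["Description"]

# case=False ignores letter case; na=False treats missing values as non-matches.
is_hazard = desc.str.contains("hazard", case=False, na=False)
is_roof = desc.str.contains("roof", case=False, na=False)

labels["h_count2"] = is_hazard & ~is_roof

print(labels["h_count2"].tolist())  # [False, False, True, False]
```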

edited May 31, 2019 at 15:14

answered May 31, 2019 at 15:05


2 Comments

Jordan Over a year ago

Resulted in TypeError: bad operand type for unary ~: 'float'

Kaies LAMIRI Over a year ago

Give it a try with labels as a DataFrame. This should work.
