1

I'm trying to make a column of boolean values based on whether another column contains the word 'hazard' and does not contain the word 'roof' (so that I get all non-roof hazards).

I'm using the code below and getting an error:

labels['h_count2'] = labels[(labels['Description'].str.contains('Hazard')) & (labels['Description'].str.contains('Roof'))]

This is the traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'h_count2'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\managers.py in set(self, item, value)
   1052         try:
-> 1053             loc = self.items.get_loc(item)
   1054         except KeyError:

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'h_count2'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-46-51360ea6f27f> in <module>
      1 labels['h_count'] = labels['Description'].str.contains('Roof Hazard')
      2 labels['b_count'] = labels['Description'].str.contains('Brush')
----> 3 labels['h_count2'] = labels[(labels['Description'].str.contains('Hazard')) & (labels['Description'].str.contains('Roof'))]
      4 
      5 def target(row):

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3368         else:
   3369             # set column
-> 3370             self._set_item(key, value)
   3371 
   3372     def _setitem_slice(self, key, value):

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3444         self._ensure_valid_index(value)
   3445         value = self._sanitize_column(key, value)
-> 3446         NDFrame._set_item(self, key, value)
   3447 
   3448         # check if we are modifying a copy

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\generic.py in _set_item(self, key, value)
   3170 
   3171     def _set_item(self, key, value):
-> 3172         self._data.set(key, value)
   3173         self._clear_item_cache()
   3174 

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\managers.py in set(self, item, value)
   1054         except KeyError:
   1055             # This item wasn't present, just insert at end
-> 1056             self.insert(len(self.items), item, value)
   1057             return
   1058 

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\managers.py in insert(self, loc, item, value, allow_duplicates)
   1156 
   1157         block = make_block(values=value, ndim=self.ndim,
-> 1158                            placement=slice(loc, loc + 1))
   1159 
   1160         for blkno, count in _fast_count_smallints(self._blknos[loc:]):

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\blocks.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   3093         values = DatetimeArray._simple_new(values, dtype=dtype)
   3094 
-> 3095     return klass(values, ndim=ndim, placement=placement)
   3096 
   3097 

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
   2629 
   2630         super(ObjectBlock, self).__init__(values, ndim=ndim,
-> 2631                                           placement=placement)
   2632 
   2633     @property

C:\ProgramData\Anaconda3\envs\tensorflowenvironment\lib\site-packages\pandas\core\internals\blocks.py in __init__(self, values, placement, ndim)
     85             raise ValueError(
     86                 'Wrong number of items passed {val}, placement implies '
---> 87                 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
     88 
     89     def _check_ndim(self, values, ndim):

ValueError: Wrong number of items passed 5, placement implies 1

What am I doing wrong?

  • try labels['Description'].str.contains('Hazard|Roof') Commented May 31, 2019 at 14:48
  • I want it to contain hazard but not contain roof. There are values that are Roof Hazard that I want to leave as they are. Commented May 31, 2019 at 14:49
  • labels['h_count2'] = (labels['Description'].str.contains('Hazard')) & (labels['Description'].str.contains('Roof')) Commented May 31, 2019 at 14:50
  • Hi @asongtoruin. This will return a boolean of True for values equaling Roof Hazard. That value contains roof. I want to skip any descriptions that contain the word 'roof'. Commented May 31, 2019 at 14:55
  • Change the data type to string? That worked for me in a quick sample I made. Commented May 31, 2019 at 15:03

2 Answers

1

labels:

   A  Description
0  1        Roof 
1  2       Hazard
2  3  Roof Hazard

labels['h_count2'] = labels.Description.str.contains('Hazard') & ~labels.Description.str.contains('Roof')

Results in

   A  Description  h_count2
0  1        Roof      False
1  2       Hazard      True
2  3  Roof Hazard     False
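
As a side note on why the line in the question fails: the ValueError appears to come from indexing labels with the mask, which returns the matching rows as a DataFrame, and then assigning that whole DataFrame to a single column. A minimal sketch using the sample frame above (the variable names has_hazard, has_roof and filtered are only for illustration):

```python
import pandas as pd

labels = pd.DataFrame({
    "A": [1, 2, 3],
    "Description": ["Roof", "Hazard", "Roof Hazard"],
})

has_hazard = labels["Description"].str.contains("Hazard")
has_roof = labels["Description"].str.contains("Roof")

# Indexing with a mask selects rows and keeps every column,
# so this is a DataFrame, not something that fits in one column.
filtered = labels[has_hazard & has_roof]

# The mask itself is a boolean Series aligned with labels' index,
# which is what a new boolean column needs.
labels["h_count2"] = has_hazard & ~has_roof
```

Assigning the combined mask directly, rather than labels[mask], is the only change needed to get a single boolean column.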

3 Comments

Resulted in TypeError: bad operand type for unary ~: 'float'. I don't know why. Every column is an object type.
Try checking with labels.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3922698 entries, 0 to 3922697
Data columns (total 2 columns):
PictureFilename    object
Description        object
dtypes: object(2)
memory usage: 59.9+ MB
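
One possible cause of the TypeError above, assuming the Description column has missing values (the thread does not confirm this): on an object column, str.contains returns NaN for missing entries, and ~ cannot negate a float NaN. Passing na=False is one way around it. A minimal sketch with made-up sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last description is missing.
labels = pd.DataFrame({
    "Description": ["Roof", "Hazard", "Roof Hazard", np.nan],
})

# With object-dtype strings and no na= argument, the missing row comes
# back as NaN instead of a bool, which is what ~ then trips over.
raw = labels["Description"].str.contains("Roof")

# na=False treats a missing description as "does not contain".
has_hazard = labels["Description"].str.contains("Hazard", na=False)
has_roof = labels["Description"].str.contains("Roof", na=False)
labels["h_count2"] = has_hazard & ~has_roof
```

Checking labels['Description'].isna().sum() on the real data would show whether missing values are actually present.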
1
labels = pd.DataFrame({'Description': ['Hazard Roof test', 'test', 'Hazard is not', 'test2']})

labels['h_count2'] = (labels['Description'].str.upper().str.contains('HAZARD')) & ~(labels['Description'].str.upper().str.contains('ROOF'))

    Description        h_count2
0   Hazard Roof test    False
1   test                False
2   Hazard is not       True
3   test2               False
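
As an alternative to upper-casing the column first, str.contains accepts case=False for a case-insensitive match. A small sketch on the same sample data (the desc variable is only a shorthand introduced here):

```python
import pandas as pd

labels = pd.DataFrame({
    "Description": ["Hazard Roof test", "test", "Hazard is not", "test2"],
})

desc = labels["Description"]

# case=False matches 'hazard', 'Hazard', 'HAZARD', and so on.
labels["h_count2"] = (
    desc.str.contains("hazard", case=False)
    & ~desc.str.contains("roof", case=False)
)
```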

2 Comments

Resulted in TypeError: bad operand type for unary ~: 'float'
Give it a try with labels as a DataFrame. This should work.
