Replace a string with a shorter version of itself using pandas

Question

I have a pandas dataframe with one column of model variables and their corresponding statistics in another column. I've done some string manipulation to get a derived summary table to join the summary table from the model.
lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('class_cc', case = False), 'variable'] = lost_cost_final_table['variable'].str[:8]

Full traceback.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-229-1dbe5bd14d4b> in <module>
----> 1 lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('class_cc', case = False), 'variable'] = lost_cost_final_table['variable'].str[:8]
      2 #lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('class_v_age', case = False), 'variable'] = lost_cost_final_table['variable'].str[:11]
      3 #lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('married_age', case = False), 'variable'] = lost_cost_final_table['variable'].str[:11]
      4 #lost_cost_final_table.loc[lost_cost_final_table['variable'].str.contains('state_model', case = False), 'variable'] = lost_cost_final_table['variable'].str[:11]
      5 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    187             key = com._apply_if_callable(key, self.obj)
    188         indexer = self._get_setitem_indexer(key)
--> 189         self._setitem_with_indexer(indexer, value)
    190 
    191     def _validate_key(self, key, axis):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
    467 
    468             if isinstance(value, ABCSeries):
--> 469                 value = self._align_series(indexer, value)
    470 
    471             info_idx = indexer[info_axis]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
    732                         return ser._values.copy()
    733 
--> 734                     return ser.reindex(new_ix)._values
    735 
    736                 # 2 dims

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   3323     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   3324     def reindex(self, index=None, **kwargs):
-> 3325         return super(Series, self).reindex(index=index, **kwargs)
   3326 
   3327     def drop(self, labels=None, axis=0, index=None, columns=None,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   3687         # perform the reindex on the axes
   3688         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3689                                   fill_value, copy).__finalize__(self)
   3690 
   3691     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3705             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   3706                                              fill_value=fill_value,
-> 3707                                              copy=copy, allow_dups=False)
   3708 
   3709         return obj

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   3808                                                 fill_value=fill_value,
   3809                                                 allow_dups=allow_dups,
-> 3810                                                 copy=copy)
   3811 
   3812         if copy and new_data is self._data:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   4412         # some axes don't allow reindexing with dups
   4413         if not allow_dups:
-> 4414             self.axes[axis]._can_reindex(indexer)
   4415 
   4416         if axis >= self.ndim:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3574         # trying to reindex on an axis with duplicates
   3575         if not self.is_unique and len(indexer):
-> 3576             raise ValueError("cannot reindex from a duplicate axis")
   3577 
   3578     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis

However, when I replace with example, it works and the only difference is the data frame name. See below. I don't see where the difference between the two codes lines are. Any ideas?

 variable = ['class_cc-Harley', 'class_cc_Sport', 'class_cc_Other', 'unit_driver_experience']
unique_value = [1200, 1400, 700, 45]
p_value = [.0001, .0001, .0001, .049]
dic = {'variable': variable, 'unique_value':unique_value, 'p_value':p_value}
df = pd.DataFrame(dic)

df.loc[df['variable'].str.contains('class_cc', case = False), 'variable'] = df['variable'].str[:8]

Looks like the index of lost_cost_final_table may contain duplicates. What's the output of lost_cost_final_table.index.is_unique? — perl
– perl, Commented May 3, 2019 at 19:09
Hi @perl. Output of lost_cost_final_table.index.is_unique = False — Jordan
– Jordan, Commented May 3, 2019 at 19:11
OK, try resetting the index with lost_cost_final_table.reset_index(inplace=True), then run your line of code once again — perl
– perl, Commented May 3, 2019 at 19:11
Perfect. If you want to put that in an answer, I can accept it. Thank you. — Jordan
– Jordan, Commented May 3, 2019 at 19:12

perl · Accepted Answer · 2019-05-03 19:15:13Z

1

The index of lost_cost_final_table is not unique, which can be fixed by running reset_index:

lost_cost_final_table.reset_index(inplace=True)

answered May 3, 2019 at 19:15

perl

9,9811 gold badge14 silver badges23 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Replace a string with a shorter version of itself using pandas

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related