How to append string to each subsequent row in dataframe?

Question

Let's say I have a dataframe that looks like this:

REFERENCE_CODE
dog
1
2
3
4
cat
1
2

4
5

rat

3
4
5

fish
4
5
6

Notice the blank rows. I would like to end up with a dataframe that looks like this:

REFERENCE_CODE
dog
dog_1
dog_2
dog_3
dog_4
cat
cat_1
cat_2

cat_4
cat_5

rat

rat_3
rat_4
rat_5

fish
fish_4
fish_5
fish_6

I have tried something similar to the following:

for index, row in df.iterrows():
    if isinstance(row['REFERENCE_CODE'], str):
        continue  # pseudocode: a header string such as 'dog', nothing to do
    elif isinstance(row['REFERENCE_CODE'], int):
        pass  # pseudocode: go back up, find the last string, and concatenate
    else:
        pass

I am having trouble filling in the parts that are pseudocode. Is my logic correct? Is there an easier way to go about this? Ideally I would like to preserve the original data's blank rows, size, etc., but if that is not possible, that is OK too. I will find a workaround! Thanks.

Edit, after trying Andy Hayden's answer:

  Question number REFERENCE_CODE  ... Unnamed: 12 Unnamed: 13
0             Q1a     ladder_now  ...         NaN         NaN
1             NaN            NaN  ...         NaN         NaN
2             NaN              1  ...         NaN         NaN
3             NaN              2  ...         NaN         NaN
4             NaN              3  ...         NaN         NaN

Traceback (most recent call last):
  File "/Users/xxx/Projects/trend_env/src/script4.py", line 10, in <module>
    headers = (df.REFERENCE_CODE != '') & ~df.REFERENCE_CODE.str.isnumeric()
  File "/Users/xxx/Projects/trend_env/lib/python3.7/site-packages/pandas/core/generic.py", line 1466, in __invert__
    arr = operator.inv(com.values_from_object(self))
TypeError: bad operand type for unary ~: 'float'

  Question number REFERENCE_CODE  ... Unnamed: 12 Unnamed: 13
0             Q1a     ladder_now  ...         NaN         NaN
1             NaN            NaN  ...         NaN         NaN
2             NaN              1  ...         NaN         NaN
3             NaN              2  ...         NaN         NaN
4             NaN              3  ...         NaN         NaN

[5 rows x 14 columns]

Traceback (most recent call last):
  File "/Users/mitchell_bregman/Projects/trend_env/src/script4.py", line 14, in <module>
    headers = (df.REFERENCE_CODE != '') & ~df.REFERENCE_CODE.str.isnumeric()
  File "/Users/mitchell_bregman/Projects/trend_env/lib/python3.7/site-packages/pandas/core/generic.py", line 1466, in __invert__
    arr = operator.inv(com.values_from_object(self))
TypeError: bad operand type for unary ~: 'float'

Can you give the output of df.to_dict() for these DataFrames? It's hard to infer what they actually are.
– Andy Hayden, Commented Feb 8, 2019 at 0:59
Also, does this start as a csv? That might be easier to convert than a DataFrame.
– Andy Hayden, Commented Feb 8, 2019 at 1:00
Please create some dummy data, e.g. a ten line csv we can read into a DataFrame :)
– Andy Hayden, Commented Feb 8, 2019 at 1:02

Andy Hayden · Accepted Answer · 2019-02-08 01:26:19Z

1

To get the groups you can use a mask and cumsum:

In [11]: headers = (df.REFERENCE_CODE != '') & ~df.REFERENCE_CODE.str.isnumeric()

In [12]: headers.cumsum()
Out[12]:
0     1
1     1
2     1
3     1
4     1
5     2
6     2
7     2
8     2
9     2
10    2
11    2
12    3
13    3
14    3
15    3
16    3
17    3
18    4
19    4
20    4
21    4
Name: REFERENCE_CODE, dtype: int64

Now you can use this to groupby:

In [13]: res = df.groupby(headers.cumsum())['REFERENCE_CODE'].apply(lambda x: x.iloc[0] + '_' + x)

In [14]: res
Out[14]:
0       dog_dog
1         dog_1
2         dog_2
3         dog_3
4         dog_4
5       cat_cat
6         cat_1
7         cat_2
8          cat_
9         cat_4
10        cat_5
11         cat_
12      rat_rat
13         rat_
14        rat_3
15        rat_4
16        rat_5
17         rat_
18    fish_fish
19       fish_4
20       fish_5
21       fish_6
Name: REFERENCE_CODE, dtype: object

and only use the relevant (numeric) rows:

In [15]: df.REFERENCE_CODE.update(res[df.REFERENCE_CODE.str.isnumeric()])

In [16]: df
Out[16]:
   REFERENCE_CODE
0             dog
1           dog_1
2           dog_2
3           dog_3
4           dog_4
5             cat
6           cat_1
7           cat_2
8
9           cat_4
10          cat_5
11
12            rat
13
14          rat_3
15          rat_4
16          rat_5
17
18           fish
19         fish_4
20         fish_5
21         fish_6

It might be easier to convert this on the way in. I would argue that this is a strange objective (and it would be a little easier in regular Python).
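For readers who want something to run end to end, here is a minimal sketch of a closely related variant. It is not the groupby/update approach shown above: it uses the same header mask but carries each header down with a forward fill (`where` plus `ffill`). It assumes every value in the column is already a string and that blanks are empty strings, as in the question's first example; the variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question; blanks are empty strings, all values are text.
df = pd.DataFrame({"REFERENCE_CODE": [
    "dog", "1", "2", "3", "4",
    "cat", "1", "2", "", "4", "5", "",
    "rat", "", "3", "4", "5", "",
    "fish", "4", "5", "6",
]})

codes = df["REFERENCE_CODE"]
is_number = codes.str.isnumeric()        # True for "1", "2", ...
is_header = (codes != "") & ~is_number   # True for "dog", "cat", ...

# Keep only the headers, then carry each one down to the rows below it.
prefix = codes.where(is_header).ffill()

# Prefix only the numeric rows; headers and blank rows stay untouched.
df.loc[is_number, "REFERENCE_CODE"] = prefix[is_number] + "_" + codes[is_number]

print(df["REFERENCE_CODE"].tolist())
```

Because only the numeric rows are assigned, the blank rows and the overall length of the column are preserved.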

answered Feb 8, 2019 at 1:26


6 Comments

sgerbhctim Over a year ago

Hey, I followed your answer; see the edits above. Is this a 3.6 vs. 3.7 thing?

Andy Hayden Over a year ago

@sgerbhctim I am also on 3.7, that's weird. What is the output of df.REFERENCE_CODE.str.isnumeric().dtype ?

Andy Hayden Over a year ago

@sgerbhctim you might want to first do df.REFERENCE_CODE = df.REFERENCE_CODE.fillna('')

sgerbhctim Over a year ago

Edited again above. Haha, super sorry about this, such a weird task.

Andy Hayden Over a year ago

@sgerbhctim even after the fillna? It should be bool.
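
A likely cause of the TypeError discussed in these comments is that `.str.isnumeric()` returns NaN for entries that are not strings (missing values and, in a column read from a spreadsheet, real integers), so the result is not boolean and `~` fails on the float NaN. The sketch below assumes a column shaped like the one in the question's edit; the sample values are hypothetical. It applies the suggested `fillna('')` and additionally converts everything to text, which is an extra step not mentioned in the comments.

```python
import numpy as np
import pandas as pd

# Hypothetical column resembling the question's edit: text, NaN, and real integers.
raw = pd.Series(["ladder_now", np.nan, 1, 2, 3], dtype=object, name="REFERENCE_CODE")

# Replace missing values with '' and convert every entry to text,
# so that the .str accessor returns a value for every row.
codes = raw.fillna("").astype(str)

numeric = codes.str.isnumeric()        # boolean for every row, no NaN
headers = (codes != "") & ~numeric     # True only for 'ladder_now'

print(numeric.dtype)
print(headers.tolist())
```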


Felipe Gonzalez · Accepted Answer · 2019-02-08 01:27:27Z

What you could do is apply a function along that series, using a mutable variable in the function to work as a "cache". I'll assume that what you have is the following list of values:

ls = ['dog', 1, 2, 3, 4, 'cat', 1, 2, '', 4, 5,
      'rat', '', 3, 4, 5, '', 'fish', 4, 5, 6]


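def append_string(x, last_string_value=['initial_string']):
    # The mutable default argument persists between calls and acts as the cache.
    if isinstance(x, str) or x is None:
        if x:
            # Non-empty string: remember it as the current prefix.
            last_string_value[0] = x
        return x
    else:
        # Number: prepend the most recent string.
        return last_string_value[0] + '_{}'.format(x)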


print(list(map(append_string, ls)))

This will give you the result you need. If what you have is a dataframe, what you can do is to apply this function along the corresponding series, and you would get the same effect.
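
As a rough sketch of the Series case, the same idea can be written with a closure instead of a mutable default argument, so that each run starts with a fresh cache. The names `make_prefixer` and `prefix` are hypothetical, the sample Series is made up for illustration, and the sketch relies on `Series.map` visiting the values in order.

```python
import pandas as pd


def make_prefixer():
    """Return a function that remembers the most recent non-empty string it saw."""
    last = {"value": None}

    def prefix(x):
        # Strings and missing values are returned unchanged.
        if isinstance(x, str) or pd.isna(x):
            if isinstance(x, str) and x:
                last["value"] = x  # non-empty string becomes the current prefix
            return x
        # Anything else (here: integers) gets the current prefix prepended.
        return f"{last['value']}_{x}"

    return prefix


# Made-up sample: header strings, integers, and one blank entry.
s = pd.Series(["dog", 1, 2, "cat", 1, "", 4], dtype=object)
result = s.map(make_prefixer())

print(result.tolist())
```

Note that a number appearing before any header would be prefixed with `None` in this sketch, so the data is assumed to start with a header string.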
