I have a pandas DataFrame with hundreds of rows that looks like this:

ID     value
IDx12  6
IDx15  12

I want to duplicate each row twice (so each ID ends up with three rows), increment the value column by 1 for each duplication, and add a column called ratio. Here are the ratio values I want:

  • original row = 0
  • first duplication = 0.25
  • second duplication = 0.5

So the output should look like this:

ID     value  ratio
IDx12  6      0
IDx12  7      0.25
IDx12  8      0.5
IDx15  12     0
IDx15  13     0.25
IDx15  14     0.5

I found a clumsy way to do it: duplicate the df, increment value manually, add a column with the ratio, and then concatenate all the dfs. But it's very inefficient. Is there a smarter way to do it? Thanks for your help.
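
For comparison, the concatenation approach described above might look roughly like this. It is only a sketch of one way to write it; the steps list and the variable names are assumptions, not the asker's actual code:

```python
import pandas as pd

df = pd.DataFrame({"ID": ["IDx12", "IDx15"], "value": [6, 12]})

# One modified copy of the whole frame per step: (increment, ratio).
steps = [(0, 0.0), (1, 0.25), (2, 0.5)]
copies = [df.assign(value=df["value"] + inc, ratio=ratio) for inc, ratio in steps]

# Stack the copies, then sort on the original row labels with a stable
# sort (mergesort) so each row's copies sit together in step order.
out = pd.concat(copies).sort_index(kind="mergesort").reset_index(drop=True)
print(out)
```

This builds one full copy of the frame per step, which is what makes it feel heavy for more than a few steps.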

1 Answer

Below is a vectorised approach to the problem.

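Create a dataframe with each row repeated three times. Because df.values mixes strings and integers, np.repeat returns an object array, so the original dtypes are restored with astype

import numpy as np
import pandas as pd

rdf = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns).astype(df.dtypes.to_dict())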

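Create a column holding each row's repeat number (0 for the original, then 1 and 2). This assumes every ID is unique in the original dataframe

rdf['repeat'] = 1
rdf['repeat'] = rdf.groupby('ID')['repeat'].cumsum() - 1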
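
One caveat: counting within each ID only yields 0, 1, 2 if every ID occurs once in the original frame. Below is a minimal sketch of that caveat, using made-up data with a repeated ID and cumcount (which gives the same counter as the cumulative sum minus one). The position-based counter is an alternative suggested here, not part of the steps above:

```python
import numpy as np
import pandas as pd

# Hypothetical input in which the same ID appears on two original rows.
df = pd.DataFrame({"ID": ["IDx12", "IDx12"], "value": [6, 20]})
n = 3  # rows per original row, including the original itself

rdf = df.loc[df.index.repeat(n)].reset_index(drop=True)

# Counter per ID: keeps counting across both original rows.
by_id = rdf.groupby("ID").cumcount()

# Counter per position: restarts every n rows, whatever the IDs are.
by_position = pd.Series(np.arange(len(rdf)) % n)

print(by_id.tolist())
print(by_position.tolist())
```

If duplicate IDs are possible in your data, the position-based counter is probably the safer choice.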

Add the repeat number to value

rdf['value'] += rdf['repeat']

Create the ratio column (bracket access avoids any clash between the column name and a pandas attribute)

rdf['ratio'] = rdf['repeat'] * 0.25

Voilà! The output is shown below. The helper column can be removed with rdf.drop(columns='repeat') if it is not needed.

      ID value  repeat  ratio
0  IDx12     6       0   0.00
1  IDx12     7       1   0.25
2  IDx12     8       2   0.50
3  IDx15    12       0   0.00
4  IDx15    13       1   0.25
5  IDx15    14       2   0.50
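
The same result can also be reached with a cross join against a small lookup table, which avoids the helper counter and the per-ID grouping. This is a hedged sketch rather than the method above: expand_with_ratio, offsets and step are hypothetical names, and how="cross" assumes pandas 1.2 or later.

```python
import pandas as pd


def expand_with_ratio(df, offsets=(0, 1, 2), step=0.25):
    """Return one row per (original row, offset) pair.

    The function name and its parameters are hypothetical, chosen
    for this sketch only.
    """
    # Small lookup table: one row per copy, with its increment and ratio.
    lookup = pd.DataFrame({"offset": list(offsets)})
    lookup["ratio"] = lookup["offset"] * step

    # A cross join pairs every original row with every lookup row and
    # keeps the original row order (requires pandas >= 1.2).
    out = df.merge(lookup, how="cross")

    # Apply the increment, then drop the helper column.
    out["value"] = out["value"] + out["offset"]
    return out.drop(columns="offset")


df = pd.DataFrame({"ID": ["IDx12", "IDx15"], "value": [6, 12]})
result = expand_with_ratio(df)
print(result)
```

Because the lookup table carries both the increment and the ratio, changing the number of copies or the ratio step only means passing different arguments.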