Insert a series of values into pd.dataframe randomly

Question

I have a large dataframe and want to overwrite X of its entries with a new value that I set. The new entries should start at a random position, but they have to be consecutive. For example, I have a column of random numbers and want to overwrite 20 of them in a row with the new value x.

I tried df.sample(x) and then updating the dataframe, but that only gives me individual, scattered entries. I need the X new entries to be in a row (consecutive).

Does anybody have a solution? I'm quite new to Python and have to get into it for my master's thesis.

CLARIFICATION:

My dataframe has 5 columns and almost 60,000 rows, one row for each 10 minutes of the year.

One column is 'output', with the electricity production values for those 10 minutes.
For 2 consecutive hours (120 consecutive minutes, hence 12 consecutive rows) of the year I want to lower that production to 60%. I want it to happen at a random time of the year.
Another column is 'status', which records whether the production is reduced or not.

I tried:

df_update = df.sample(12)
df_update.status = 'reduced'
df.update(df_update)
df.loc[df['status'] == 'reduced', 'output'] *= 0.6

This gives the right total amount of time (12 * 10 minutes), but the rows are scattered. I want 120 consecutive minutes, not separate ones.

Can you please provide a sample input and what your expected output would be?
– Cr1064, Commented Apr 9, 2019 at 11:43
The crucial phrase is '120 consecutive minutes' or '12 consecutive rows'.
– smci, Commented Jul 29, 2020 at 6:53

Cr1064 · Accepted Answer · 2019-04-09 12:08:00Z

I decided to pick a random row and set it and the following 11 entries (12 rows in total) to 0.6. I think this is what you want.

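import numpy as np
import pandas as pd

df = pd.DataFrame({'output': np.random.randn(20), 'status': [0] * 20})
idx = df.sample(1).index.values[0]  # index label of one randomly chosen row
# .loc slices include both endpoints, so idx:idx+11 covers 12 rows
df.loc[idx:idx + 11, 'output'] = 0.6
df.loc[idx:idx + 11, 'status'] = 1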
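
Two things to watch with the snippet above: it assigns the constant 0.6 rather than lowering the existing values to 60%, and a start drawn close to the end of the frame leaves fewer than 12 rows to overwrite. A minimal sketch of a variant that scales instead and keeps the whole window inside the frame is below. The helper name `reduce_random_window`, its `seed` parameter and the toy data are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def reduce_random_window(df, n_rows=12, factor=0.6, seed=None):
    """Scale 'output' by `factor` over `n_rows` consecutive rows at a random position."""
    rng = np.random.default_rng(seed)
    # Draw the start so that start + n_rows never runs past the last row.
    start = int(rng.integers(0, len(df) - n_rows + 1))
    out = df.copy()
    out_col = out.columns.get_loc('output')
    status_col = out.columns.get_loc('status')
    # iloc slices exclude the end, so start:start+n_rows is exactly n_rows rows.
    out.iloc[start:start + n_rows, out_col] *= factor
    out.iloc[start:start + n_rows, status_col] = 'reduced'
    return out, start


# Hypothetical toy data: 60 rows of constant production.
df = pd.DataFrame({'output': np.full(60, 100.0), 'status': ['normal'] * 60})
reduced, start = reduce_random_window(df, seed=0)
```

Working with integer positions (`iloc`) rather than labels means this does not depend on the index being 0, 1, 2, and so on.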

5 Comments

Cr1064 Over a year ago

No problem. If you want to do this for every year, I would recommend splitting each year into a separate dataframe. Otherwise you will need a new idx for each year and risk df.loc[idx:idx+11, :] running into the next year. If that's OK for you, though, you're good to go.
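
As a rough sketch of the per-year idea without physically splitting the frame: pick one start per year from that year's row positions, leaving room for all 12 rows. This assumes a sorted DatetimeIndex at 10-minute resolution; the two-year toy data and the fixed seed are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: two full years at 10-minute resolution.
index = pd.date_range('2017-01-01', '2018-12-31 23:50', freq='10min')
df = pd.DataFrame({'output': 100.0, 'status': 'normal'}, index=index)

n_rows = 12
out_col = df.columns.get_loc('output')
status_col = df.columns.get_loc('status')

for year in np.unique(df.index.year):
    # Integer positions of this year's rows (contiguous because the index is sorted).
    positions = np.flatnonzero(df.index.year == year)
    # Only allow starts that leave n_rows rows inside the same year.
    start = int(rng.choice(positions[:len(positions) - n_rows + 1]))
    df.iloc[start:start + n_rows, out_col] *= 0.6
    df.iloc[start:start + n_rows, status_col] = 'reduced'
```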

Elias Over a year ago

Just one more thing: if I want to set those 2 hours at a specific time of the year, how would I get the idx? I also have a column named 'time'. So can I just set idx = df['time']='xx'.index.values[0] and do the same?

Cr1064 Over a year ago

Do you mean you want to pick a random index within a subset of the year?

Elias Over a year ago

No, this time nothing is random. I set one row of the status to 'reduced' by using df.loc[(df['time']=='01.01.2017 00:10'), ['status']] = 'reduced' and want to set the next 12 rows as 'reduced' too.

Cr1064 Over a year ago

I would do a similar thing to the above, except find the idx for wherever you want to start. In this case it would be idx = df[df['time']=='01.01.2017 00:10'].index.values[0], and then you can just use df.loc[idx:idx+11,'status']="reduced".
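
Label slicing with idx:idx+11 relies on the index labels being consecutive integers. A small sketch of the same fixed-start idea using integer positions instead, which should also work after filtering or reindexing; the toy 'time' strings are built here only to mirror the format in the comment above.

```python
import pandas as pd

# Hypothetical data: 30 rows with 'time' stored as strings like '01.01.2017 00:10'.
times = pd.date_range('2017-01-01', periods=30, freq='10min').strftime('%d.%m.%Y %H:%M').tolist()
df = pd.DataFrame({'time': times, 'output': 100.0, 'status': 'normal'})

# Integer position of the first row whose 'time' matches the wanted start.
matches = (df['time'] == '01.01.2017 00:10').to_numpy().nonzero()[0]
start = int(matches[0])

status_col = df.columns.get_loc('status')
# 12 rows beginning at the matched row (iloc slices exclude the end).
df.iloc[start:start + 12, status_col] = 'reduced'
```

If the matched row should stay untouched and only the 12 rows after it change, shifting the slice to start + 1:start + 13 would do that.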
