
I have a large dataframe, and I want to overwrite X entries of it with a new value that I set. The new entries have to be at a random position, but they have to be consecutive. For example, I have a column with random numbers and want to overwrite 20 of them in a row with the new value x.

I tried df.sample(x) and then updating the dataframe, but that only gives me individual, scattered entries. I need the X new entries in a row (consecutively).

Does anybody have a solution? I'm quite new to Python and have to learn it for my master's thesis.

CLARIFICATION:

My dataframe has 5 columns and almost 60,000 rows, one row for each 10 minutes of the year.

  • One column is 'output', with the electricity production for that 10-minute interval.
  • For 2 consecutive hours of the year (120 consecutive minutes, hence 12 consecutive rows), I want to lower that production to 60%. This should happen at a random time of the year.
  • Another column is 'status', which records whether the production is reduced or not.

I tried:

df_update = df.sample(12)
df_update.status = 'reduced'
df.update(df_update)
df.loc[df['status'] == 'reduced', 'output'] *= 0.6  # scale the reduced rows to 60%

This gives the right total duration (12 × 10 minutes), but the rows are scattered across the year; I need 120 consecutive minutes.
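A minimal sketch of the idea behind the question: instead of sampling 12 separate rows, draw one random start position and take the 12 rows that follow it positionally. It assumes the frame is already sorted by time and uses the column names 'output' and 'status' from the clarification; `reduce_random_block` is a hypothetical helper name, and `numpy.random.default_rng` is used only so the draw can be made reproducible.

```python
import numpy as np
import pandas as pd


def reduce_random_block(df, n_rows=12, factor=0.6, seed=None):
    """Return a copy of df with n_rows consecutive rows reduced at a random position.

    Hypothetical helper: assumes columns named 'output' and 'status' and
    that the rows are already sorted in time order.
    """
    rng = np.random.default_rng(seed)
    # Start is drawn from 0 .. len(df) - n_rows, so the whole block always fits
    start = int(rng.integers(0, len(df) - n_rows + 1))
    rows = slice(start, start + n_rows)  # positional and end-exclusive
    out = df.copy()
    out.iloc[rows, out.columns.get_loc('output')] *= factor
    out.iloc[rows, out.columns.get_loc('status')] = 'reduced'
    return out


# Synthetic example frame: one row per 10 minutes
times = pd.date_range('2017-01-01', periods=1000, freq='10min')
df = pd.DataFrame({
    'time': times,
    'output': np.random.default_rng(0).random(1000),
    'status': 'normal',
})
result = reduce_random_block(df, seed=42)
```

Restricting the start to the range 0 to len(df) - 12 is what guarantees the block never runs off the end of the frame; sampling a single row from the whole frame does not give that guarantee.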

  • Can you please provide a sample input and what your expected output would be? Commented Apr 9, 2019 at 11:43
  • The crucial phrase is '120 consecutive minutes' or '12 consecutive rows'. Commented Jul 29, 2020 at 6:53

1 Answer


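I picked a random starting row and then changed that row plus the next 11 (12 rows in total), scaling 'output' to 60% as described in the question. I think this is what you want.

import numpy as np
import pandas as pd
df = pd.DataFrame({'output': np.random.randn(20), 'status': [0] * 20})
idx = df.iloc[:-11].sample(1).index.values[0]  # only starts that leave room for 11 more rows
# .loc slicing is label-based and inclusive, so idx:idx+11 is 12 rows (assumes the default 0..n-1 index)
df.loc[idx:idx + 11, "output"] *= 0.6
df.loc[idx:idx + 11, "status"] = 1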
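If the frame is indexed by timestamps instead of the default 0..n-1 integers, `idx + 11` no longer works. A sketch under that assumption (a gap-free 10-minute DatetimeIndex; the data is synthetic) slices by time instead, relying on the fact that `.loc` slicing on a DatetimeIndex includes both endpoints, so start to start + 110 minutes covers 12 rows.

```python
import numpy as np
import pandas as pd

# Assumed setup: one row per 10 minutes of 2017, indexed by timestamp
times = pd.date_range('2017-01-01', periods=52_560, freq='10min')
rng = np.random.default_rng(1)
df = pd.DataFrame({'output': rng.random(len(times)), 'status': 'normal'}, index=times)
original = df['output'].copy()

# Pick a start position early enough that the 12-row window fits in the frame
start = df.index[rng.integers(0, len(df) - 11)]
# Label slicing on a DatetimeIndex includes both ends: start .. start + 110 min
window = slice(start, start + pd.Timedelta(minutes=110))
df.loc[window, 'output'] *= 0.6
df.loc[window, 'status'] = 'reduced'
```

If the time series has gaps, the 110-minute window would contain fewer than 12 rows, so a positional approach is safer in that case.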

5 Comments

No problem. If you want to do this for every year, I would recommend splitting each year into a separate dataframe. Otherwise you will need a new idx for each year and risk df.loc[idx:idx+11, :] running into the next year. If that's acceptable, though, you're good to go.
Just one more thing: if I want to set those 2 hours to a specific time of the year, how would I get the idx? I also have a column named 'time'. Can I just write idx = df['time']='xx'.index.values[0] and do the same?
Do you mean you want to pick a random index within a subset of the year?
No, this time nothing is random. I set the status of one row to 'reduced' using df.loc[(df['time']=='01.01.2017 00:10'), ['status']] = 'reduced' and want to set the next 12 rows to 'reduced' too.
I would do something similar to the above, except find the idx of the row you want to start from. In this case it would be idx = df[df['time'] == '01.01.2017 00:10'].index.values[0]; then you can just use df.loc[idx:idx+11, 'status'] = 'reduced'.
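For the per-year case raised in the comments, one alternative to splitting the frame by hand is to group by calendar year and draw a separate start inside each group, so no block can spill into the following year. This is a sketch assuming a datetime 'time' column sorted ascending and at least 12 rows per year; `reduce_once_per_year` is a hypothetical name and the data is synthetic.

```python
import numpy as np
import pandas as pd


def reduce_once_per_year(df, n_rows=12, factor=0.6, seed=None):
    """Reduce one random block of n_rows consecutive rows inside each calendar year.

    Hypothetical helper: assumes a datetime 'time' column (sorted ascending)
    plus 'output' and 'status' columns, and at least n_rows rows per year.
    """
    rng = np.random.default_rng(seed)
    out = df.reset_index(drop=True).copy()  # work on a copy whose labels equal row positions
    # .indices maps each year to the row positions belonging to it
    for _, positions in out.groupby(out['time'].dt.year).indices.items():
        start = rng.integers(0, len(positions) - n_rows + 1)
        block = positions[start:start + n_rows]  # stays inside this year
        out.loc[block, 'output'] *= factor
        out.loc[block, 'status'] = 'reduced'
    return out


times = pd.date_range('2017-01-01', '2018-12-31 23:50', freq='10min')
df = pd.DataFrame({'time': times, 'output': 1.0, 'status': 'normal'})
result = reduce_once_per_year(df, seed=0)
```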
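For the follow-up about a fixed time in the 'time' column, here is a sketch that parses the strings before comparing (assuming the day.month.year format shown in the comment) and then takes the matching row plus the next 11 positionally, so it does not depend on the index labels being 0..n-1. The data below is synthetic.

```python
import numpy as np
import pandas as pd

# Assumed data: 'time' holds strings like '01.01.2017 00:10' (day.month.year)
times = pd.date_range('2017-01-01 00:00', periods=200, freq='10min')
df = pd.DataFrame({
    'time': times.strftime('%d.%m.%Y %H:%M'),
    'output': 1.0,
    'status': 'normal',
})

# Parse the strings once, then compare against a real timestamp
parsed = pd.to_datetime(df['time'], format='%d.%m.%Y %H:%M')
# Position of the first matching row; raises IndexError if the time is absent
pos = int(np.flatnonzero((parsed == pd.Timestamp('2017-01-01 00:10')).to_numpy())[0])

# Positional slice: the matching row plus the 11 rows after it (12 in total)
block = df.index[pos:pos + 12]
df.loc[block, 'output'] *= 0.6
df.loc[block, 'status'] = 'reduced'
```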
