0

updated Sample CSV Data:

c1,c2,v1,v2,p1,p2,r1,a1,f1,f2,f3,Time_Stamp 

0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:00
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:01
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:02
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:03
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:04
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:05
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:06
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:07
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:08
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:09
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:10
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:11
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:12
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:13
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:14
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:15
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:16 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:17
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:18
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:19
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:20
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:21
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:22
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:23
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:24
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:25
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:26
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:27
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:28
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:29
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:30
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:31
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:32
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:33
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:34
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:35
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:36

Edited- Reading of CSV, Python code:

import numpy as np
from datetime import date,time,datetime
import pandas as pd

def readcsv(x): #def function to read csv files based on input below
    Data = pd.read_csv(x, parse_dates=['Time_Stamp'], infer_datetime_format=True)
    Data['Date'] = Data.Time_Stamp.dt.date #creating Date Column in the Data Frame ( does not affect the main .csv file)
    Data['Time'] = Data.Time_Stamp.dt.time #creating Time Column in the Data Frame ( also does not affect the main .csv file)
    #print (Data) #<-- prints the output
    #print (Data['Time_Stamp'][6000:7000]) <- print from row 6000 to 7000 of the data frame (has over 150'000 rows)
    Data['Time_Stamp'] = pd.to_datetime(Data['Time_Stamp']) # Time_Stamp Data Frame
    print(Data[1:6])
    return Data

Data = readcsv('data.csv')


#Data = csv file data 

def getMask(start,end,Data):
    mask = (Data['Time_Stamp'] > start) & (Data['Time_Stamp'] <= end)
    return mask;

start = '2017-06-13 16:00:00'
end = '2017-06-13 16:40:00'
timerange = Data.loc[getMask(start, end, Data)]

pspike = timerange.loc[timerange['c1'] <= 5.0]

pspike output:

the row : for e.g. -> After printing pspike , the printed row has time value of 16:38:15 and the next printed row has time value of 16:38:17, that means it skipped a row where the time value is 16:38:16

[ e.g. below ]

13/06/2017 16:38:12
13/06/2017 16:38:13
13/06/2017 16:38:14
13/06/2017 16:38:15
13/06/2017 16:38:17
13/06/2017 16:38:18

After running the code below, it prints out the row( only the Time_Stamp value ) that was skipped, which has time value of 16:38:16, 16:38:22 and 16:38:32 where the rows with that value were skipped based on the output of pspike

for i in range(timerange.shape[0] - 1):
row1 = timerange.iloc[i]
row2 = timerange.iloc[i+1]
if (row2[-1] - row1[-1]).seconds > 1:
    print (row1[-1] + pd.Timedelta('1s'))

Output:

2017-06-13 16:38:16
2017-06-13 16:38:22
2017-06-13 16:38:32

What I'm trying to get is to print the whole row where the Time_Stamp value is 2017-06-13 16:38:16, where its the only row with c1 value that is more than 5.0, in this case (based on the sample code), its 415.7

Instead of this:

13/06/2017 16:38:16

I want to print it like this:

415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:16
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:22
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:32

After printing that row out, I have to replace the value of c1 - 415.7 directly above, with 0.0 . How do I do this?

EDIT

What to replace:

The c1 of the missing row in pspike output

3
  • Can you simplify your problem statement? What are all the values you want to replace with 0? Basically, what is the condition for replacement. Commented Aug 22, 2017 at 2:39
  • What is the column "AC_Input_Current"? I cannot see that in the data Commented Aug 22, 2017 at 3:14
  • My bad, I've edited and changed it to what it suppose to be , which is c1 Commented Aug 22, 2017 at 3:27

1 Answer 1

1

I'm a bit confused here because you could simply do:

pspike = timerange[timerange['c1'].gt(5.0)] #gr=greater than, lt=lower than

Which returns a dataframe with:

16  415.7   12.5    30.2    154.6   4675.2  1   -1  5199.4  0   50  0   2017-06-13 16:38:16

Set values of column "c1" to 0.0

pspike["c1"] = 0.0

Create a string from first row (index=0):

','.join(pspike.astype(str).values.tolist()[0])

Prints:

'0.0,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,2017-06-13 16:38:16'

Update

string = """c1,c2,v1,v2,p1,p2,r1,a1,f1,f2,f3,Time_Stamp
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:00
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:01
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:02
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:03
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:04
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:05
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:06
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:07
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:08
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:09
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:10
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:11
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:12
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:13
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:14
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:15
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:16 
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:17
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:18
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:19
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:20
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:21
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:22
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:23
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:24
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:25
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:26
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:27
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:28
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:29
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:30
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:31
415.7,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,13/06/2017 16:38:32
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:33
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:34
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:35
0,2.3,0.6,-0.9,-0.5,1,-1,941.0,0,50,0,13/06/2017 16:38:36"""

df = pd.read_csv(io.StringIO(string)) # reads data from string above
#Use : df = pd.read_csv('filename.csv')  for csv file (handling tons of data)
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"]) # convert to Datetime
df_filter = df[df["c1"].le(0.5)] # new df with less or equal to 0.5
where = (df_filter[df_filter["Time_Stamp"].diff().dt.total_seconds() > 1] ["Time_Stamp"] - pd.Timedelta("1s")).astype(str).tolist() # Find where diff > 1 second
df_filter2 = df[df["Time_Stamp"].isin(where)] # Create new df with those
df_filter2["c1"] = 0.0 # Set c1 to 0.0

for index, row in df_filter2.iterrows():
    values = row.astype(str).tolist()
    print(','.join(values))

Return

0.0,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,2017-06-13 16:38:16
0.0,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,2017-06-13 16:38:22
0.0,12.5,30.2,154.6,4675.2,1,-1,5199.4,0,50,0,2017-06-13 16:38:32
Sign up to request clarification or add additional context in comments.

12 Comments

Based on the sample data I provided, getting only the row with value of c1 greater than 5.0 would be easier if to go with your method. But the actual data can have repeated rows with c1 values more than 5.0 I want to get the rows where c1 is lesser than 5.0 And if the printed row has skipped 1 second in the time value of the Time_Stamp, then that row that was skipped is the specific row that I want to edit the value of c1` to 0.0
@SanctaIgnis But wouldn't it make sense to handle those instead?
I tried the updated codes but it only printed the column names and not the values, any idea why?
@SanctaIgnis Actually no idea why - tried it myself and it did work for me(but those are just the initial steps). I don't feel like the code is super safe though. I'd say you would have to provide a larger data sample and explain the expected output when there are multiple values.
I edited and added little more data and the expected output
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.