1

I have two sets of CSV data. One contains two columns (time and a boolean flag); the other contains values that I'd like to display visually with some graphing functions I have. The data is sampled at different frequencies, so the number of rows may not match between the datasets. How do I plot individual graphs for each range of data where the boolean is true?

Here is what the contact data looks like:

INDEX | TIME | CONTACT
0 | 240:18:59:31.750 | 0
1 | 240:18:59:32.000 | 0
2 | 240:18:59:32.250 | 0
........
1421 | 240:19:05:27.000 | 1
1422 | 240:19:05:27.250 | 1 

The other (vehicle) data isn't really important, but it contains values like Weight, Speed (MPH), Pedal Position, etc.

I have many separate large Excel files, and because the shapes do not match I am unsure how to slice the data using the time flags. I wrote the function below to create the ranges, but I suspect this can be done more easily.

Here is the working code (with output below). In short, is there an easier way to do this?

import pandas as pd

def determineContactSlices(data):
    contactStart = None
    contactEnd = None
    slices = pd.DataFrame([])
    for index, row in data.iterrows():
        if row['CONTACT'] == 1:
            # begin slice
            if contactStart is None:
                contactStart = index
                continue
            else:
                # still valid, move onto next
                continue
        elif row['CONTACT'] == 0:
            if contactStart is not None:
                contactEnd = index - 1
                # create slice and add the df to list
                # (.loc is end-inclusive, so the last contact row is kept)
                slice = data.loc[contactStart:contactEnd]
                print(slice)
                # DataFrame.append was removed in pandas 2.0; concat is equivalent
                slices = pd.concat([slices, slice])
                # then reset everything
                slice = None
                contactStart = None
                contactEnd = None
                continue
            else:
                # move onto next row
                continue
    return slices

Output: ([15542 rows x 2 columns])

Index Time  CONTACT
1421   240:19:05:27.000        1
1422   240:19:05:27.250        1
1423   240:19:05:27.500        1
1424   240:19:05:27.750        1
1425   240:19:05:28.000        1
1426   240:19:05:28.250        1

            ...      ...
56815  240:22:56:15.500        1
56816  240:22:56:15.750        1
56817  240:22:56:16.000        1
56818  240:22:56:16.250        1
56819  240:22:56:16.500        1

With this output I intend to loop through each time slice and display the Vehicle Data in subplots.

Any help or guidance would be much appreciated (:

UPDATE:

I believe I can just do filteredData = vehicleData[contactData['CONTACT'] == 1], but then I am faced with how to graph each connection individually when there is a disconnect. For example, if there are 7 connections at various times and lengths, I would like to have 7 individual plots.
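One caveat with the mask idea: because the two tables have different row counts, a mask built from contactData is aligned to vehicleData by row label rather than by time, so it may fail or pick the wrong rows. A minimal sketch of selecting vehicle rows by time window instead is shown here. It assumes the vehicle data also has a TIME column in the same format, and that the format is day:hour:minute:second; the names parse_time, vehicle_windows, and the SPEED column are hypothetical.

```python
import pandas as pd

def parse_time(text):
    # Assumption: "240:19:05:27.250" means day 240, 19 h, 5 min, 27.25 s.
    days, hours, minutes, seconds = text.split(":")
    return pd.Timedelta(days=int(days), hours=int(hours),
                        minutes=int(minutes), seconds=float(seconds))

def vehicle_windows(contact, vehicle):
    """Return one vehicle DataFrame per consecutive run of CONTACT == 1."""
    contact_t = contact["TIME"].map(parse_time)
    vehicle_t = vehicle["TIME"].map(parse_time)
    is_contact = contact["CONTACT"] == 1
    # A new run id starts whenever the flag differs from the previous row.
    run_id = (is_contact != is_contact.shift(fill_value=False)).cumsum()
    windows = []
    for _, times in contact_t[is_contact].groupby(run_id[is_contact]):
        # Keep vehicle rows whose time lies inside this contact run.
        in_window = (vehicle_t >= times.iloc[0]) & (vehicle_t <= times.iloc[-1])
        windows.append(vehicle[in_window])
    return windows

# Contact sampled at 4 Hz, vehicle at 2 Hz (made-up demo values).
contact = pd.DataFrame({
    "TIME": ["240:19:05:26.750", "240:19:05:27.000", "240:19:05:27.250",
             "240:19:05:27.500", "240:19:05:27.750", "240:19:05:28.000",
             "240:19:05:28.250", "240:19:05:28.500"],
    "CONTACT": [0, 1, 1, 1, 0, 0, 1, 1],
})
vehicle = pd.DataFrame({
    "TIME": ["240:19:05:26.500", "240:19:05:27.000", "240:19:05:27.500",
             "240:19:05:28.000", "240:19:05:28.500"],
    "SPEED": [10, 11, 12, 13, 14],
})
windows = vehicle_windows(contact, vehicle)
```

Each element of windows could then be handed to a plotting function in a loop, giving one graph per connection.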

4
  • If I understand you correctly, do you want new dfs, 1 with all contact=1 and one with all contact=0? Commented Dec 19, 2018 at 0:10
  • Sorry for not being clear. I am interested in the separate index ranges for when contact is 1. I do not want one large dataframe where contact == 1. For example, if the first contact was at 12 seconds and disconnected at 15 seconds, then another ran from 30 seconds to 45 seconds, and then a third from 50 seconds to 55 seconds, the output would be something like slices = [data[12:15], data[30:45], data[50:55]], so I can iterate through the time slices and output 3 graphs of the other data within these timeframes. Commented Dec 19, 2018 at 14:57
  • But my question is: do these correlate to contact == 1 or not? I do not know what a disconnect looks like in your data. If a disconnect occurs where contact changes from 1 to 0 and lasts until contact == 1 again, this is a fairly easy problem that can be solved in one line. Commented Dec 19, 2018 at 15:03
  • The other (vehicle) data does not directly correlate with the contact. Your description about the contact data is absolutely correct. Commented Dec 19, 2018 at 15:09

2 Answers

1

I think what you are trying to do is relatively simple, although I am not sure if I understand the output that you want or what you want to do with it after you have it. For example:

contact_df = data[data['CONTACT'] == 1]
non_contact_df = data[data['CONTACT'] == 0]

If this isn't helpful, please provide some additional details as to what the output should look like and what you plan to do with it after it is created.
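If separate slices are wanted instead of one large dataframe, the boolean mask can be combined with a run label. The following is only a sketch, assuming a CONTACT column of 0/1 values as in the question; the function name split_contact_runs and the demo data are hypothetical.

```python
import pandas as pd

def split_contact_runs(data, flag_col="CONTACT"):
    """Return a list of DataFrames, one per consecutive run of flag_col == 1."""
    is_contact = data[flag_col] == 1
    # The run id increases by one every time the flag changes value,
    # so all rows of one uninterrupted contact share the same id.
    run_id = (is_contact != is_contact.shift(fill_value=False)).cumsum()
    # Keep only the contact rows and group them by their run id.
    return [run for _, run in data[is_contact].groupby(run_id[is_contact])]

demo = pd.DataFrame({"CONTACT": [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]})
runs = split_contact_runs(demo)
for run in runs:
    print(list(run.index))
# [1, 2]
# [5, 6, 7]
# [9]
```

Unlike a loop that only closes a slice when it sees a 0, this also keeps a run that is still open at the end of the data.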

1 Comment

This is similar to what I want to do. But instead of one large dataframe, I'd like individual time slices. My other comment may have a better explanation of the expected output.
0

Old question but why not:

sliceStart_index = df[df["date"] == "2012-12-28"].index.tolist()[0]
sliceEnd_index = df[df["date"] == "2013-01-10"].index.tolist()[0]

this_is_your_slice = df.iloc[sliceStart_index:sliceEnd_index]

The first two lines actually get you a list of the indexes where the condition is met; I just chose the first one in each case as an example. Note that passing index labels to iloc assumes the default integer index, where labels and positions coincide.
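The same index-lookup idea could be extended to find the start and end of every contact run at once, not just one pair. This is a sketch under the assumption that the flag column holds only 0 and 1; contact_bounds is a hypothetical helper and the demo frame is made up.

```python
import numpy as np
import pandas as pd

def contact_bounds(flags):
    """Return (start, stop) positions of each run of 1s; stop is exclusive."""
    # Pad with a 0 on both sides so runs touching either end are detected.
    padded = np.concatenate(([0], np.asarray(flags, dtype=int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # 0 -> 1 transitions
    stops = np.flatnonzero(edges == -1)   # 1 -> 0 transitions
    return list(zip(starts.tolist(), stops.tolist()))

demo = pd.DataFrame({"CONTACT": [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]})
bounds = contact_bounds(demo["CONTACT"])
slices = [demo.iloc[start:stop] for start, stop in bounds]
print(bounds)  # [(1, 3), (5, 8), (9, 10)]
```

Because the bounds are positions, they work with iloc regardless of what the index labels are.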
