I have a DataFrame like this:
In[2]: import pandas as pd
...: flow = {
...: 'Date':['09/19','09/19','09/19','09/19','09/19','09/19','10/19','10/19','10/19','10/19','10/19','10/19','10/19'],
...: 'Time':['23:00','23:10','23:20','23:30','23:40','23:50','00:00','00:10','00:20','00:30','00:40','00:50','01:00'],
...: 'Name':['P10 ','P10 ','P10 ','P10 ','P5 ','P5 ','P5 ','P10 ','P10 ','P10 ','P6 ','P6 ','P6 '],
...: 'Data':['10000','10002','10004','10005','10007','10008','10010','10012','10013','10014','10020','10022','10023']
...: }
...: flowdata = pd.DataFrame(flow)
...: flowdata = flowdata[['Date', 'Time', 'Name', 'Data']] # To preserve the columns order
...:
In[3]: flowdata
Out[3]:
Date Time Name Data
0 09/19 23:00 P10 10000
1 09/19 23:10 P10 10002
2 09/19 23:20 P10 10004
3 09/19 23:30 P10 10005
4 09/19 23:40 P5 10007
5 09/19 23:50 P5 10008
6 10/19 00:00 P5 10010
7 10/19 00:10 P10 10012
8 10/19 00:20 P10 10013
9 10/19 00:30 P10 10014
10 10/19 00:40 P6 10020
11 10/19 00:50 P6 10022
12 10/19 01:00 P6 10023
I want to split it into separate DataFrames, one for each run of consecutive ("continuous") rows that share the same value in the 'Name' column.
I tried the following code and got this:
In[3]: flowdata[flowdata['Name'] == 'P5 ']
Out[3]:
Date Time Name Data
4 09/19 23:40 P5 10007
5 09/19 23:50 P5 10008
6 10/19 00:00 P5 10010
The problem appears when I filter on the name 'P10 ' (in this case): the result jumps in Date and Time, from index 3 to index 7.
In[4]: flowdata[flowdata['Name'] == 'P10 ']
Out[4]:
Date Time Name Data
0 09/19 23:00 P10 10000
1 09/19 23:10 P10 10002
2 09/19 23:20 P10 10004
3 09/19 23:30 P10 10005
7 10/19 00:10 P10 10012
8 10/19 00:20 P10 10013
9 10/19 00:30 P10 10014
I want to get two DataFrames, one for each run of consecutive rows with the same value in the 'Name' column. Something like this:
DataFrame 1, for the first run of "P10":
Date Time Name Data
0 09/19 23:00 P10 10000
1 09/19 23:10 P10 10002
2 09/19 23:20 P10 10004
3 09/19 23:30 P10 10005
DataFrame 2, for the second run of "P10":
Date Time Name Data
7 10/19 00:10 P10 10012
8 10/19 00:20 P10 10013
9 10/19 00:30 P10 10014
I looked for a built-in function or method to do this and didn't find one. So I decided to iterate over the rows, check where the name changes, and build a list of start and end indexes to slice the main DataFrame with. I ended up with this code:
In[6]: name_list_with_start_end_indexes = []
...: current_name = flowdata.iloc[0]['Name']
...: current_start_index = flowdata.index[0]
...: for i in flowdata.index:
...:     next_name = flowdata.loc[i]['Name']
...:     if not (current_name == next_name):
...:         current_end_index = i - 1
...:         name_list_with_start_end_indexes.append([current_name, current_start_index, current_end_index])
...:         current_start_index = i
...:         current_name = next_name
...: name_list_with_start_end_indexes.append([current_name,current_start_index, i])
...:
In[7]: name_list_with_start_end_indexes
Out[7]:
[['P10 ', 0, 3],
['P5 ', 4, 6],
['P10 ', 7, 9],
['P6 ', 10, 12]]
In[8]: name_A = name_list_with_start_end_indexes[2]
In[9]: name_A
Out[9]:
['P10 ', 7, 9]
In[10]: flowdata[name_A[1]:name_A[2]+1]
Out[10]:
Date Time Name Data
7 10/19 00:10 P10 10012
8 10/19 00:20 P10 10013
9 10/19 00:30 P10 10014
The problem is that this code runs slowly with 13000 rows (the file with this data normally has about that many rows and 11 columns).
Does anyone know a faster way to get the same results?
Thanks in advance.
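For reference, one possible vectorized direction is sketched below. It is a minimal sketch, not something timed on the real 13000-row file. The idea is to compare each 'Name' with the previous row's value, take a cumulative sum of the "changed" flags to label each run, and group on that label. The names run_id and runs are hypothetical, and the int() conversion assumes the default integer index shown above.

```python
import pandas as pd

flow = {
    'Date': ['09/19'] * 6 + ['10/19'] * 7,
    'Time': ['23:00', '23:10', '23:20', '23:30', '23:40', '23:50', '00:00',
             '00:10', '00:20', '00:30', '00:40', '00:50', '01:00'],
    'Name': ['P10 ', 'P10 ', 'P10 ', 'P10 ', 'P5 ', 'P5 ', 'P5 ',
             'P10 ', 'P10 ', 'P10 ', 'P6 ', 'P6 ', 'P6 '],
    'Data': ['10000', '10002', '10004', '10005', '10007', '10008', '10010',
             '10012', '10013', '10014', '10020', '10022', '10023'],
}
flowdata = pd.DataFrame(flow)[['Date', 'Time', 'Name', 'Data']]

# Flag every row whose 'Name' differs from the previous row's 'Name'.
# The running total of those flags gives each consecutive run its own label.
run_id = (flowdata['Name'] != flowdata['Name'].shift()).cumsum()

# One sub-DataFrame per run; groupby sorts by run_id, so row order is kept.
runs = [chunk for _, chunk in flowdata.groupby(run_id)]

# Same [name, start_index, end_index] summary that the loop builds.
# int() assumes an integer index (hypothetical convenience for comparison).
name_list_with_start_end_indexes = [
    [chunk['Name'].iloc[0], int(chunk.index[0]), int(chunk.index[-1])]
    for chunk in runs
]

print(name_list_with_start_end_indexes)
print(runs[2])  # the second run of 'P10 '
```

Here runs[2] should correspond to the slice flowdata[7:10] from the loop version, since the original row index is preserved in each chunk.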