1

Each row in this database represents one minute, but some minutes are missing when I pull the data from the API (below, 09:51:00 and 09:54:00 are missing).

  ticker        date    time         vol    vwap    open    high    low    close    lbh lah trades
0   AACG    2022-01-06  09:30:00    33042   1.8807  1.8900  1.9200  1.8700  1.9017  0.0 0.0 68
1   AACG    2022-01-06  09:31:00    5306    1.9073  1.9100  1.9200  1.8801  1.9100  0.0 0.0 27
2   AACG    2022-01-06  09:32:00    3496    1.8964  1.9100  1.9193  1.8800  1.8900  0.0 0.0 17
3   AACG    2022-01-06  09:33:00    5897    1.9377  1.8900  1.9500  1.8900  1.9500  0.0 0.0 15
4   AACG    2022-01-06  09:34:00    1983    1.9362  1.9200  1.9499  1.9200  1.9200  0.0 0.0 9
5   AACG    2022-01-06  09:35:00    10725   1.9439  1.9400  1.9600  1.9201  1.9306  0.0 0.0 87
6   AACG    2022-01-06  09:36:00    5942    1.9380  1.9307  1.9400  1.9300  1.9400  0.0 0.0 48
7   AACG    2022-01-06  09:37:00    5759    1.9428  1.9659  1.9659  1.9400  1.9500  0.0 0.0 11
8   AACG    2022-01-06  09:38:00    4855    1.9424  1.9500  1.9500  1.9401  1.9495  0.0 0.0 10
9   AACG    2022-01-06  09:39:00    6275    1.9514  1.9500  1.9700  1.9450  1.9700  0.0 0.0 14
10  AACG    2022-01-06  09:40:00    13695   2.0150  1.9799  2.0500  1.9749  2.0200  0.0 0.0 59
11  AACG    2022-01-06  09:41:00    3252    2.0209  2.0275  2.0300  2.0200  2.0200  0.0 0.0 14
12  AACG    2022-01-06  09:42:00    12082   2.0117  2.0300  2.0400  1.9800  1.9900  0.0 0.0 41
13  AACG    2022-01-06  09:43:00    5148    1.9802  1.9800  1.9999  1.9750  1.9999  0.0 0.0 11
14  AACG    2022-01-06  09:44:00    2764    1.9927  1.9901  1.9943  1.9901  1.9943  0.0 0.0 5
15  AACG    2022-01-06  09:45:00    2379    1.9576  1.9601  1.9601  1.9201  1.9201  0.0 0.0 10
16  AACG    2022-01-06  09:46:00    8762    1.9852  1.9550  1.9900  1.9550  1.9900  0.0 0.0 35
17  AACG    2022-01-06  09:47:00    1343    1.9704  1.9700  1.9738  1.9700  1.9701  0.0 0.0 5
18  AACG    2022-01-06  09:48:00    17080   1.9696  1.9700  1.9800  1.9600  1.9600  0.0 0.0 9
19  AACG    2022-01-06  09:49:00    9004    1.9600  1.9600  1.9600  1.9600  1.9600  0.0 0.0 9
20  AACG    2022-01-06  09:50:00    9224    1.9603  1.9600  1.9613  1.9600  1.9613  0.0 0.0 4
21  AACG    2022-01-06  09:52:00    16914   1.9921  1.9800  2.0400  1.9750  2.0399  0.0 0.0 67
22  AACG    2022-01-06  09:53:00    4665    1.9866  1.9900  2.0395  1.9801  1.9900  0.0 0.0 37
23  AACG    2022-01-06  09:55:00    2107    2.0049  1.9900  2.0100  1.9900  2.0099  0.0 0.0 10
24  AACG    2022-01-06  09:56:00    3003    2.0028  2.0000  2.0099  2.0000  2.0099  0.0 0.0 23
25  AACG    2022-01-06  09:57:00    8489    2.0272  2.0100  2.0400  2.0100  2.0300  0.0 0.0 34
26  AACG    2022-01-06  09:58:00    6050    2.0155  2.0300  2.0300  2.0150  2.0150  0.0 0.0 6
27  AACG    2022-01-06  09:59:00    61623   2.0449  2.0300  2.0700  2.0300  2.0699  0.0 0.0 83
28  AACG    2022-01-06  10:00:00    19699   2.0856  2.0699  2.1199  2.0600  2.1100  0.0 0.0 54

I want to insert a row for each missing minute, containing the missing time and placeholder (zero) values in the other columns.

missing_data = pd.DataFrame({'ticker': ['AACG'], 'date': ['2022-01-06'], 'time': ['09:51:00'],
                             'vol': [0], 'vwap': [0.0], 'open': [0.0], 'high': [0.0], 'low': [0.0],
                             'close': [0.0], 'lbh': [0.0], 'lah': [0.0], 'trades': [0]}, index=[21])

It would look something like this:

   ticker       date    time       vol  vwap    open high   low close lbh lah trades
21  AACG    2022-01-06  09:51:00    0   0.00  0.00  0.00  0.00  0.00  0.0 0.0 0

With someone's help, I've managed to locate the rows that come just before each missing minute:

# Minutes since midnight as integers (astype('timedelta64[m]') is rejected by pandas 2.x)
time_in_minutes = pd.to_timedelta(df['time'].astype(str)) // pd.Timedelta(minutes=1)
indices_where_the_next_minute_is_missing = np.where(np.diff(time_in_minutes) != 1)[0]
out = df.loc[indices_where_the_next_minute_is_missing]

Simply adding 1 to time_in_minutes will give me the correction I need:

timeinminutesplus1 = pd.to_timedelta(out['time'].astype(str)) // pd.Timedelta(minutes=1) + 1

But how do I turn it back into a datetime.time value and insert it into the DataFrame?

0

3 Answers

2

Building off of my answer to your previous question, first expand your DataFrame to include NaN rows for missing minutes.

# Minutes since midnight as integers (astype('timedelta64[m]') is rejected by pandas 2.x)
time = pd.to_timedelta(df['time'].astype(str)) // pd.Timedelta(minutes=1)
out = df.set_index(time).reindex(np.arange(time.iloc[0], time.iloc[-1] + 1)).reset_index(drop=True)

then given your missing data DataFrame

missing_data = pd.DataFrame({'ticker': ['AACG'], 'date': ['2022-01-06'], 'time': ['09:51:00'], 
                             'vol': [0], 'vwap': [0.0], 'open': [0.0], 'high': [0.0], 'low': [0.0], 
                             'close': [0.0], 'lbh': [0.0], 'lah': [0.0], 'trades': [0]}, index=[21])

which looks like:

  ticker        date      time  vol  vwap  open  high  low  close  lbh  lah  trades
21   AACG  2022-01-06  09:51:00    0   0.0   0.0   0.0  0.0    0.0  0.0  0.0       0
        

you can update out:

out.update(missing_data)

Then out becomes:

  ticker        date        time    vol     vwap    open    high    low     close   lbh     lah     trades
0   AACG    2022-01-06  09:30:00    33042.0 1.8807  1.8900  1.9200  1.8700  1.9017  0.0     0.0     68.0
1   AACG    2022-01-06  09:31:00    5306.0  1.9073  1.9100  1.9200  1.8801  1.9100  0.0     0.0     27.0
2   AACG    2022-01-06  09:32:00    3496.0  1.8964  1.9100  1.9193  1.8800  1.8900  0.0     0.0     17.0
3   AACG    2022-01-06  09:33:00    5897.0  1.9377  1.8900  1.9500  1.8900  1.9500  0.0     0.0     15.0
...
20  AACG    2022-01-06  09:50:00    9224.0  1.9603  1.9600  1.9613  1.9600  1.9613  0.0     0.0     4.0
21  AACG    2022-01-06  09:51:00    0.0     0.0000  0.0000  0.0000  0.0000  0.0000  0.0     0.0     0.0
22  AACG    2022-01-06  09:52:00    16914.0 1.9921  1.9800  2.0400  1.9750  2.0399  0.0     0.0     67.0
23  AACG    2022-01-06  09:53:00    4665.0  1.9866  1.9900  2.0395  1.9801  1.9900  0.0     0.0     37.0
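
If you don't know in advance which minutes are missing, one option is to fill every NaN row produced by the reindex step in one go and rebuild the time from the minute index. A minimal sketch, assuming time is stored as 'HH:MM:SS' strings and all rows belong to a single day; the small df below is a stand-in that keeps only a few of the question's rows and columns:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: 09:51 and 09:54 are missing.
df = pd.DataFrame({
    'ticker': 'AACG',
    'date': '2022-01-06',
    'time': ['09:50:00', '09:52:00', '09:53:00', '09:55:00'],
    'vol': [9224, 16914, 4665, 2107],
    'close': [1.9613, 2.0399, 1.9900, 2.0099],
})

# Minutes since midnight as integers.
minutes = pd.to_timedelta(df['time'].astype(str)) // pd.Timedelta(minutes=1)

# Reindex onto every minute between the first and last row; gaps become NaN rows.
full_range = np.arange(minutes.iloc[0], minutes.iloc[-1] + 1)
out = df.set_index(minutes).reindex(full_range)

# Turn the minute index back into 'HH:MM:SS' strings for every row at once.
midnight = pd.Timestamp('1970-01-01')
stamps = midnight + pd.to_timedelta(out.index, unit='min')
out['time'] = stamps.strftime('%H:%M:%S')

# Carry ticker/date forward and zero-fill only the numeric columns.
out[['ticker', 'date']] = out[['ticker', 'date']].ffill()
num_cols = out.select_dtypes('number').columns
out[num_cols] = out[num_cols].fillna(0)
out = out.reset_index(drop=True)
```

If you need datetime.time objects rather than strings, stamps.time should give you an array of them instead of calling strftime.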

1 Comment

This is good, but the only problem is you won't know what time or index will be missing. That's why I tried iteration at first. So that, as it counts through the rows, it would insert the values based on the count.
1

I used the code you provided and then iterated over the result to add the missing rows. The final result is sorted again so that the order and indices are correct.

import datetime

import numpy as np
import pandas as pd

# Reading your dataframe
df = pd.read_csv('missing_minute.csv', sep=';', index_col='index')

# Define a default row to add for missing rows
default_row = {'ticker':'AACG', 'date':'2022-01-06', 'time': '00:00:00', 'vol':0.0, 'vwap':0.0, 'open':0.0, 'high':0.0, 'low':0.0, 'close':0.0, 'lbh':0.0, 'lah':0.0, 'trades':0.0}

# Your logic to find the rows before the missing
# (integer minutes since midnight; astype('timedelta64[m]') is rejected by pandas 2.x)
time_in_minutes = pd.to_timedelta(df['time'].astype(str)) // pd.Timedelta(minutes=1)
indices_where_the_next_minute_is_missing = np.where(np.diff(time_in_minutes) != 1)[0]
out = df.loc[indices_where_the_next_minute_is_missing]

# Iterating over the rows and collecting the new entries
new_rows = []
for i, e in out.iterrows():

    # Extract the time of the previous row and convert it to a datetime
    time_of_previous_row = datetime.datetime.strptime(e['time'], '%H:%M:%S')
    
    # Add one minute for the new entry
    time_of_new_row = (time_of_previous_row + datetime.timedelta(minutes=1)).strftime("%H:%M:%S")
    
    # Copy the default row with the new time
    new_rows.append({**default_row, 'time': time_of_new_row})

# Add the new rows in one step (DataFrame.append was removed in pandas 2.0)
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

# Sort the dataframe by the time column and reset the index
df = df.sort_values(by='time').reset_index(drop=True)
df

Output:

    ticker  date    time    vol vwap    open    high    low close   lbh lah trades
0   AACG    2022-01-06  09:30:00    33042.0 1.8807  1.89    1.92    1.87    1.9017  0.0 0.0 68.0
1   AACG    2022-01-06  09:31:00    5306.0  1.9073  1.91    1.92    1.8801  1.91    0.0 0.0 27.0
2   AACG    2022-01-06  09:32:00    3496.0  1.8964  1.91    1.9193  1.88    1.89    0.0 0.0 17.0
3   AACG    2022-01-06  09:33:00    5897.0  1.9377  1.89    1.95    1.89    1.95    0.0 0.0 15.0
4   AACG    2022-01-06  09:34:00    1983.0  1.9362  1.92    1.9499  1.92    1.92    0.0 0.0 9.0
5   AACG    2022-01-06  09:35:00    10725.0 1.9439  1.94    1.96    1.9201  1.9306  0.0 0.0 87.0
6   AACG    2022-01-06  09:36:00    5942.0  1.938   1.9307  1.94    1.93    1.94    0.0 0.0 48.0
7   AACG    2022-01-06  09:37:00    5759.0  1.9428  1.9659  1.9659  1.94    1.95    0.0 0.0 11.0
8   AACG    2022-01-06  09:38:00    4855.0  1.9424  1.95    1.95    1.9401  1.9495  0.0 0.0 10.0
9   AACG    2022-01-06  09:39:00    6275.0  1.9514  1.95    1.97    1.945   1.97    0.0 0.0 14.0
10  AACG    2022-01-06  09:40:00    13695.0 2.015   1.9799  2.05    1.9749  2.02    0.0 0.0 59.0
11  AACG    2022-01-06  09:41:00    3252.0  2.0209  2.0275  2.03    2.02    2.02    0.0 0.0 14.0
12  AACG    2022-01-06  09:42:00    12082.0 2.0117  2.03    2.04    1.98    1.99    0.0 0.0 41.0
13  AACG    2022-01-06  09:43:00    5148.0  1.9802  1.98    1.9999  1.975   1.9999  0.0 0.0 11.0
14  AACG    2022-01-06  09:44:00    2764.0  1.9927  1.9901  1.9943  1.9901  1.9943  0.0 0.0 5.0
15  AACG    2022-01-06  09:45:00    2379.0  1.9576  1.9601  1.9601  1.9201  1.9201  0.0 0.0 10.0
16  AACG    2022-01-06  09:46:00    8762.0  1.9852  1.955   1.99    1.955   1.99    0.0 0.0 35.0
17  AACG    2022-01-06  09:47:00    1343.0  1.9704  1.97    1.9738  1.97    1.9701  0.0 0.0 5.0
18  AACG    2022-01-06  09:48:00    17080.0 1.9696  1.97    1.98    1.96    1.96    0.0 0.0 9.0
19  AACG    2022-01-06  09:49:00    9004.0  1.96    1.96    1.96    1.96    1.96    0.0 0.0 9.0
20  AACG    2022-01-06  09:50:00    9224.0  1.9603  1.96    1.9613  1.96    1.9613  0.0 0.0 4.0
21  AACG    2022-01-06  09:51:00    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
22  AACG    2022-01-06  09:52:00    16914.0 1.9921  1.98    2.04    1.975   2.0399  0.0 0.0 67.0
23  AACG    2022-01-06  09:53:00    4665.0  1.9866  1.99    2.0395  1.9801  1.99    0.0 0.0 37.0
24  AACG    2022-01-06  09:54:00    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
25  AACG    2022-01-06  09:55:00    2107.0  2.0049  1.99    2.01    1.99    2.0099  0.0 0.0 10.0
26  AACG    2022-01-06  09:56:00    3003.0  2.0028  2.0 2.0099  2.0 2.0099  0.0 0.0 23.0
27  AACG    2022-01-06  09:57:00    8489.0  2.0272  2.01    2.04    2.01    2.03    0.0 0.0 34.0
28  AACG    2022-01-06  09:58:00    6050.0  2.0155  2.03    2.03    2.015   2.015   0.0 0.0 6.0
29  AACG    2022-01-06  09:59:00    61623.0 2.0449  2.03    2.07    2.03    2.0699  0.0 0.0 83.0
30  AACG    2022-01-06  10:00:00    19699.0 2.0856  2.0699  2.1199  2.06    2.11    0.0 0.0 54.0

P.S.: This only works when the missing minutes are isolated, not when several consecutive minutes are missing (e.g. both 09:51 and 09:52). That case could be handled by checking how many rows in a row are missing.

Also, if you have data over multiple days, the code has to be adapted a little bit. First, set the date in the loop. Second, sort by date and time in the end.
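
Both limitations (back-to-back gaps and multiple days) can probably be avoided by reindexing each day against a full minute range instead of adding one minute per detected gap. A rough sketch, not a drop-in replacement for the loop above: fill_missing_minutes is a hypothetical helper name, the rows are made up for illustration, and only the vol column is kept to keep it short:

```python
import pandas as pd

# Made-up rows over two days: 09:51 and 09:52 are missing back to back on
# the first day, and 09:31 is missing on the second day.
df = pd.DataFrame({
    'ticker': 'AACG',
    'date': ['2022-01-06'] * 3 + ['2022-01-07'] * 2,
    'time': ['09:50:00', '09:53:00', '09:54:00', '09:30:00', '09:32:00'],
    'vol': [9224, 4665, 1500, 800, 950],
})


def fill_missing_minutes(day: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a zero-volume row for each missing minute of one day."""
    stamps = pd.to_datetime(day['date'] + ' ' + day['time'])
    every_minute = pd.date_range(stamps.min(), stamps.max(), freq='min')
    # Rows for minutes that were not in the data come back as NaN.
    filled = day.set_index(stamps).reindex(every_minute)
    filled['ticker'] = filled['ticker'].ffill()
    # Rebuild date and time from the full minute range.
    filled['date'] = every_minute.strftime('%Y-%m-%d')
    filled['time'] = every_minute.strftime('%H:%M:%S')
    filled['vol'] = filled['vol'].fillna(0).astype(int)
    return filled


# Process each day separately so overnight gaps are not filled.
result = pd.concat(
    [fill_missing_minutes(day) for _, day in df.groupby('date')]
).reset_index(drop=True)
```

Because each day is reindexed from its own first to its own last minute, no sorting step is needed afterwards, and minutes before the first or after the last observed row of a day are not created.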

2 Comments

Are you sure it only works on 1 row? I can see that it has inserted two rows in your example above.
Oh miscommunication on my side... It works for multiple rows, but not for two rows back to back
0

Similar to enke's answer.

It sounds like you may want to create a new data frame with the proper date and time index that you desire (without missing rows). Then fill in the new data frame with the data that you have.

You can do this using pandas.DataFrame.update() https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.update.html

Here is an example:

import pandas as pd

start_datetime = '2022-01-06 09:30:00'
end_datetime = '2022-01-06 10:00:00'
cols = ['ticker','date','time','vol','vwap','open','high','low','close','lbh','lah','trades']
new_df = pd.DataFrame(columns=cols,
                      index=pd.date_range(start=start_datetime,
                                          end=end_datetime, freq='min'))

new_df['date'] = [d.date() for d in new_df.index]
new_df['time'] = [d.time() for d in new_df.index]
new_df = new_df.fillna(0.0)  # fillna returns a new frame, so assign the result

new_df

start_datetime = '2022-01-06 09:35:00'
end_datetime = '2022-01-06 09:40:00'
cols = ['ticker','date','time','vol','vwap','open','high','low','close','lbh','lah','trades']
other_df = pd.DataFrame(columns=cols,
                      index=pd.date_range(start=start_datetime,
                                          end=end_datetime, freq='min'))

other_df['date'] = [d.date() for d in other_df.index]
other_df['time'] = [d.time() for d in other_df.index]
other_df = other_df.fillna(3)  # fillna returns a new frame, so assign the result

other_df

final_df = new_df.copy()
final_df.update(other_df)

final_df
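
The same idea applied to real observations instead of a dummy frame might look like the sketch below. It assumes the data is indexed by timestamp, and it keeps only two of the question's rows and two of its columns; observed and template are illustrative names:

```python
import pandas as pd

# Observed rows indexed by timestamp; 09:51 is absent.
observed = pd.DataFrame(
    {'vol': [9224.0, 16914.0], 'close': [1.9613, 2.0399]},
    index=pd.to_datetime(['2022-01-06 09:50:00', '2022-01-06 09:52:00']),
)

# Template with one zero-filled row per minute in the observed range.
full_index = pd.date_range(observed.index.min(), observed.index.max(), freq='min')
template = pd.DataFrame(0.0, index=full_index, columns=observed.columns)

# update() aligns on index and columns and modifies template in place,
# so only the minutes that exist in observed are overwritten.
template.update(observed)
```

Note that update() returns None and only copies non-NA values, so the 09:51 row keeps its zeros.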
