I have a DataFrame read from a CSV file. It looks similar to the following:

                    LIST-1  LIST-2        LIST-3              ... LIST-N
TIME                                           
2017-06-21 00:17:00 NaN     [99.221]       [42.357, 102.665]
2017-06-21 00:18:00 NaN     [50.89]        [42.357, 43.125,...]
2017-06-21 00:19:00 NaN     [61.50, 76.1]  [70.163, 121.486] 
2017-06-21 00:20:00 [70.16] NaN            NaN
2017-06-21 00:21:00 NaN     [102.665]      [57.9, 63.66, 68.7... 

Each row represents one minute of data, and the dtype of each LIST-N column is object. I want to do the following:

  1. Merge all the lists in each row into a single list of floats and store it in a new column, ALL_LIST;
  2. Then merge each thirty minutes of data (that is, thirty rows of ALL_LIST) into a new list;
  3. Finally, get a DataFrame like this:

    TIME                    LIST                                         
    2017-06-21 00:00:00   [99.221,42.357, 42.357, ...]
    2017-06-21 00:30:00   [52.328,42.357, 49.169, ...]
    2017-06-21 01:00:00   [61.484,42.357, 76.52, ...]
    2017-06-21 01:30:00   [76.523,42.357, 121.486, ...]
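
Step 1 can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the cells arrive from the CSV as strings such as "[42.357, 102.665]", and the helper name `merge_row` and the sample frame are hypothetical.

```python
import ast

import numpy as np
import pandas as pd


def merge_row(row):
    """Merge every non-null list cell of one row into a single list of floats."""
    merged = []
    for cell in row.dropna():
        # Assumption: cells read from CSV are strings like "[42.357, 102.665]".
        values = ast.literal_eval(cell) if isinstance(cell, str) else cell
        merged.extend(float(v) for v in values)
    return merged


# Hypothetical sample mirroring a few rows of the table above.
df = pd.DataFrame(
    {
        "LIST-1": [np.nan, np.nan, "[70.16]"],
        "LIST-2": ["[99.221]", "[61.50, 76.1]", np.nan],
        "LIST-3": ["[42.357, 102.665]", "[70.163, 121.486]", np.nan],
    },
    index=pd.DatetimeIndex(
        ["2017-06-21 00:17:00", "2017-06-21 00:19:00", "2017-06-21 00:20:00"],
        name="TIME",
    ),
)

# Build the new column from a Series of lists so each cell holds one list.
all_list = pd.Series([merge_row(row) for _, row in df.iterrows()], index=df.index)
df["ALL_LIST"] = all_list
```

Building the column from an explicit Series of lists keeps each merged list in a single cell, whatever the list lengths are.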
    

1 Answer

I found one solution to my question. I'm posting it here in the hope that someone can suggest how to improve its performance.

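    import itertools
    import pandas as pd

    # Cells are strings such as '[42.357, 102.665]': join the non-null cells of
    # each row, strip the brackets, and split the result into string tokens.
    all_tt_list['ALL_LIST'] = all_tt_list.apply(lambda x: ','.join(x.dropna()), axis=1)
    all_tt_list['ALL_LIST'] = all_tt_list['ALL_LIST'].str.replace('[', '', regex=False)
    all_tt_list['ALL_LIST'] = all_tt_list['ALL_LIST'].str.replace(']', '', regex=False)
    all_tt_list['ALL_LIST'] = all_tt_list['ALL_LIST'].str.split(',')

    WAIT_TIME_INTERVAL = 30 * 60  # window length in seconds
    # `date` is the start of the day being processed (defined elsewhere).
    # periods must be an integer, hence //. Newer pandas prefers 's' over 'S'.
    rng = pd.date_range(date, periods=24 * 60 * 60 // WAIT_TIME_INTERVAL + 1,
                        freq=str(WAIT_TIME_INTERVAL) + 'S', tz='Asia/Shanghai')
    for k in range(len(rng) - 1):
        period_start = rng[k]
        period_end = rng[k + 1]
        # Half-open window [period_start, period_end) so boundary rows are kept
        in_period = (all_tt_list.index >= period_start) & (all_tt_list.index < period_end)
        period_df = all_tt_list[in_period]

        period_tt_list = period_df['ALL_LIST'].tolist()
        # Flatten the per-row token lists; skip empty tokens from all-NaN rows
        period_merged = [v for v in itertools.chain.from_iterable(period_tt_list) if v.strip()]

        # Note: astype(int) truncates the floats to integers
        period_merged_s = pd.DataFrame(period_merged, columns=['TT_NUM']).astype(float).astype(int)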
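
One possible way to avoid filtering the whole frame once per window is to floor each timestamp to the start of its 30-minute window and collect the lists in a single pass. The sketch below is only an illustration under assumptions: the helper name `merge_by_window` and the sample data are hypothetical, and it assumes ALL_LIST already holds lists of floats rather than string tokens. I have not benchmarked it against the loop above.

```python
import pandas as pd


def merge_by_window(all_list, freq="30min"):
    """Concatenate the per-row lists that fall into the same time window."""
    buckets = {}
    # floor() maps each timestamp to the start of its window, e.g. 00:17 -> 00:00
    for window_start, values in zip(all_list.index.floor(freq), all_list):
        buckets.setdefault(window_start, []).extend(values)
    index = pd.DatetimeIndex(list(buckets), name="TIME")
    merged = pd.Series(list(buckets.values()), index=index, name="LIST")
    return merged.to_frame().sort_index()


# Hypothetical per-minute data: one list of floats per row
all_list = pd.Series(
    [[99.221, 42.357], [50.89], [70.16], [61.484, 76.52]],
    index=pd.DatetimeIndex(
        ["2017-06-21 00:17:00", "2017-06-21 00:18:00",
         "2017-06-21 00:30:00", "2017-06-21 00:31:00"]
    ),
)
result = merge_by_window(all_list)
print(result)
```

Unlike the explicit loop over `rng`, this only produces rows for windows that actually contain data; empty windows are simply absent from the result.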