I currently have a data frame that looks like this

  Temp1       Temp2         Pattern         Errors
 307.858K    303.197K         F0's            0
 297.960K    282.329K         F1's            0
   277K       260K             CA             0
   262K       238K             C5             0
   228K       168K         DATA==ADDR         0
   192K       140K            PRBS            0
   197K       77K             F0's            0
  199.9K     77.3K            F1's            0
  199K       773K              CA             0
                               C5             0
                           DATA==ADDR         0
                              PRBS            0
                              F0's            0
                              F1's            0
                               CA             0
                               C5             0
                           DATA==ADDR         0
                              PRBS            0
                              F0's            0 
                              F1's            0
                               CA             0
                               C5             0
                           DATA==ADDR         0
                              PRBS            0
                               .              . 
                               .              .
                               .              .

Expected output table

  Temp1       Temp2         Pattern         Errors
                              F0's            0
                              F1's            0
                               CA             0
                               C5             0
                          DATA==ADDR          0
                              PRBS            0
 307.858K    303.197K         F0's            0
                              F1's            0
                               CA             0
                               C5             0
                           DATA==ADDR         0
                              PRBS            0
 297.960K    282.329K         F0's            0
                              F1's            0
                               CA             0
                               C5             0
                           DATA==ADDR         0
                              PRBS            0
   277K       260K            F0's            0 
                              F1's            0
                               CA             0
                               C5             0
                           DATA==ADDR         0
                              PRBS            0
   262K       238K             .              . 
                               .              .
                               .              .

I want to rearrange it so that the temperature columns are split up, with one pair of values per section of patterns. That is, the first pair of temperature values corresponds to the second block of patterns (F0's to PRBS), the second pair corresponds to the next block of 6 patterns, and so on. I thought the best way to do this would be to add 6 blank spaces before each entry, but I don't know whether that is the best approach and, if it is, I'm not sure how to go about it. Any help will be appreciated.

EDIT: This data frame is created by concatenating 3 different dataframes I created earlier by parsing through a log file.

results = pd.concat([tempFrame, patternFrame, errorsFrame], axis = 1, sort = False)

The tempFrame contains the first 2 columns, the patternFrame contains the Pattern column and errorsFrame contains the Errors column.

tempFrame:

 tempFrame = tempFrame.assign(newIndex = tempFrame.groupby('Extra').cumcount())
 tempFrame= tempFrame.set_index(['newIndex', 'Extra']).unstack().swaplevel(0, axis = 1).sort_index(axis = 1, level = 0)
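
One possible way to handle this upstream, before the concatenation, is to move each temperature reading to the row where its block of patterns starts. This is only a sketch under assumptions: the frames below are small stand-ins with flat (non-MultiIndex) columns, the names `temp_frame`, `pattern_frame` and `errors_frame` are hypothetical, and every block is assumed to hold exactly 6 patterns.

```python
import pandas as pd

# Hypothetical stand-ins for the three parsed frames (flat columns assumed)
temp_frame = pd.DataFrame({
    "Temp1": ["307.858K", "297.960K"],
    "Temp2": ["303.197K", "282.329K"],
})
patterns = ["F0's", "F1's", "CA", "C5", "DATA==ADDR", "PRBS"]
pattern_frame = pd.DataFrame({"Pattern": patterns * 3})
errors_frame = pd.DataFrame({"Errors": [0] * 18})

group_size = 6  # assumption: six patterns per block

# Reading k (0-based) belongs on the first row of block k + 1
temp_frame.index = [group_size * (k + 1) for k in range(len(temp_frame))]

# Align to the pattern rows; rows without a reading become NaN, then ''.
# Readings that would land past the last pattern row are dropped here.
temp_frame = temp_frame.reindex(pattern_frame.index).fillna("")

results = pd.concat([temp_frame, pattern_frame, errors_frame],
                    axis=1, sort=False)
```

Because all three frames share the same index by the time they are concatenated, no padding arithmetic is needed afterwards.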
  • can you show the expected output table? Commented Jun 21, 2018 at 12:48
  • @YOLO Edited the post to include the expected output. Commented Jun 21, 2018 at 12:53
  • can you provide some code to generate the data Commented Jun 21, 2018 at 13:06
  • It may be easier to fix this further upstream. How is the dataframe generated? Commented Jun 21, 2018 at 13:08

1 Answer

You can try some variation of the code below to generate the expected output, assuming your dataframe is named df.

import numpy as np

# number of 6-row pattern groups (integer division; assumes len(df) is a multiple of 6)
n_groups = df.shape[0] // 6
for col in ['Temp1', 'Temp2']:
    # the first group stays blank, so only the first n_groups - 1 readings are used;
    # taking n_groups readings would make the result 6 rows longer than df
    temps = df[col].iloc[:n_groups - 1]
    # build an array of 6 empty strings followed by (temp, '', '', '', '', '') for each reading
    df[col] = np.hstack([np.full(6, '', dtype=object)] + [np.append(tmp, np.full(5, '', dtype=object)) for tmp in temps])
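
As an alternative that avoids building padded arrays, the same layout can be produced by writing each reading directly to the row where its group starts. The following is a minimal sketch rather than a definitive implementation: `spread_temps` is a made-up helper name, the demo frame is a small stand-in for the one in the question, and groups are assumed to be 6 rows long.

```python
import pandas as pd

def spread_temps(df, cols=("Temp1", "Temp2"), group_size=6):
    """Return a copy of df in which reading k of each column in cols
    sits on the first row of group k + 1; every other row is ''."""
    out = df.copy()
    n = len(df)
    for col in cols:
        readings = df[col].tolist()
        spread = [""] * n
        # group 0 stays blank; later groups start at rows 6, 12, 18, ...
        for start in range(group_size, n, group_size):
            spread[start] = readings[start // group_size - 1]
        out[col] = spread
    return out

# Hypothetical miniature of the question's frame: 3 groups of 6 patterns
patterns = ["F0's", "F1's", "CA", "C5", "DATA==ADDR", "PRBS"] * 3
demo = pd.DataFrame({
    "Temp1": ["307.858K", "297.960K"] + [None] * 16,
    "Temp2": ["303.197K", "282.329K"] + [None] * 16,
    "Pattern": patterns,
    "Errors": [0] * 18,
})
result = spread_temps(demo)
```

Since the output list is always created with `len(df)` entries, this version cannot hit a length mismatch, even when the row count is not a multiple of 6.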

2 Comments

When I tried this I got a ValueError: Length of values does not match length of index. Could it be because I did this (check latest edit) to create a multiindex dataframe originally?
Try cross-checking the dimensions of the generated arrays. Maybe try temp1 = df['Temp1'].iloc[:(df.shape[0] // 6 - 1)]
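
The dimension check suggested in this comment can be worked through with plain arithmetic. This is a small sketch, assuming the frame has a multiple of 6 rows and a flat `Temp1` column; `padded_length` is a made-up helper used only for illustration.

```python
def padded_length(n_readings, group_size=6):
    # six leading blanks, then one block of group_size entries per reading
    return group_size + n_readings * group_size

n_rows = 24  # e.g. four groups of six patterns

# Taking n_rows // 6 readings overshoots the frame by one full group
too_long = padded_length(n_rows // 6)

# Taking one reading fewer matches the number of rows exactly
matching = padded_length(n_rows // 6 - 1)
```

With 24 rows, the first variant yields 30 values and the second yields 24, which is why only the second can be assigned back to a column of the frame.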
