3

I have 2 dimensional list/matrix which is dynamic with N rows and M columns.

Sample Matrix

The data type within a particular column is uniform, example: col1 is text, col2 is integer, column3 is float etc. The order of the columns can be different. Values for certain rows can be also missing.

Expected result should be 2 lists/arrays/dataframes, where:

  • list1 should be duplicated N times (depending the number of rows) col1_r1, col1_r2, ....., colM_row_n, with appended iterator or the number of rows
  • list2 should be transposed values of the rows (including the empty ones)

What is the best way to achieve this in Python 3.6 using native lists or/and numpy arrays or/and panda dataframes?

output_list1 = [col1_1, col1_2, col1_3, col1_4, col1_5, col2_1, col2_2, 
                col2_3, col2_4, col2_5, col3_1, col3_2, col3_3, col3_4, col3_5]

-

output_list2 = ["value-row1,col1", "", "value-row3,col1",   "value-row4,col1",  
                ",value-row5,col1", "value-row1,col2", "value-row2,col2",   "value-row3,col3",  
                0,  "value-row5, col5", "value-row1, col3", 0.0, 0.0, 0.0, "value-row5,col4"]

Thanks in advance for your help.

4
  • 1
    How matrix is stored? Is it some list of lists or something else? Commented Apr 10, 2018 at 12:26
  • It is a list of lists. Commented Apr 10, 2018 at 12:37
  • List of columns or list of rows? Commented Apr 10, 2018 at 12:39
  • It's a list of lists of columns. I'm parsing one html file which has lots of tables inside. I was able to create a numpy array of a particular list, so at row0, colM I have the list header that needs to be duplicated, while at row1+dim(array) I have the remaining data which needs to be transposed. # Getting the 6th list from my "file" list a = np.array(file[0][6]) # Get the shape print(a.shape) # (11, 6) Commented Apr 10, 2018 at 12:45

1 Answer 1

4

This should do the trick:

import numpy as np

# create the data in a nested list
data_list = [['Event', 'Waits', 'Total Wait Time (sec)', 'Wait Avg(ms)', '% DB time', 'Wait Class'], 
             ['latch free', '15,625', '311', '19.91', '29.6', 'Other'], 
             ['library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'], 
             ['DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']]

# transform into numpy object array
data_array = np.array(data_list, dtype=object)
# construct header from first row
header = data_array[0, :]
# only use the data part of the array
data = data_array[1:, :]

list1 = []
list2 = []
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        # adjust for the 1 based index of row numbers
        # transpose header columns by switching indices i and j
        list1.append('{}_{}'.format(header[j], i+1))
        # populate flattened data list
        list2.append(data[i,j])

print(list1)
print(list2)

Output:

list1 = ['Event_1', 'Waits_1', 'Total Wait Time (sec)_1', 'Wait Avg(ms)_1', '% DB time_1', 'Wait Class_1', 'Event_2', 'Waits_2', 'Total Wait Time (sec)_2', 'Wait Avg(ms)_2', '% DB time_2', 'Wait Class_2', 'Event_3', 'Waits_3', 'Total Wait Time (sec)_3', 'Wait Avg(ms)_3', '% DB time_3', 'Wait Class_3']
list2 = ['latch free', '15,625', '311', '19.91', '29.6', 'Other', 'library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency', 'DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']
Sign up to request clarification or add additional context in comments.

4 Comments

Thanks for your help WofgangK. This is very close, to what I am looking for. Perhaps I didn't desccribe it well. data_array = [['Event', 'Waits', 'Total Wait Time (sec)', 'Wait Avg(ms)', '% DB time', 'Wait Class'], ['latch free', '15,625', '311', '19.91', '29.6', 'Other'], ['library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'], ['DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']] data_array[0] is the header of the list that needs to be duplicated N times, in this case 3 times. The hdr name needs to be the same as values the list, just with appended value of the iterat
Just to clarify: what you want is for list1 to look like list1 = ['Event_1', 'Event_2', 'Event_3', 'Waits_1', ..., 'Wait Class_3']? Is list2 similar to what you're looking for?
The transpose should be done not on the whole data set at once, but on a column-per-column basis. Example: list1 = 'Event_1', 'Waits_1', 'Total Wait Time (sec)_1', 'Wait Avg(ms)_1', '% DB time_1', 'Wait Class_1', 'Event_2', 'Waits_2', 'Total Wait Time (sec)_2', 'Wait Avg(ms)_2', '% DB time_2', 'Wait Class_2' etc. list2 = 'latch free', '15,625', '311', '19.91', '29.6', 'Other', 'library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'
Updated for new information. Double for loops used for clarity. If you have lots of data, numpy should be used to get rid of those to improve speed.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.