Transposing and multiplying lists in Python

Question

I have 2 dimensional list/matrix which is dynamic with N rows and M columns.

The data type within a particular column is uniform, example: col1 is text, col2 is integer, column3 is float etc. The order of the columns can be different. Values for certain rows can be also missing.

Expected result should be 2 lists/arrays/dataframes, where:

list1 should be duplicated N times (depending the number of rows) col1_r1, col1_r2, ....., colM_row_n, with appended iterator or the number of rows
list2 should be transposed values of the rows (including the empty ones)

What is the best way to achieve this in Python 3.6 using native lists or/and numpy arrays or/and panda dataframes?

output_list1 = [col1_1, col1_2, col1_3, col1_4, col1_5, col2_1, col2_2, 
                col2_3, col2_4, col2_5, col3_1, col3_2, col3_3, col3_4, col3_5]

-

output_list2 = ["value-row1,col1", "", "value-row3,col1",   "value-row4,col1",  
                ",value-row5,col1", "value-row1,col2", "value-row2,col2",   "value-row3,col3",  
                0,  "value-row5, col5", "value-row1, col3", 0.0, 0.0, 0.0, "value-row5,col4"]

Thanks in advance for your help.

How matrix is stored? Is it some list of lists or something else? — maQ
– maQ, Commented Apr 10, 2018 at 12:26
It's a list of lists of columns. I'm parsing one html file which has lots of tables inside. I was able to create a numpy array of a particular list, so at row0, colM I have the list header that needs to be duplicated, while at row1+dim(array) I have the remaining data which needs to be transposed. # Getting the 6th list from my "file" list a = np.array(file[0][6]) # Get the shape print(a.shape) # (11, 6) — Nakinov
– Nakinov, Commented Apr 10, 2018 at 12:45

WolfgangK · Accepted Answer · 2018-04-10 14:21:41Z

4

This should do the trick:

import numpy as np

# create the data in a nested list
data_list = [['Event', 'Waits', 'Total Wait Time (sec)', 'Wait Avg(ms)', '% DB time', 'Wait Class'], 
             ['latch free', '15,625', '311', '19.91', '29.6', 'Other'], 
             ['library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'], 
             ['DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']]

# transform into numpy object array
data_array = np.array(data_list, dtype=object)
# construct header from first row
header = data_array[0, :]
# only use the data part of the array
data = data_array[1:, :]

list1 = []
list2 = []
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        # adjust for the 1 based index of row numbers
        # transpose header columns by switching indices i and j
        list1.append('{}_{}'.format(header[j], i+1))
        # populate flattened data list
        list2.append(data[i,j])

print(list1)
print(list2)

Output:

list1 = ['Event_1', 'Waits_1', 'Total Wait Time (sec)_1', 'Wait Avg(ms)_1', '% DB time_1', 'Wait Class_1', 'Event_2', 'Waits_2', 'Total Wait Time (sec)_2', 'Wait Avg(ms)_2', '% DB time_2', 'Wait Class_2', 'Event_3', 'Waits_3', 'Total Wait Time (sec)_3', 'Wait Avg(ms)_3', '% DB time_3', 'Wait Class_3']
list2 = ['latch free', '15,625', '311', '19.91', '29.6', 'Other', 'library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency', 'DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']

edited Apr 10, 2018 at 14:21

answered Apr 10, 2018 at 13:08

WolfgangK

98312 silver badges19 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Nakinov Over a year ago

Thanks for your help WofgangK. This is very close, to what I am looking for. Perhaps I didn't desccribe it well. data_array = [['Event', 'Waits', 'Total Wait Time (sec)', 'Wait Avg(ms)', '% DB time', 'Wait Class'], ['latch free', '15,625', '311', '19.91', '29.6', 'Other'], ['library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'], ['DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']] data_array[0] is the header of the list that needs to be duplicated N times, in this case 3 times. The hdr name needs to be the same as values the list, just with appended value of the iterat

WolfgangK Over a year ago

Just to clarify: what you want is for list1 to look like list1 = ['Event_1', 'Event_2', 'Event_3', 'Waits_1', ..., 'Wait Class_3']? Is list2 similar to what you're looking for?

Nakinov Over a year ago

The transpose should be done not on the whole data set at once, but on a column-per-column basis. Example: list1 = 'Event_1', 'Waits_1', 'Total Wait Time (sec)_1', 'Wait Avg(ms)_1', '% DB time_1', 'Wait Class_1', 'Event_2', 'Waits_2', 'Total Wait Time (sec)_2', 'Wait Avg(ms)_2', '% DB time_2', 'Wait Class_2' etc. list2 = 'latch free', '15,625', '311', '19.91', '29.6', 'Other', 'library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'

WolfgangK Over a year ago

Updated for new information. Double for loops used for clarity. If you have lots of data, numpy should be used to get rid of those to improve speed.

Collectives™ on Stack Overflow

Transposing and multiplying lists in Python

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related