
Here is the sample NumPy data source:

     col    row1   row2   row3  row4  columns
[[(  11.2, '689', '197', 'value_2', 0, 1)]
 [(  56.4, '689', '197', 'value_3', 0, 1)]
 [(  195.7, '689', '197', 'value_2', 0, 2)]
 [(  565.2, '689', '197', 'value_3', 0, 2)]
 [(  227.6, '689', '197', 'value_2', 0, 3)]
 [(  1347.6, '689', '197', 'value_2', 0, 3)]
 [( 613.5, '689', '196', 'value_2', 0, 1)]
 [(139. , '689', '196', 'value_3', 0, 1)]
 [( 6011. , '689', '196', 'value_2', 0, 2)]
 [(103. , '689', '196', 'value_3', 0, 2)]
 [( 6860. , '689', '196', 'value_2', 0, 3)]
 [(1302. , '689', '196', 'value_3', 0, 3)]
 [( 1787.9, '622', '197', 'value_2', 0, 1)]
 [( 632.5, '622', '197', 'value_3', 0, 1)]
 [( 178.8, '622', '197', 'value_2', 0, 2)]
 [( 6360.5, '622', '197', 'value_3', 0, 2)]
 [( 228. , '622', '196', 'value_2', 0, 1)]
 [(672. , '622', '196', 'value_3', 0, 2)]
 ]
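The header line hints at the field names, but the dtype itself isn't shown. Below is a minimal sketch of how the first few sample rows could be built as a structured array with the same (N, 1) nesting. It assumes the field names come from the header and guesses the field types from the values; the real dtype may differ.

```python
import numpy as np

# Assumed dtype: field names from the header line, types guessed from the values.
dt = np.dtype([('col', 'f8'), ('row1', 'U3'), ('row2', 'U3'),
               ('row3', 'U7'), ('row4', 'i8'), ('columns', 'i8')])

# First four sample rows; each record is wrapped in its own list, as in the
# sample, which gives the array shape (N, 1).
a = np.array([[(11.2, '689', '197', 'value_2', 0, 1)],
              [(56.4, '689', '197', 'value_3', 0, 1)],
              [(195.7, '689', '197', 'value_2', 0, 2)],
              [(565.2, '689', '197', 'value_3', 0, 2)]], dtype=dt)

flat = a[:, 0]              # drop the length-1 axis: shape (4,)
print(a.shape, flat.shape)  # (4, 1) (4,)
print(flat['columns'])      # [1 1 2 2]

# Field access plus a boolean mask selects the values for one row3 key.
print(flat[flat['row3'] == 'value_2']['col'])
```

Working on the 1-D view (a[:, 0]) avoids the extra [0] indexing that every field access otherwise needs.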

From this, the expected output should be:

row1   row2   row3      row4   1        2       3
689    197    value_2   0      11.2     195.7   227.6
689    197    value_3   0      56.4     565     1347
689    196    value_2   0      613.5    6011    6860
689    196    value_3   0      139      103     1302
622    197    value_2   0      1787     178
622    197    value_3   0      632      6360

The 1, 2, 3 columns above come from a single column of the NumPy array: the rank (the last field in the sample).

In the data, a single row1 value can have multiple row2, row3 and row4 combinations. For every row1 value, I need to find the matching records and populate them as shown in the output.

I have tried the code below, but I can't get the (1, 2, 3) column values right: because they are spread across different records, I couldn't collect them and write them into a NumPy array.

new_temp_arr = 'actual_data_given'
m = 1
row_list = ['row1', 'row2', 'row3', 'row4']
# Column list taken from the array based on rank column
column_list = [1, 2, 3]
sample_list = []

for value in new_temp_arr:
    for new_value in new_temp_arr:
        if m >= len(new_temp_arr):
            break
        new_value = new_temp_arr[m]
        # Checking all the values for the rows matches with one another
        condition = [value[row] == new_value[row] for row in row_list]
        if all(condition):
            # Looping through all the column list and getting the float value
            # I'm stuck here, how to store the values with properly matched data
            for per in column_list:
                if new_value['rank'] == [per]:
                    float_value = new_value['float_value']
                    sample_list.append(new_value)
        m += 1
  • The nature of 'sample numpy datasource' is unclear. Column headers aren't part of an array. The nesting of [] and () suggests it is a structured array, but you haven't provided either shape or dtype. But it could be object dtype, or simply lists of tuples. I don't think numpy will help here. For grouping operations I like to use dict, or even collections.defaultdict. Commented Dec 3, 2021 at 17:06
  • Yes, it's a structured array; the dtype field names are the ones I listed (col, row1, etc.). Thanks for the input, I'll try defaultdict. Commented Dec 4, 2021 at 4:51
  • Do you know the full set of unique row# values beforehand? Commented Dec 4, 2021 at 4:55
  • By unique row, what exactly do you mean? Commented Dec 5, 2021 at 8:05
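Following the first comment's suggestion, here is a minimal plain-Python sketch of grouping with collections.defaultdict: each record is keyed by (row1, row2, row3, row4) and its rank is mapped to its value. It assumes the records are available as plain tuples in the header's field order, and it keeps the first value when a (key, rank) pair repeats, as 227.6 and 1347.6 do in the sample.

```python
from collections import defaultdict

# Records in the header's field order: (col, row1, row2, row3, row4, columns).
records = [
    (11.2, '689', '197', 'value_2', 0, 1),
    (56.4, '689', '197', 'value_3', 0, 1),
    (195.7, '689', '197', 'value_2', 0, 2),
    (565.2, '689', '197', 'value_3', 0, 2),
    (227.6, '689', '197', 'value_2', 0, 3),
    (1347.6, '689', '197', 'value_2', 0, 3),
]

groups = defaultdict(dict)
for col, row1, row2, row3, row4, rank in records:
    # setdefault keeps the first value seen for a repeated (key, rank) pair
    groups[(row1, row2, row3, row4)].setdefault(rank, col)

ranks = [1, 2, 3]
for key, by_rank in groups.items():
    # None marks a rank that has no value for this key
    print(*key, *(by_rank.get(r) for r in ranks))
```

If the records come from a 1-D structured array, calling its tolist() method should yield tuples of this same shape.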

2 Answers


I don't think you can do this efficiently with numpy, especially since your data contains duplicates, so a simple pivot would fail (it seems you're keeping the first value, although I'm not entirely sure; please clarify this point).

Furthermore, it looks like your output is a dataframe, so why not use pandas directly with pivot_table and aggfunc='first'?

import numpy as np
import pandas as pd

a = np.array([[(  11.2, '689', '197', 'value_2', 0, 1)],
              [(  56.4, '689', '197', 'value_3', 0, 1)],
              [(  195.7, '689', '197', 'value_2', 0, 2)],
              [(  565.2, '689', '197', 'value_3', 0, 2)],
              [(  227.6, '689', '197', 'value_2', 0, 3)],
              [(  1347.6, '689', '197', 'value_2', 0, 3)],
              [( 613.5, '689', '196', 'value_2', 0, 1)],
              [(139. , '689', '196', 'value_3', 0, 1)],
              [( 6011. , '689', '196', 'value_2', 0, 2)],
              [(103. , '689', '196', 'value_3', 0, 2)],
              [( 6860. , '689', '196', 'value_2', 0, 3)],
              [(1302. , '689', '196', 'value_3', 0, 3)],
              [( 1787.9, '622', '197', 'value_2', 0, 1)],
              [( 632.5, '622', '197', 'value_3', 0, 1)],
              [( 178.8, '622', '197', 'value_2', 0, 2)],
              [( 6360.5, '622', '197', 'value_3', 0, 2)],
              [( 228. , '622', '196', 'value_2', 0, 1)],
              [(672. , '622', '196', 'value_3', 0, 2)],
             ])
cols = ['col', 'row1', 'row2', 'row3', 'row4', 'columns']
(pd.DataFrame(a[:,0,:], columns=cols)
   .pivot_table(index=['row1', 'row2', 'row3', 'row4'], columns='columns', values='col', aggfunc='first')
)

output:

columns                      1       2       3
row1 row2 row3    row4                        
622  196  value_2 0      228.0     NaN     NaN
          value_3 0        NaN   672.0     NaN
     197  value_2 0     1787.9   178.8     NaN
          value_3 0      632.5  6360.5     NaN
689  196  value_2 0      613.5  6011.0  6860.0
          value_3 0      139.0   103.0  1302.0
     197  value_2 0       11.2   195.7   227.6
          value_3 0       56.4   565.2     NaN

If the order is important you can reindex to the original order:

cols = ['col', 'row1', 'row2', 'row3', 'row4', 'columns']
df = pd.DataFrame(a[:,0,:], columns=cols)

idx = df.set_index(['row1', 'row2', 'row3', 'row4']).index
idx = idx[~idx.duplicated(keep='first')]

(df.pivot_table(index=['row1', 'row2', 'row3', 'row4'], columns='columns', values='col', aggfunc='first')
   .reindex(idx)
)

output:

columns                      1       2       3
row1 row2 row3    row4                        
689  197  value_2 0       11.2   195.7   227.6
          value_3 0       56.4   565.2     NaN
     196  value_2 0      613.5  6011.0  6860.0
          value_3 0      139.0   103.0  1302.0
622  197  value_2 0     1787.9   178.8     NaN
          value_3 0      632.5  6360.5     NaN
     196  value_2 0      228.0     NaN     NaN
          value_3 0        NaN   672.0     NaN

1 Comment

I have to use only numpy; that's the requirement. Order is not an issue: any value can come first or last, that is not a problem. I just need to extract this structure with numpy.
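A NumPy-only approach is possible with np.unique. The sketch below assigns every record a group number with return_inverse, maps each rank to a column position with np.searchsorted, and keeps only the first record per (group, rank) cell by using return_index. Like aggfunc='first', it resolves duplicates by keeping the first value. It assumes a 1-D structured array with the header's field names (use a[:, 0] if the array has shape (N, 1)); the output field names rank_1, rank_2, rank_3 are hypothetical.

```python
import numpy as np

# Assumed dtype: field names from the question's header line.
dt = np.dtype([('col', 'f8'), ('row1', 'U3'), ('row2', 'U3'),
               ('row3', 'U7'), ('row4', 'i8'), ('columns', 'i8')])
a = np.array([
    (11.2, '689', '197', 'value_2', 0, 1), (56.4, '689', '197', 'value_3', 0, 1),
    (195.7, '689', '197', 'value_2', 0, 2), (565.2, '689', '197', 'value_3', 0, 2),
    (227.6, '689', '197', 'value_2', 0, 3), (1347.6, '689', '197', 'value_2', 0, 3),
    (613.5, '689', '196', 'value_2', 0, 1), (139., '689', '196', 'value_3', 0, 1),
    (6011., '689', '196', 'value_2', 0, 2), (103., '689', '196', 'value_3', 0, 2),
    (6860., '689', '196', 'value_2', 0, 3), (1302., '689', '196', 'value_3', 0, 3),
    (1787.9, '622', '197', 'value_2', 0, 1), (632.5, '622', '197', 'value_3', 0, 1),
    (178.8, '622', '197', 'value_2', 0, 2), (6360.5, '622', '197', 'value_3', 0, 2),
    (228., '622', '196', 'value_2', 0, 1), (672., '622', '196', 'value_3', 0, 2),
], dtype=dt)  # 1-D; use a[:, 0] if your array has shape (N, 1)

key_fields = ['row1', 'row2', 'row3', 'row4']
key_dtype = [(f, a.dtype[f]) for f in key_fields]

# Copy the key fields into a packed structured array so np.unique can sort it.
keys = np.empty(len(a), dtype=key_dtype)
for f in key_fields:
    keys[f] = a[f]

# uniq: one entry per distinct key; inv: the uniq position of each record
uniq, inv = np.unique(keys, return_inverse=True)
ranks = np.unique(a['columns'])                 # the distinct ranks, sorted
col_idx = np.searchsorted(ranks, a['columns'])  # rank -> column position

# Flat cell index per record; return_index picks the first record per cell
cell = inv * len(ranks) + col_idx
_, first = np.unique(cell, return_index=True)

# Cells with no record stay NaN
table = np.full((len(uniq), len(ranks)), np.nan)
table.flat[cell[first]] = a['col'][first]

# Assemble the output: key fields, then one float field per rank
out = np.empty(len(uniq), dtype=key_dtype + [(f'rank_{r}', 'f8') for r in ranks])
for f in key_fields:
    out[f] = uniq[f]
for j, r in enumerate(ranks):
    out[f'rank_{r}'] = table[:, j]

print(out)
```

Because np.unique sorts the keys, the rows come out in lexicographic key order (622 before 689), which should be acceptable since order is not an issue here.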
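import numpy as np


def get_list(arr, row1, row_column_values, row_list, column_list):
    # Start every rank (1, 2, 3) as NaN so missing ranks stay empty.
    dic = {row1: {i: np.nan for i in column_list}}
    for value in arr:
        # A record matches when all of its row fields equal the target values.
        condition = [value[row][0] == row_column_values[row] for row in row_list]
        if all(condition):
            dic[row1][int(value['rank'][0])] = value['float_value'][0]
    return dic


def build_table(new_temp_arr, row_list, column_list):
    # Output fields: the row fields, then one float field per rank ('1', '2', '3').
    out_dtype = ([(row, new_temp_arr.dtype[row]) for row in row_list]
                 + [(str(c), 'f8') for c in column_list])
    out_rows = []
    seen = set()
    for value in new_temp_arr:
        row_values = {row: value[row][0] for row in row_list}
        key = tuple(row_values.values())
        if key in seen:
            continue  # emit each row combination only once
        seen.add(key)
        dic = get_list(new_temp_arr, value['row1'][0], row_values, row_list, column_list)
        float_value = list(dic[value['row1'][0]].values())
        out_rows.append(key + tuple(float_value))
    return np.array(out_rows, dtype=out_dtype)


# new_temp_arr is the actual data: a structured array of shape (N, 1) with
# 'float_value', 'rank' and the row fields.
row_list = ['row1', 'row2', 'row3', 'row4']
# Column list taken from the array based on rank column
column_list = [1, 2, 3]
# out_array = build_table(new_temp_arr, row_list, column_list)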
 

The code above produces the expected result described in the question.
