python - sorting floats within elements of an array based on another array

Question

I have a data array that looks, for example, like:

data = [
    [1.4, 2.6, 7.3, 4.2],
    [1.1, 2.0, 6.4, 1.0],
    [5.1, 6.2, 5.3, 9.9]
]

and another array with class labels:

class_labels = ['a', 'b', 'a', 'b']

Each of the class labels corresponds to certain floats in each element of the data array (e.g., class 'a' corresponds to 1.4 and 7.3 from data[0], 1.1 and 6.4 from data[1], and 5.1 and 5.3 from data[2]).

I understand from other posts how I could go about sorting one array based on another array, but is it possible to sort the class_labels array alphabetically, also sorting the corresponding floats within each element of the data array?

It's possible I've totally gone about this wrong - if I want to be able to later access certain floats from each element (i.e., only floats corresponding to a given class label), will that be possible?

Thanks for any advice!

robyschek · Accepted Answer · 2016-10-25 03:10:08Z

Let's begin with your data:

In [1]: data = [
   ...:     [1.4, 2.6, 7.3, 4.2],
   ...:     [1.1, 2.0, 6.4, 1.0],
   ...:     [5.1, 6.2, 5.3, 9.9]
   ...: ]
In [2]: labels = ['a', 'b', 'a', 'b']

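Zip them to merge the two arrays into one list of tuples, one tuple per column (on Python 3, zip is lazy, so wrap it in list() to see the result):

In [3]: list(zip(labels, *data))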
Out[3]: 
[('a', 1.4, 1.1, 5.1),
 ('b', 2.6, 2.0, 6.2),
 ('a', 7.3, 6.4, 5.3),
 ('b', 4.2, 1.0, 9.9)]

Now sort the result:

In [4]: sorted(zip(labels, *data))
Out[4]: 
[('a', 1.4, 1.1, 5.1),
 ('a', 7.3, 6.4, 5.3),
 ('b', 2.6, 2.0, 6.2),
 ('b', 4.2, 1.0, 9.9)]

Then unzip it back:

In [6]: list(zip(*sorted(zip(labels, *data))))
Out[6]: 
[('a', 'a', 'b', 'b'),
 (1.4, 7.3, 2.6, 4.2),
 (1.1, 6.4, 2.0, 1.0),
 (5.1, 5.3, 6.2, 9.9)]

And finally get the result with an ugly one-liner (the list() call is needed on Python 3, where a zip object cannot be sliced):

In [7]: [list(x) for x in list(zip(*sorted(zip(labels, *data))))[1:]]
Out[7]: [[1.4, 7.3, 2.6, 4.2], [1.1, 6.4, 2.0, 1.0], [5.1, 5.3, 6.2, 9.9]]

You can split the one-liner into separate steps if you wish to make the code more readable.
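One way the split could look, as a minimal sketch for Python 3. The intermediate names (columns, sorted_columns, sorted_labels, sorted_data) are made up for illustration. It also sorts on the label alone via a key function, which is a small change from the plain sorted() call above: without a key, tuples with equal labels are ordered by their floats, while with the key the stable sort keeps them in their original column order. For this particular data both give the same result.

```python
from operator import itemgetter

data = [
    [1.4, 2.6, 7.3, 4.2],
    [1.1, 2.0, 6.4, 1.0],
    [5.1, 6.2, 5.3, 9.9],
]
labels = ['a', 'b', 'a', 'b']

# Step 1: one tuple per column: (label, value in row 0, value in row 1, ...)
columns = list(zip(labels, *data))

# Step 2: sort the columns by label only; sorted() is stable, so columns
# that share a label keep their original left-to-right order
sorted_columns = sorted(columns, key=itemgetter(0))

# Step 3: transpose back; the first tuple holds the labels, the rest are rows
sorted_labels, *sorted_rows = zip(*sorted_columns)

sorted_labels = list(sorted_labels)
sorted_data = [list(row) for row in sorted_rows]

print(sorted_labels)
print(sorted_data)
```

Keeping sorted_labels next to sorted_data means the label of every column is still known after the sort.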

Chris Kenyon · Accepted Answer · 2016-10-25 02:59:54Z

It is possible to sort arrays based on the sort order of another, but I would suggest a dictionary instead as it'll probably be easier to work with. Something like:

# One empty list per distinct label
data_by_class = {label: [] for label in set(class_labels)}
for row in data:
    # Pair each float with the label of its column
    for label, value in zip(class_labels, row):
        data_by_class[label].append(value)

which would result in

{'a': [1.4, 7.3, 1.1, 6.4, 5.1, 5.3], 'b': [2.6, 4.2, 2.0, 1.0, 6.2, 9.9]}
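A short sketch of reading values back out of such a dictionary, assuming the same data and class_labels as in the question. It uses collections.defaultdict instead of the dict comprehension, purely as a convenience, and a_values is a made-up name.

```python
from collections import defaultdict

data = [
    [1.4, 2.6, 7.3, 4.2],
    [1.1, 2.0, 6.4, 1.0],
    [5.1, 6.2, 5.3, 9.9],
]
class_labels = ['a', 'b', 'a', 'b']

# Missing keys start out as empty lists, so no set-up loop is needed
data_by_class = defaultdict(list)
for row in data:
    for label, value in zip(class_labels, row):
        data_by_class[label].append(value)

# All floats that belong to class 'a', across every row
a_values = data_by_class['a']
print(a_values)

# Walk the classes in alphabetical order
for label in sorted(data_by_class):
    print(label, data_by_class[label])
```

Note that this layout flattens the rows: the list for each class no longer records which row a float came from.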

edited Oct 25, 2016 at 2:59

answered Oct 25, 2016 at 2:26


donkopotamus · Accepted Answer · 2016-10-25 02:43:57Z

You probably want to organise your data differently, although it's a little unclear exactly what you're looking for.

If we assume that you would still like row-based data, where each row consists of (possibly multiple) observations of different classes, then you could reorganise your data into a list of dictionaries:

import itertools
row_dicts = [{k: [x[1] for x in v] 
              for k, v in itertools.groupby(
                  sorted(zip(class_labels, row)), key=lambda x: x[0])}
             for row in data]

Now your data appears as:

>>> row_dicts
[{'a': [1.4, 7.3], 'b': [2.6, 4.2]},
 {'a': [1.1, 6.4], 'b': [1.0, 2.0]},
 {'a': [5.1, 5.3], 'b': [6.2, 9.9]}]

And you can look up, e.g., all observations with label a from row 1:

>>> row_dicts[1]["a"]
[1.1, 6.4]
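If restructuring the data is not wanted, the floats for one class can also be picked out of the original rows on demand. The following is only a sketch under that assumption; select_class is a hypothetical helper name, not part of any library.

```python
data = [
    [1.4, 2.6, 7.3, 4.2],
    [1.1, 2.0, 6.4, 1.0],
    [5.1, 6.2, 5.3, 9.9],
]
class_labels = ['a', 'b', 'a', 'b']


def select_class(data, class_labels, label):
    """Return, for every row, only the floats whose column carries `label`."""
    # Column positions that belong to the requested class
    cols = [i for i, current in enumerate(class_labels) if current == label]
    return [[row[i] for i in cols] for row in data]


b_values = select_class(data, class_labels, 'b')
print(b_values)
```

Unlike the sorted groupby version, this keeps the floats in their original column order, so row 1 gives [2.0, 1.0] for class b rather than [1.0, 2.0].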
