I have a data array that looks, for example, like this:

data = [
    [1.4, 2.6, 7.3, 4.2],
    [1.1, 2.0, 6.4, 1.0],
    [5.1, 6.2, 5.3, 9.9]
]

and another array with class labels:

class_labels = ['a', 'b', 'a', 'b']

Each of the class labels corresponds to certain floats in each element of the data array (e.g., class 'a' corresponds to 1.4 and 7.3 from data[0], 1.1 and 6.4 from data[1], and 5.1 and 5.3 from data[2]).

I understand from other posts how I could go about sorting one array based on another array, but is it possible to sort the class_labels array alphabetically, also sorting the corresponding floats within each element of the data array?

It's possible I've totally gone about this wrong - if I want to be able to later access certain floats from each element (i.e., only floats corresponding to a given class label), will that be possible?

Thanks for any advice!

3 Answers

Let's begin with your data:

In [1]: data = [
   ...:     [1.4, 2.6, 7.3, 4.2],
   ...:     [1.1, 2.0, 6.4, 1.0],
   ...:     [5.1, 6.2, 5.3, 9.9]
   ...: ]
In [2]: labels = ['a', 'b', 'a', 'b']

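Zip the labels with the columns of the data to merge the two arrays into one object (the list() call is only needed on Python 3, where zip returns an iterator):

In [3]: list(zip(labels, *data))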
Out[3]: 
[('a', 1.4, 1.1, 5.1),
 ('b', 2.6, 2.0, 6.2),
 ('a', 7.3, 6.4, 5.3),
 ('b', 4.2, 1.0, 9.9)]

Now sort the result:

In [4]: sorted(zip(labels, *data))
Out[4]: 
[('a', 1.4, 1.1, 5.1),
 ('a', 7.3, 6.4, 5.3),
 ('b', 2.6, 2.0, 6.2),
 ('b', 4.2, 1.0, 9.9)]

Then unzip it back:

In [6]: list(zip(*sorted(zip(labels, *data))))
Out[6]: 
[('a', 'a', 'b', 'b'),
 (1.4, 7.3, 2.6, 4.2),
 (1.1, 6.4, 2.0, 1.0),
 (5.1, 5.3, 6.2, 9.9)]

And finally, drop the label row and get the result with an ugly one-liner:

In [7]: [list(x) for x in list(zip(*sorted(zip(labels, *data))))[1:]]
Out[7]: [[1.4, 7.3, 2.6, 4.2], [1.1, 6.4, 2.0, 1.0], [5.1, 5.3, 6.2, 9.9]]

You can split the one-liner into separate steps if you wish to make the code more readable.
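
As a rough sketch of what that split might look like, assuming Python 3 (the starred assignment does not exist in Python 2) and using intermediate names such as columns, sorted_columns and sorted_data that are made up for illustration:

```python
data = [
    [1.4, 2.6, 7.3, 4.2],
    [1.1, 2.0, 6.4, 1.0],
    [5.1, 6.2, 5.3, 9.9],
]
labels = ['a', 'b', 'a', 'b']

# Pair each label with its column of floats: ('a', 1.4, 1.1, 5.1), ...
columns = zip(labels, *data)

# Sort the columns; tuples compare on the label first.
sorted_columns = sorted(columns)

# Transpose back: the first tuple holds the sorted labels,
# the remaining tuples are the reordered data rows.
sorted_labels, *sorted_rows = zip(*sorted_columns)

sorted_labels = list(sorted_labels)
sorted_data = [list(row) for row in sorted_rows]
```

This keeps the sorted labels around as well, which the one-liner throws away.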


It is possible to sort arrays based on the sort order of another, but I would suggest a dictionary instead as it'll probably be easier to work with. Something like:

data_by_class = {label:[] for label in set(class_labels)}
for row in data:
    # Walk each row in step with the labels instead of indexing by position
    for label, value in zip(class_labels, row):
        data_by_class[label].append(value)

which would result in

{ 'a':[1.4, 7.3, 1.1, 6.4, 5.1, 5.3], 'b': [2.6, 4.2, 2.0, 1.0, 6.2, 9.9] }
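
To tie this back to the question, here is a minimal sketch of how the dictionary could be used to visit the classes alphabetically or to pull out the floats for a single class. It swaps in collections.defaultdict for the dict comprehension, which is just one possible variant, and the names labels_in_order and a_values are made up for illustration:

```python
from collections import defaultdict

data = [
    [1.4, 2.6, 7.3, 4.2],
    [1.1, 2.0, 6.4, 1.0],
    [5.1, 6.2, 5.3, 9.9],
]
class_labels = ['a', 'b', 'a', 'b']

# defaultdict creates the empty list the first time a label is seen
data_by_class = defaultdict(list)
for row in data:
    for label, value in zip(class_labels, row):
        data_by_class[label].append(value)

# Iterating over sorted(...) visits the labels alphabetically
labels_in_order = sorted(data_by_class)

# Floats belonging to a single class, in the order they were encountered
a_values = data_by_class['a']
```

Note that this flattens the rows together; it no longer records which row a value came from.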


You probably want to organise your data differently, although it's a little unclear exactly what you're looking for.

If we assume that you would still like row-based data, where each row consists of (possibly multiple) observations of different classes, then you could reorganise your data into a list of dictionaries:

import itertools
row_dicts = [{k: [x[1] for x in v] 
              for k, v in itertools.groupby(
                  sorted(zip(class_labels, row)), key=lambda x: x[0])}
             for row in data]

Now your data appears as:

>>> row_dicts
[{'a': [1.4, 7.3], 'b': [2.6, 4.2]},
 {'a': [1.1, 6.4], 'b': [1.0, 2.0]},
 {'a': [5.1, 5.3], 'b': [6.2, 9.9]}]

And you can look up, e.g., all observations with label 'a' from row 1:

>>> row_dicts[1]["a"]
[1.1, 6.4]
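
One thing to be aware of: sorted(zip(class_labels, row)) compares whole tuples, so the values within each class come out sorted as well (row 1 gives 'b': [1.0, 2.0] although the original order was 2.0, 1.0). If the original order matters, a possible variant is to sort on the label only, relying on Python's sort being stable. This is a sketch under that assumption, not part of the approach above:

```python
import itertools
from operator import itemgetter

data = [
    [1.4, 2.6, 7.3, 4.2],
    [1.1, 2.0, 6.4, 1.0],
    [5.1, 6.2, 5.3, 9.9],
]
class_labels = ['a', 'b', 'a', 'b']

by_label = itemgetter(0)  # compare pairs on the label only

# sorted() is stable, so values sharing a label keep their original order
row_dicts = [
    {label: [value for _, value in group]
     for label, group in itertools.groupby(
         sorted(zip(class_labels, row), key=by_label), key=by_label)}
    for row in data
]
```

With this version, row_dicts[1]["b"] keeps 2.0 ahead of 1.0.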
