Given a list of dictionaries as follows:

dict_data = [
    {'name': 'r1', 'interval': [1800.0, 1900.0], 'bool_condition': [True, False]},
    {'name': 'r2', 'interval': [1600.0, 1500.0], 'bool_condition': [False]},
    {'name': 'r3', 'interval': [1400.0, 1600.0], 'bool_condition': [True]}
]

I would like to create a record array from this data, but when I try the following I get a ValueError:

import numpy as np
dt = np.dtype([
    ('name', np.str_, 50),
    ('interval', np.float64, (2,)),
    ('bool_condition', np.bool)
])
values = [tuple(val.values()) for val in dict_data]
arr = np.rec.array(values, dtype=dt)

Error: ValueError: cannot set an array element with a sequence

How can I define a correct dtype and then create the record array from the list of dictionaries?

2 Answers

It's very convenient to do that with pandas:

 In [247]: pd.DataFrame(dict_data)[['name','interval','bool_condition']].to_records(index=False)

Out[247]: 
rec.array([('r1', [1800.0, 1900.0], [True, False]),
 ('r2', [1600.0, 1500.0], [False]), ('r3', [1400.0, 1600.0], [True])], 
          dtype=[('name', 'O'), ('interval', 'O'), ('bool_condition', 'O')])

Indexing with ['name','interval','bool_condition'] ensures the order of the fields.

4 Comments

But this has thrown out all the dtypes!
Yes, but since list and str are not aligned data, there are no performance issues here, so the dtype is IMHO not very important.
I think the fact that interval is aligned is sufficient reason.
Is there a numpy implementation of this without using Pandas?

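One problem is that iterating over a dictionary does not necessarily preserve insertion order (the language only guarantees it from Python 3.7 on), so tuple(val.values()) can put the fields in the wrong order. You can see this with print(values[0]), which gives ([1800.0, 1900.0], [True, False], 'r1') when I run your code.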

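Instead, pick the fields out explicitly so the tuples match the dtype order:

import numpy as np

dt = np.dtype([
    ('name', np.str_, 50),
    ('interval', np.float64, (2,)),
    # np.bool_ is NumPy's boolean scalar type; plain np.bool is missing in some NumPy releases
    ('bool_condition', np.bool_)
])
# Build each tuple in the same order as the dtype fields
values = [
    (val['name'], val['interval'], val['bool_condition'])
    for val in dict_data
]
# With the sample data this still raises ValueError, because
# 'bool_condition' holds lists while the field is a scalar boolean
arr = np.rec.array(values, dtype=dt)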
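
The explicit ordering above can also be driven by a single tuple of field names, so the order is written down only once. This is a minimal sketch in plain Python; `FIELDS` and `record` are names chosen here for illustration, not part of the question.

```python
# Sample data from the question
dict_data = [
    {'name': 'r1', 'interval': [1800.0, 1900.0], 'bool_condition': [True, False]},
    {'name': 'r2', 'interval': [1600.0, 1500.0], 'bool_condition': [False]},
    {'name': 'r3', 'interval': [1400.0, 1600.0], 'bool_condition': [True]},
]

# Hypothetical helper constant: the field order we want in every tuple
FIELDS = ('name', 'interval', 'bool_condition')

# Look each key up by name, so the result does not depend on
# the iteration order of the dictionaries
values = [tuple(record[field] for field in FIELDS) for record in dict_data]
```

Because every key is looked up by name, a dictionary whose keys were inserted in a different order still yields a tuple in `FIELDS` order.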

Another issue is that bool_condition in your data is a list (of varying length), not a single boolean, so a scalar boolean field cannot hold it. You might want to change your dtype to:

dt = np.dtype([
    ('name', np.str_, 50),
    ('interval', np.float64, (2,)),
    ('bool_condition', list)
])
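
Putting both points together, a complete version might look like the sketch below. It assumes an object field is acceptable for `bool_condition` (spelled `np.object_` here instead of `list`) and reads the field order from `dt.names`; the variable name `rec` is chosen for illustration.

```python
import numpy as np

# Sample data from the question
dict_data = [
    {'name': 'r1', 'interval': [1800.0, 1900.0], 'bool_condition': [True, False]},
    {'name': 'r2', 'interval': [1600.0, 1500.0], 'bool_condition': [False]},
    {'name': 'r3', 'interval': [1400.0, 1600.0], 'bool_condition': [True]},
]

dt = np.dtype([
    ('name', np.str_, 50),            # fixed-width unicode string
    ('interval', np.float64, (2,)),   # two floats per record
    ('bool_condition', np.object_),   # arbitrary Python object (here: a list)
])

# Take the fields in dtype order instead of relying on dict ordering
values = [tuple(rec[name] for name in dt.names) for rec in dict_data]

arr = np.rec.array(values, dtype=dt)

# Fields are reachable by key or, on a record array, by attribute
print(arr['name'])
print(arr.interval)
print(arr.bool_condition)
```

With this dtype, name and interval stay typed; only bool_condition is stored as generic Python objects.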

1 Comment

Note that list is just translated into np.object_ here.
