
I have a Python dictionary (say D) in which every key maps to a predefined list. I want to create an array with two columns: the first column holds the keys of D, and the second holds the sum of the elements in the corresponding list. For example, if

D = {1: [5,55], 2: [25,512], 3: [2, 18]}

then the array I want to create is:

A = array( [[1,60], [2,537], [3, 20]] )

This is only a small example, but I would like to know the fastest implementation. Currently I am using the following method:

A_List = map(lambda x: [x, sum(D[x])], D.keys())

I realize that my method produces a list. I can convert it into an array in a separate step, but I don't know whether that is fast (I presume that arrays are faster than lists). I would really appreciate an answer that shows the fastest way to achieve this.
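For reference, here is a minimal sketch of doing the list-to-array conversion in a single step, assuming NumPy is installed and that the keys and sums are plain integers. The comprehension builds the (key, sum) rows and `np.array` converts them once. The conversion is one extra pass over the data, so it is worth timing on your own dataset rather than assuming it dominates.

```python
import numpy as np

D = {1: [5, 55], 2: [25, 512], 3: [2, 18]}

# Build (key, sum) rows with a comprehension, then hand them to NumPy once.
# Each tuple becomes one row, so the result has shape (len(D), 2).
A = np.array([(k, sum(v)) for k, v in D.items()])
print(A)
```

On Python 3.7+ dictionaries preserve insertion order, so the rows come out in the same order as the keys were inserted.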

    You can shave 5% or so off by replacing the lists with tuples: (x, sum(D[x])). Commented Mar 19, 2017 at 3:48

3 Answers


You can use a list comprehension to create the desired output:

>>> [(k, sum(v)) for k, v in D.items()]   # Py2 use D.iteritems()
[(1, 60), (2, 537), (3, 20)]

On my computer, this runs about 50% faster than the map(lambda x: ..., D.keys()) version.
Note: in Python 3, map returns a lazy iterator, so you need list(map(...)) to measure the real time it takes.
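A sketch of how the comparison could be timed fairly on Python 3, using the standard-library `timeit` module. The function names `with_map` and `with_comprehension` are hypothetical, and the repeat count is arbitrary. No timings are shown because they depend on the machine and on the size of D.

```python
import timeit

D = {1: [5, 55], 2: [25, 512], 3: [2, 18]}

def with_map(d):
    # list(...) forces the lazy Python 3 map object to do its work
    return list(map(lambda x: [x, sum(d[x])], d.keys()))

def with_comprehension(d):
    # tuples instead of lists, as suggested in the comment on the question
    return [(k, sum(v)) for k, v in d.items()]

# Sanity check: both produce the same (key, sum) pairs.
assert [tuple(p) for p in with_map(D)] == with_comprehension(D)

for fn in (with_map, with_comprehension):
    seconds = timeit.timeit(lambda: fn(D), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

For a realistic comparison, replace D with a dictionary the size of your real data.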


1 Comment

There's a definite improvement: I see about a 40% reduction in run time, which is significant when I run the code on big datasets. Thanks! =)

You can also try this:

a = []
for i in D.keys():
    # append one [key, sum] row per dictionary entry
    a.append([i, sum(D[i])])


I hope this helps:

  1. Build a list of the keys of D:

    first_column = list(D.keys())
    
  2. Build a list with the sum of the values for each key:

    second_column = [sum(D[key]) for key in D.keys()]
    
  3. Pair the two lists into (key, sum) rows:

    your_array = list(zip(first_column,second_column))
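If a NumPy array is wanted, here is a minimal sketch (assuming NumPy is installed) showing that the zip-based list from step 3 already converts to one row per key, i.e. a two-column array. `np.column_stack` is an alternative that treats each list as a column directly. It is worth checking `your_array.shape` on your own data.

```python
import numpy as np

D = {1: [5, 55], 2: [25, 512], 3: [2, 18]}

first_column = list(D.keys())
second_column = [sum(D[key]) for key in D.keys()]

# zip pairs each key with its sum, so each tuple becomes one row.
your_array = np.array(list(zip(first_column, second_column)))
print(your_array.shape)  # (3, 2)

# Alternative: stack the two 1-D lists side by side as columns.
stacked = np.column_stack((first_column, second_column))
```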
    

4 Comments

From the way you have defined it, your_array turns out to be a list. I hope I am not missing anything here.
Yes, that's because the values of D are plain lists rather than (say) NumPy arrays. You can convert with numpy.array(your_array) if needed.
I just noticed that np.array(your_array) gives two rows. It would be best to have it in two-column form, but (np.array(your_array)).reshape(len(first_column), 2) messes up the format. What I would like is A = array([[1, 60], [2, 537], [3, 20]]).
Hmm, I can use np.swapaxes(np.array(your_array), 0, 1) to get around this. =)
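A small sketch of why reshape scrambles the pairs while swapaxes works, assuming the two-row array came from stacking the key list and the sum list as rows (the values are taken from the question's example). reshape keeps the flat row-major order and only regroups it, whereas swapping the axes of a 2-D array is a transpose.

```python
import numpy as np

first_column = [1, 2, 3]
second_column = [60, 537, 20]

# Stacking the two lists as rows gives shape (2, n): keys on top, sums below.
rows = np.array([first_column, second_column])

# reshape reads the flat order 1, 2, 3, 60, 537, 20 and regroups it,
# so keys end up paired with other keys instead of with their sums.
scrambled = rows.reshape(len(first_column), 2)

# Swapping axes 0 and 1 (equivalently rows.T) turns rows into columns.
columns = np.swapaxes(rows, 0, 1)
print(columns)
```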
