
I am trying to get flat lists out of a nested list.

list_of_data = [{'id':99,
                 'rocketship':{'price':[10, 10, 10, 10, 10], 
                               'ytd':[1, 1, 1.05, 1.1, 1.18]}},
                {'id':898,
                 'rocketship':{'price':[10, 10, 10, 10, 10], 
                               'ytd':[1, 1, 1.05, 1.1, 1.18]}},
                {'id':903,
                 'rocketship':{'price':[20, 20, 20, 10, 10], 
                               'ytd':[1, 1, 1.05, 1.1, 1.18]}},
                {'id':999,
                 'rocketship':{'price':[20, 20, 20, 10, 10], 
                               'ytd':[1, 3, 4.05, 1.1, 1.18]}},
                ]

price, ytd = map(list, zip(*((list_of_data[i]['rocketship']['price'], list_of_data[i]['rocketship']['ytd']) for i in range(0, len(list_of_data)))))

My expected output is:

price = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]

ytd = [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]

But I am getting this instead:
price
Out[19]: 
[[10, 10, 10, 10, 10],
 [10, 10, 10, 10, 10],
 [20, 20, 20, 10, 10],
 [20, 20, 20, 10, 10]]
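The attempt returns nested lists because `zip(*...)` only transposes the four `(price, ytd)` pairs: its first tuple holds the four `price` sublists and its second the four `ytd` sublists, and `map(list, ...)` converts each tuple to a list without flattening it. A minimal sketch of the missing step, reusing the same data, flattens each group with `itertools.chain.from_iterable` (the names `price_groups` and `ytd_groups` are only illustrative):

```python
import itertools

list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]

# zip(*pairs) transposes the (price, ytd) pairs: one tuple of price sublists
# and one tuple of ytd sublists. Nothing is flattened at this point.
price_groups, ytd_groups = zip(*((d['rocketship']['price'], d['rocketship']['ytd'])
                                 for d in list_of_data))
print(len(price_groups))  # 4 sublists, not 20 numbers

# Chaining the sublists end to end gives the expected flat lists.
price = list(itertools.chain.from_iterable(price_groups))
ytd = list(itertools.chain.from_iterable(ytd_groups))
print(price)
print(ytd)
```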

  • Welcome to Stack Overflow. Sorry, I can't see which is the actual output, and which is the expected output. Please edit to make sure it is clear. Commented Oct 12, 2022 at 4:25

6 Answers


Try this:

Update:

Thanks to @shawncaza for the performance test. Timings for 100,000 loops:

shawncaza's answer: 0.10945558547973633 seconds

My answer with the get method: 0.1443953514099121 seconds

My answer with the square-bracket method: 0.10936307907104492 seconds

list_of_data = [{'id': 99,
             'rocketship': {'price': [10, 10, 10, 10, 10],
                            'ytd': [1, 1, 1.05, 1.1, 1.18]}},
            {'id': 898,
             'rocketship': {'price': [10, 10, 10, 10, 10],
                            'ytd': [1, 1, 1.05, 1.1, 1.18]}},
            {'id': 903,
             'rocketship': {'price': [20, 20, 20, 10, 10],
                            'ytd': [1, 1, 1.05, 1.1, 1.18]}},
            {'id': 999,
             'rocketship': {'price': [20, 20, 20, 10, 10],
                            'ytd': [1, 3, 4.05, 1.1, 1.18]}},
            ]
price = []
ytd = []
for i in list_of_data:
    price.extend(i['rocketship']['price'])
    ytd.extend(i['rocketship']['ytd'])
print(price)
print(ytd)

Output:

[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]
[1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]
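The update compares a `.get()` version that isn't shown. A minimal sketch of both variants, assuming the `.get()` version simply replaced each `[...]` lookup with `.get(...)` (a reconstruction, not the original code), lets you time them yourself with `timeit`. The bracket form avoids an attribute lookup and method call per access; the behavioural difference is that `.get()` returns `None` for a missing key while `[]` raises `KeyError`.

```python
import timeit

list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]


def flatten_with_brackets(data):
    """Square-bracket lookups, as in the answer above."""
    price, ytd = [], []
    for i in data:
        price.extend(i['rocketship']['price'])
        ytd.extend(i['rocketship']['ytd'])
    return price, ytd


def flatten_with_get(data):
    """Hypothetical .get() variant: one method call per key lookup."""
    price, ytd = [], []
    for i in data:
        price.extend(i.get('rocketship').get('price'))
        ytd.extend(i.get('rocketship').get('ytd'))
    return price, ytd


# Both give identical results when every key is present.
assert flatten_with_brackets(list_of_data) == flatten_with_get(list_of_data)

# Relative timings depend on your machine and Python version.
for fn in (flatten_with_brackets, flatten_with_get):
    seconds = timeit.timeit(lambda: fn(list_of_data), number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s")
```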

7 Comments

@Anjat This method is quite fast. List comprehension is not a panacea. On my machine, with list_of_data * 10000, this code took only 0.02 seconds. Ankit's method took 54 seconds and Khanh's method 51 seconds. Mine (which I have removed) took 0.04 seconds, which is quite fast but a little slower than this. This is also very readable.
@j1-lee Thanks for the performance test. Is list comprehension always faster than for loops?
List comprehension puts everything in memory. I think that's why Khanh used a generator expression. I'm surprised there wasn't much difference between Khanh's and Ankit's methods. I wonder where the slowdown is in Khanh's code.
@Ramesh The .get() calls are slowing you down. If you change them to price.extend(i['rocketship']['price']), I think it's twice as fast.
@shawncaza Thanks! I never knew that the square-bracket method is faster than the get method.

Using list comprehension:

price, ytd = ([i for item in list_of_data for i in item["rocketship"]["price"]],
              [i for item in list_of_data for i in item["rocketship"]["ytd"]])

Output

price: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10] 

ytd: [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]

2 Comments

This might be the best answer: possibly the fastest, and the easiest to read. You could even put the two list assignments on separate lines to improve readability without any performance loss.
I just thought of keeping it the same way as Anjat wanted. Otherwise, I agree with you.
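Following the suggestion in the comments, here is the same approach with each assignment on its own line. The clause order in a nested comprehension matches the order of the equivalent nested `for` loops (outer loop first), which this sketch checks explicitly; `price_loop` is only an illustrative name.

```python
list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]

# One assignment per line: no tuple packing and unpacking needed.
price = [i for item in list_of_data for i in item["rocketship"]["price"]]
ytd = [i for item in list_of_data for i in item["rocketship"]["ytd"]]

# The price comprehension reads like these nested loops, outer loop first.
price_loop = []
for item in list_of_data:
    for i in item["rocketship"]["price"]:
        price_loop.append(i)

assert price == price_loop
print(price)
print(ytd)
```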

I traded a bit of readability for performance here:

import functools

tuples = ((item['rocketship']['price'], item['rocketship']['ytd']) for item in list_of_data)
price, ytd = functools.reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), tuples, ([], []))

I tried to keep things in a single loop and use a generator to optimize memory use. But if the data is big, the resulting price and ytd lists will be big too; hopefully you have thought about that already.

Update:

Thanks to @j1-lee's performance test, I rewrote the code as follows:

import functools


def extend_list(a, b):
    a.extend(b)
    return a


tuples = ((item['rocketship']['price'], item['rocketship']['ytd'])
          for item in list_of_data)
price, ytd = map(
    list,
    functools.reduce(
        lambda a, b: (extend_list(a[0], b[0]), extend_list(a[1], b[1])),
        tuples,
        ([], [])
    )
)

This reduced the execution time from 45.556 s to 0.096 s. My best guess is that the + operator creates a new list from the two old lists, which requires copying both of them into the new one, so it goes like this:

list(4) + list(4) = list(8)  # 8 copies
list(8) + list(4) = list(12)  # 12 copies
list(12) + list(4) = list(16)  # 16 copies
...

Using .extend() only needs to copy the additional list into the existing one, so it should be faster:

list(4).extend(list(4)) = list(8)  # 4 copies
list(8).extend(list(4)) = list(12)  # 4 copies
list(12).extend(list(4)) = list(16)  # 4 copies
...

It would be good if someone could point to specific documentation or other information on this, though.
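The copying argument matches how Python lists behave: `a + b` always builds a new list containing copies of both operands, while `a.extend(b)` and `a += b` (which for lists calls `list.__iadd__`) append to `a` in place. A small sketch that checks object identity makes this visible. It also shows that `operator.iadd` can stand in for the `extend_list` helper inside `reduce`; that substitution is my own, not part of the answer, and the two-pair dataset is just a toy example.

```python
import functools
import operator

a = [1, 2]
b = [3, 4]

c = a + b               # new list object; items of a and b are copied into it
print(c is a)           # False

before = id(a)
a.extend(b)             # a grows in place; only b's items are copied
print(id(a) == before)  # True

a += b                  # list.__iadd__ also extends in place
print(id(a) == before)  # True

# operator.iadd(x, y) performs x += y and returns x, so it can replace
# the extend_list helper in the reduce-based answer.
pairs = [([10, 10], [1, 1.05]), ([20, 20], [1.1, 1.18])]
price, ytd = functools.reduce(
    lambda acc, p: (operator.iadd(acc[0], p[0]), operator.iadd(acc[1], p[1])),
    pairs,
    ([], []),
)
print(price, ytd)       # [10, 10, 20, 20] [1, 1.05, 1.1, 1.18]
```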

3 Comments

The lambda seems to be what takes the most time with your new answer.
I tried the craziest optimizations I could think of but nothing beat the speed (and simplicity) of @ramesh's answer so hats off to him 😀
Take a look at Abdo's answer. I think it might be the winner. I added my speed comparisons to my answer.

Perform a list comprehension and flatten your result.

ytd = sum([d['rocketship']['ytd'] for d in list_of_data], [])
price = sum([d['rocketship']['price'] for d in list_of_data], [])

2 Comments

This looks good. Since OP is looking for performance when working with large lists, I'm wondering if there is any benefit to making it a generator expression rather than list comprehension?
A generator could look like this: price_generator = (d['rocketship']['price'] for d in list_of_data). Then flatten it into a list with: price = list(itertools.chain.from_iterable(price_generator)). But ultimately we're going through a large list twice, which makes Khanh Luong's answer attractive.
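`sum()` accepts a generator expression directly, so the inner list isn't strictly needed. However, with a list as the start value, `sum()` concatenates with `+` at every step, which is the same repeated copying discussed in the `reduce` answer; the Python documentation for `sum()` recommends `itertools.chain()` for concatenating a series of iterables. A sketch of both, assuming the question's `list_of_data`:

```python
import itertools

list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]

# Generator expression instead of a list comprehension: no intermediate list
# of sublists, but sum() still builds a new, longer list on every addition.
ytd_sum = sum((d['rocketship']['ytd'] for d in list_of_data), [])

# chain.from_iterable walks the sublists lazily; list() copies each item once.
ytd_chain = list(itertools.chain.from_iterable(d['rocketship']['ytd'] for d in list_of_data))

assert ytd_sum == ytd_chain
print(ytd_chain)
```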

Instead of passing the list function to your map, you could pass itertools.chain.from_iterable to merge all the individual lists, then call list() afterwards to turn each resulting iterator into a list:

import itertools
price_gen, ytd_gen = map(itertools.chain.from_iterable, zip(*((i['rocketship']['price'], i['rocketship']['ytd']) for i in list_of_data)))

price = list(price_gen)
ytd = list(ytd_gen)

However, creating separate generators for each dataset actually seems to be much faster: about 7x faster in my test.

import itertools
price_gen = itertools.chain.from_iterable(d['rocketship']['price'] for d in list_of_data)
ytd_gen = itertools.chain.from_iterable(d['rocketship']['ytd'] for d in list_of_data)

price = list(price_gen)
ytd = list(ytd_gen)

Maybe it's the zip that slows things down?

cProfile comparison of the different solutions presented in this post, using the small original dataset and looping each task 99,999 times:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    99999    0.132    0.000    1.344    0.000 (opt_khanh)
    99999    0.469    0.000    0.714    0.000 (opt_shawn)
    99999    0.142    0.000    0.535    0.000 (opt_Jaeyoon)
    99999    0.267    0.000    0.413    0.000 (opt_ramesh)
    99999    0.076    0.000    0.399    0.000 (opt_abdo)
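The functions behind the table aren't shown in the post. As a rough sketch for running a similar comparison with `timeit` (all function names below are hypothetical, and each body is my reconstruction of one approach from this thread, not the original benchmark code), the script first checks that every approach returns the same result and then times each one on the small dataset. Absolute numbers will vary with machine and Python version; `flatten_sum` should fall behind as the data grows, because it re-copies the growing list at every step.

```python
import functools
import itertools
import timeit

list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]


def flatten_loop_extend(data):
    # Plain for loop with list.extend.
    price, ytd = [], []
    for d in data:
        price.extend(d['rocketship']['price'])
        ytd.extend(d['rocketship']['ytd'])
    return price, ytd


def flatten_comprehension(data):
    # One nested list comprehension per output list.
    price = [x for d in data for x in d['rocketship']['price']]
    ytd = [x for d in data for x in d['rocketship']['ytd']]
    return price, ytd


def flatten_chain(data):
    # One itertools.chain.from_iterable per output list.
    price = list(itertools.chain.from_iterable(d['rocketship']['price'] for d in data))
    ytd = list(itertools.chain.from_iterable(d['rocketship']['ytd'] for d in data))
    return price, ytd


def flatten_reduce_extend(data):
    # Single pass with functools.reduce, extending the accumulators in place.
    def step(acc, d):
        acc[0].extend(d['rocketship']['price'])
        acc[1].extend(d['rocketship']['ytd'])
        return acc
    price, ytd = functools.reduce(step, data, ([], []))
    return price, ytd


def flatten_sum(data):
    # sum() with a list start value: repeated +, quadratic in the number of sublists.
    price = sum((d['rocketship']['price'] for d in data), [])
    ytd = sum((d['rocketship']['ytd'] for d in data), [])
    return price, ytd


APPROACHES = [flatten_loop_extend, flatten_comprehension, flatten_chain,
              flatten_reduce_extend, flatten_sum]

if __name__ == "__main__":
    expected = flatten_loop_extend(list_of_data)
    for fn in APPROACHES:
        assert fn(list_of_data) == expected, fn.__name__
    for fn in APPROACHES:
        seconds = timeit.timeit(lambda: fn(list_of_data), number=10_000)
        print(f"{fn.__name__:24} {seconds:.3f} s")
```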


I tried using a double (nested) comprehension. I'm not sure it's a good idea, as it could hurt code readability.

price = [
    item
    for sublist in [rocket["rocketship"]["price"] for rocket in list_of_data]
    for item in sublist
]

ytd = [
    item
    for sublist in [rocket["rocketship"]["ytd"] for rocket in list_of_data]
    for item in sublist
]

print(price)
print(ytd)
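The inner `[... for rocket in list_of_data]` builds a temporary list of sublists that is iterated only once. If you want to keep the double-comprehension layout, a minimal variation is to make that inner part a generator expression, so the sublists are produced one at a time instead of being collected first:

```python
list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]

# Same double comprehension, but the inner part is a generator expression
# (parentheses instead of square brackets): no temporary list of sublists.
price = [
    item
    for sublist in (rocket["rocketship"]["price"] for rocket in list_of_data)
    for item in sublist
]

ytd = [
    item
    for sublist in (rocket["rocketship"]["ytd"] for rocket in list_of_data)
    for item in sublist
]

print(price)
print(ytd)
```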
