3

I have b buckets 0....b-1 and m apples 0....m-1. In the beginning all apples are placed in bucket 0.

Then running some analysis causes the apples to be moved among the buckets. I have already implemented this by having a 2D list (as buckets) in which apple ids are removed and appended whenever they need to be moved between buckets. This is, however, very inefficient for my analysis as these movements are in the order of millions or billions. So, I was wondering if there is any better solution out there to implement such a structure?

By the way, the title was chosen as this is very similar to the partitions of a set problem in which no member can be placed in more than 1 subset. Here is also an example with 4 apples and 3 buckets to make it more clear:

time 0:
a=[[0,1,2,3],[],[]]
time 1: (say apple 3 needs to be moved to bucket 2)
a=[[0,1,2],[],[3]]
0

2 Answers 2

6

The problem with removing an element from a list is that it takes O(n): it takes the order of the number of elements in the list to remove that item.

You better use sets or even better a bitarray that will work in O(1).

For example:

m = 50 #the number of apples
b = 10 #the number of buckets
fls = [False]*m
a = [bitarray(fls) for _ in range(b)]
a[0] = bitarray([True]*m) #add a filled bucket at index 0

def move_apple(apple_id,from_bucket,to_bucket):
    a[from_bucket][apple_id] = False
    a[to_bucket][apple_id] = True
Sign up to request clarification or add additional context in comments.

4 Comments

[False for _ in range(m)] is overkill. For immutable objects you can do [False]*m. Using bitarray is a very nice idea.
you're welcome (that was minor). I took the liberty to slightly edit your post BTW. Removed "(with sets)" and fixed "beter" to "better". Much beter like that :)
It would be cool if you showed some timing experiments comparing the performance to a boolean list.
@juanpa.arrivillaga: If I find some time this night, I will try to work on that...
3

Just use an array where for each apple you store the bucket number?

time 0:
a=[0,0,0,0]
time 1: (say apple 3 needs to be moved to bucket 2)
a=[0,0,0,2]

1 Comment

I need the reverse structure to avoid using index later in my script. Index() is also very heavy. O(n) I assume.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.