8

I want to create an object in python that is a collection of around 200,000,000 true/false values. So that I can most effectively change or recall any given true/false value, so that I can quickly determine if any given number, like 123,456,000 is true or false or change its value.

Is the best way to do this a list? or an array? or a class? or just a long int using bit operations? or something else?

I'm a bit noob so you may have to spell things out for me more than if I were asking the question in one of the other languages I know better. Please give me examples of how operating on this object would look.

Thanks

1
  • 3
    Are the true/false values dense or sparse? Are they evenly distributed, or are their likely to be denser and sparser ranges? The ideal choice of data structure differs depending on these. Of course you may not need "ideal". Commented Oct 16, 2009 at 19:47

6 Answers 6

13

You can try the bitarray module, or write a similar thing using an array of integers yourself.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks! I've never installed a Module before and am having a bit of trouble: superuser.com/questions/56316/…
4

"quickly determine if any given number, like 123,456,000 is" in the "true" set or "false" set.

This is what a set is for.

The "true" set is a set of all the numbers.

To make a number's boolean flag "true", add it to the true set.

To make a number's boolean flag "false", remove it from the true set.

Life will be much simpler.

5 Comments

+1: it is simple to use and might be sufficiently efficient for a sparse list
Let's say half of the values are true. Size of the int object is 12 bytes, that's 1.2GB just for storing the keys + additional memory for the actual hash table. Using a bit array, the memory usage will be 25MB. I think that's a significant difference.
@Lukáš Lalinský: Your analysis is good. However, I don't think it's relevant unless your processor has no memory available. On most modern processors, there's plenty of memory and the 25M vs. 1.2G doesn't really matter very much at all.
Well, I have trouble calling wasting 1GB of RAM the best way to do something. :) Sure, it's a simple way and it works well if you have much less true values than false values, but it's not the best way.
Since the RAM is not a consumable resource, I can't apply "waste" to it -- unless you're tearing it out of the processor and throwing it away. Memory is a resource, like time that is part of the trade-off equation. Memory -- in this case -- may be used so that less time is taken trying to manage individual bits in a bit array.
3

Have you considered using a lightweight database like SQLite?

1 Comment

+1. 200 million bits is about 24 megabytes of data -- while this could easily fit in memory on a modern machine, anytime you get to that size of structure in memory, you probably need to at least consider whether a database would be a better solution.
3

You might also like to try the bitstring module, which is pure Python. Internally it's all stored as a byte array and the bit masking and shifting is done for you:

from bitstring import BitArray
# Initialise with two hundred million zero bits
s = BitArray(200000000)
# Set a few bits to 1
s.set(1, [76, 33, 123456000])
# And test them
if s.all([33, 76, 123456000]):
    pass

The other posters are correct though that a simple set might be a better solution to your particular problem.

Comments

1

At first glance, the Python BitVector module sounds like it does exactly what you want. It's available at http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.5.1.html and since it is pure Python code, it will run on any platform with no compiling needed.

You mentioned that you need some speed in getting and setting any arbitrary true-false value. For that you need to use a Python array, rather than a list, and if you go to the above URL and browse the source code for BitVector you can see that it does indeed rely on Python arrays.

Ideally, you would encapsulate what you are doing in a class that subclasses from BitVector, i.e.

class TFValues(BitVector):
   pass

That way you can do things like add a list to contain associated info such as the name of a particular TF value.

Comments

0

If the bits which are set are mostly consecutive, there's also the option to store a list of ranges, e.g. the PyPI module https://pypi.org/project/range_set/ which is API compatible to Python's set class.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.