Manage big data structures in python

Question

I need for some reason a four dimensional matrix in my python program which has a dimension like 10000x20000x4000x10. As I have tried to implement it using normal arrays in python I have found that it is impossible to do because of my restricted available system resources. What is the best way to manage such big data structures? Is the only way using Database?

Edit: Because it depends on what is my goal I will describe shortly what I am doing. I am trying to expand the 1 dimensional knapsack problem to 4 dimensions. There are 2 matrices keep and solution which have to be accessed. As I try to address the resource allocation problem of routers with this these 2 matrices look like this:

keep = [[[[0 for x in xrange(CORE.cap + 1)*1000]for x in xrange(RAM.cap + 1)*1000]for x in xrange(NIC.cap + 1)*1000]for x in xrange(len(JOBS) + 1)]
solution = [[[[0 for x in xrange(CORE.cap + 1)*1000]for x in xrange(RAM.cap + 1)*1000]for x in xrange(NIC.cap + 1)*1000]for x in xrange(len(JOBS) + 1)]

There are a lot of 0 in these matrices and I have to access each row of the matrices each time.

Is it a sparse matrix (i.e. only a small fraction of the cells are populated)? If so, there's hope. Otherwise, it may be too big. — ugoren
– ugoren, Commented Aug 1, 2013 at 8:50
Why do you think you need a 4D array that's that big? What are you really trying to do? There are lots of lazy/windowed big data algos. — doctorlove
– doctorlove, Commented Aug 1, 2013 at 8:51
I am expanding the knapsack algorithm from one dimension to 4. This is a little bit complicated as I try to address the resource allocation problem of routers with this algorithm. For the dynamic programing approach of knapsack I need 2 Matrices keep and solution. Although there are a lot of zeros in these matrices. If you wish I can perform you the source code. — Hossein Tav
– Hossein Tav, Commented Aug 1, 2013 at 8:59
If you get your algorithm down you can scale up and use a amazon E2C on-demand instance — AlexLordThorsen
– AlexLordThorsen, Commented Aug 1, 2013 at 9:03
As this algorithm should be able to run on commodity hardware I would prefer to make it runnable on my computer. But thanks for your advice — Hossein Tav
– Hossein Tav, Commented Aug 1, 2013 at 9:24

Hans Then · Accepted Answer · 2013-08-01 09:05:38Z

1

For 1D knapsack, you only need to keep a the last two rows of the matrix in memory. The rest you can store to disk using a run-length encoding, since most of the rows will contain the same value as the previous row. Probably, for 4D knapsack you can do something similar (like only keep a plane?) in memory and store the rest to disk.

Alternatively, you can use a branch-and-bound algorithm for knapsack or use an approximation algorithm, in which you create smaller items and knapsacks.

answered Aug 1, 2013 at 9:05

Hans Then

11.4k3 gold badges36 silver badges52 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Hossein Tav Over a year ago

I think it could be the solution I am looking for. But I have to check the performance of this approach as constantly reading and writing to/from file could make the algorithm very slow. I will implement it and let you know how it goes.

Hossein Tav Over a year ago

Unfortunately this approach fails in the first step as I have tried to build the initial files and fill them with 0 it takes a very long time and it has to be repeated each time I run the algorithm.

Hans Then Over a year ago

Probably you don't need to do that. You use DP, so you can just calculate the values per column. Whenever you reach a point you no longer need earlier values, you store only that column.

Hossein Tav Over a year ago

But at the end I need the whole keep matrix to find out which objects should I pick

Hans Then Over a year ago

In normal knapsack, you can trace the end solution by going back from the last row to the previous row, one row at a time. You need at most two rows in memory at the same time. In 4D knapsack, you can probably do something similar.

|

Collectives™ on Stack Overflow

Manage big data structures in python

1 Answer 1

7 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

7 Comments

Your Answer

Sign up or log in

Post as a guest

Related