2

I need for some reason a four dimensional matrix in my python program which has a dimension like 10000x20000x4000x10. As I have tried to implement it using normal arrays in python I have found that it is impossible to do because of my restricted available system resources. What is the best way to manage such big data structures? Is the only way using Database?

Edit: Because it depends on what is my goal I will describe shortly what I am doing. I am trying to expand the 1 dimensional knapsack problem to 4 dimensions. There are 2 matrices keep and solution which have to be accessed. As I try to address the resource allocation problem of routers with this these 2 matrices look like this:

keep = [[[[0 for x in xrange(CORE.cap + 1)*1000]for x in xrange(RAM.cap + 1)*1000]for x in xrange(NIC.cap + 1)*1000]for x in xrange(len(JOBS) + 1)]
solution = [[[[0 for x in xrange(CORE.cap + 1)*1000]for x in xrange(RAM.cap + 1)*1000]for x in xrange(NIC.cap + 1)*1000]for x in xrange(len(JOBS) + 1)]

There are a lot of 0 in these matrices and I have to access each row of the matrices each time.

5
  • Is it a sparse matrix (i.e. only a small fraction of the cells are populated)? If so, there's hope. Otherwise, it may be too big. Commented Aug 1, 2013 at 8:50
  • Why do you think you need a 4D array that's that big? What are you really trying to do? There are lots of lazy/windowed big data algos. Commented Aug 1, 2013 at 8:51
  • I am expanding the knapsack algorithm from one dimension to 4. This is a little bit complicated as I try to address the resource allocation problem of routers with this algorithm. For the dynamic programing approach of knapsack I need 2 Matrices keep and solution. Although there are a lot of zeros in these matrices. If you wish I can perform you the source code. Commented Aug 1, 2013 at 8:59
  • If you get your algorithm down you can scale up and use a amazon E2C on-demand instance Commented Aug 1, 2013 at 9:03
  • As this algorithm should be able to run on commodity hardware I would prefer to make it runnable on my computer. But thanks for your advice Commented Aug 1, 2013 at 9:24

1 Answer 1

1

For 1D knapsack, you only need to keep a the last two rows of the matrix in memory. The rest you can store to disk using a run-length encoding, since most of the rows will contain the same value as the previous row. Probably, for 4D knapsack you can do something similar (like only keep a plane?) in memory and store the rest to disk.

Alternatively, you can use a branch-and-bound algorithm for knapsack or use an approximation algorithm, in which you create smaller items and knapsacks.

Sign up to request clarification or add additional context in comments.

7 Comments

I think it could be the solution I am looking for. But I have to check the performance of this approach as constantly reading and writing to/from file could make the algorithm very slow. I will implement it and let you know how it goes.
Unfortunately this approach fails in the first step as I have tried to build the initial files and fill them with 0 it takes a very long time and it has to be repeated each time I run the algorithm.
Probably you don't need to do that. You use DP, so you can just calculate the values per column. Whenever you reach a point you no longer need earlier values, you store only that column.
But at the end I need the whole keep matrix to find out which objects should I pick
In normal knapsack, you can trace the end solution by going back from the last row to the previous row, one row at a time. You need at most two rows in memory at the same time. In 4D knapsack, you can probably do something similar.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.