
Sorry, another newbie query :| This builds upon the suggestion given here: optimizing

I need to be able to build a dictionary incrementally, i.e. one key:value pair at a time, inside a for loop. To be specific, the dictionary would look something like this (N keys, with each value being a list of lists; each smaller inner list has 3 elements):

dic_score = {key1: [[,,], [,,], [,,], ..., [,,]], key2: [[,,], [,,], [,,], ..., [,,]], ..., keyN: [[,,], [,,], [,,], ..., [,,]]}

This dict is generated by the following nested for loop:

for Gnodes in G.nodes():          # Gnodes iterates over 10000 values
    Gvalue = someoperation(Gnodes)
    for Hnodes in H.nodes():      # Hnodes iterates over 10000 values
        Hvalue = someoperation(Hnodes)
        score = some_operation_on(Gvalue, Hvalue)   # placeholder for the real scoring function
        dic_score.setdefault(Gnodes, []).append([Hnodes, score, -1])

I then need to sort these lists, but the answer to that was given here: optimizing (using a generator expression in place of the inner loop is an option).
[Note that the dict would contain 10000 keys, with each key associated with 10000 of the smaller lists.]

Since the loop counts are large, the dictionary generated is huge and I am running out of memory.

How can I write each key:value pair (a list of lists) to a file as soon as it is generated, so that I don't need to hold the entire dictionary in memory? I then want to be able to read the dictionary back in the same format, i.e. something like dic_score_after_reading[key] returns the list of lists I am looking for.
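
To make that concrete, this is the kind of per-key write/read I imagine (just a sketch of the idea, not tested; the file name is made up and someoperation/some_operation_on are the placeholders from the loop above), e.g. with the standard-library shelve module:

    import shelve

    # a dict-like object whose entries live on disk; shelve keys must be strings
    dic_score = shelve.open('scores.db')

    for Gnodes in G.nodes():
        Gvalue = someoperation(Gnodes)
        rows = []                         # the list of lists for this one key
        for Hnodes in H.nodes():
            Hvalue = someoperation(Hnodes)
            rows.append([Hnodes, some_operation_on(Gvalue, Hvalue), -1])
        dic_score[str(Gnodes)] = rows     # written out; only one key's data in memory
    dic_score.close()

    # later: read back one key at a time
    dic_score_after_reading = shelve.open('scores.db')
    lists_for_key = dic_score_after_reading[str(some_Gnode)]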

I am hoping that writing and reading per key:value pair would considerably ease the memory requirements. Is there a better data structure for this? Should I consider a database, perhaps something like Buzhug, which would give me the flexibility to access and iterate over the lists associated with each key?

I am currently using cPickle to dump the entire dictionary and then read it back via load(), but cPickle crashes when dumping that much data in one go.
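
For reference, what I do at the moment is roughly this (a sketch; the file name is just an example):

    import cPickle

    # dump the whole 10000 x 10000 structure in one go -- this is what crashes
    with open('dic_score.pkl', 'wb') as f:
        cPickle.dump(dic_score, f, cPickle.HIGHEST_PROTOCOL)

    # and later
    with open('dic_score.pkl', 'rb') as f:
        dic_score_after_reading = cPickle.load(f)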

Apologies, but I am unaware of the best practices for this type of thing. Thanks!

1 Answer


You could look into using the ZODB in combination with the included BTrees implementation.

What that gives you is a mapping-like structure that writes individual entries separately to the object store. You'd need to use savepoints or plain transactions to flush data out to storage, but you can handle huge amounts of data this way.
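
A minimal sketch of what I mean (the file name, the commit frequency, and the placeholder functions from your question are all just examples):

    from ZODB.FileStorage import FileStorage
    from ZODB.DB import DB
    from BTrees.OOBTree import OOBTree
    import transaction

    storage = FileStorage('scores.fs')
    db = DB(storage)
    connection = db.open()
    root = connection.root()

    root['scores'] = OOBTree()            # mapping-like; entries are stored individually
    transaction.commit()

    scores = root['scores']
    for i, Gnodes in enumerate(G.nodes()):
        Gvalue = someoperation(Gnodes)
        rows = []
        for Hnodes in H.nodes():
            Hvalue = someoperation(Hnodes)
            rows.append([Hnodes, some_operation_on(Gvalue, Hvalue), -1])
        scores[Gnodes] = rows             # one complete entry per key
        if i % 100 == 99:
            transaction.savepoint(True)   # flush pending changes out of memory
    transaction.commit()

    connection.close()
    db.close()

When you reopen the database later, root['scores'][key] gives you back the list of lists for that key.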


Comments

Thanks for the quick response! Let me dig into this, as I am not aware of how these work.
@Martijn: I tried to implement the ZODB in combination with a BTree as you suggested, but I am running into a small issue. When I open the database for reading, I find that some of the records are missing. Specifically, as mentioned above, I store my dict in the form of a BTree and attach it to the root. The BTree contains, say, 20 keys, with each key corresponding to a list of lists (as written in the original question). However, some list elements go missing when I open the database again after having closed it once. I am using transaction.commit() every time I append to a key.
In the inner loop, after every iteration, I do btree.setdefault(key, []).append([ref_node, score, -1]); transaction.commit(), so that the list associated with each key builds up incrementally. I see the full list when I am done, but when I reopen the database I find missing elements in the lists of lists [refer to the original post]. Any ideas/suggestions as to what might be going wrong? Thanks in advance!
Ah, you use lists; these are mutable, and the ZODB doesn't notice the changes and won't commit them. See ZODB not able to commit, zodb.org/documentation/guide/…, zodb.org/documentation/articles/ZODB1.html#mutable-attributes, and zodb.org/documentation/guide/modules.html#PersistentList.
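
i.e. roughly, using the same names as in your comment (a sketch of the idea from those links):

    from persistent.list import PersistentList

    # a PersistentList registers its mutations with the transaction,
    # so the appends actually get written when you commit
    btree.setdefault(key, PersistentList()).append([ref_node, score, -1])
    transaction.commit()
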
Thanks for the response! I introduced root[0]._p_changed = 1 before transaction.commit(), which worked when I had a small number of keys (20-30). As I moved to more keys (>100), I am observing the same issue again: in some of the keys, some list elements are missing. Any other suggestions as to what's going on here?
