I have nested function calls that also use multiprocessing. izip, repeat, or something else seems to be making copies of objects rather than passing them by reference; some packing and unpacking also happens along the way.

Here is the structure in order of call:

from itertools import izip, repeat
from multiprocessing import Pool

def main():
    print 'Rel_list id main: %s' % str(id(rel_list))
    par_objective(folder.num_proc, batch, r, folder.d, vocab_len,
                  rel_list, lambdas)

def par_objective(num_proc, data, params, d, len_voc, rel_list, lambdas):
    pool = Pool(processes=num_proc)

    # non-data params
    oparams = [params, d, len_voc, rel_list]

    print 'Rel_list id paro: %s' % str(id(rel_list))
    # split_data (the per-worker chunks of data) is built elsewhere; not shown here
    result = pool.map(objective_and_grad, izip(repeat(oparams), split_data))


def objective_and_grad(par_data):
    # unpack the shared parameters and this worker's chunk of data
    (params, d, len_voc, rel_list), data = par_data

    print 'Rel_list id obag: %s' % str(id(rel_list))

Output:

ID IN MAIN
Rel_list id main: 140694049352088
ID IN PAR_OBJECTIVE
Rel_list id paro: 140694049352088
IDs IN OBJECTIVE_AND_GRAD (24 Processes):
Rel_list id obag: 140694005483424
Rel_list id obag: 140694005481840
Rel_list id obag: 140694311306232
Rel_list id obag: 140694048889168
Rel_list id obag: 140694057601144
Rel_list id obag: 140694054472232
Rel_list id obag: 140694273611104
Rel_list id obag: 140693878744632
Rel_list id obag: 140693897912976
Rel_list id obag: 140693753182328
Rel_list id obag: 140694282174976
Rel_list id obag: 140693900442800
Rel_list id obag: 140694271314328
Rel_list id obag: 140694276073736
Rel_list id obag: 140694020435696
Rel_list id obag: 140693901952208
Rel_list id obag: 140694694615376
Rel_list id obag: 140694271773512
Rel_list id obag: 140693899163264
Rel_list id obag: 140694047135792
Rel_list id obag: 140694276808432
Rel_list id obag: 140694019346088
Rel_list id obag: 140693897455016
Rel_list id obag: 140694067166024
Rel_list id obag: 140694278467024
Rel_list id obag: 140694010924280
Rel_list id obag: 140694026060576

BACK TO MAIN, RINSE AND REPEAT
Rel_list id main: 140694049352088
Rel_list id paro: 140694049352088

As you can see, the id of the list is the same in main() and par_objective() but changes once the list is passed into the multiprocessing pool.

multiprocessing forks in a copy-on-write manner and the list is never changed, yet the ids change. Does this mean the memory is copied, or does only the id change?

Is there any way to check if memory is copied? If those are copies, why are they copied?

1 Answer

Your Python objects are being modified: every additional reference you create to an object changes the reference count stored inside that object. That write touches the memory holding the object, so the OS copies it for the child process.
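To make the reference-count point concrete, here is a minimal sketch in Python 3 syntax (the question's code is Python 2). The list contents and the placeholder values in par_data are made up for illustration; only the unpacking pattern comes from the question. It suggests that unpacking creates no Python-level copy, yet still changes the count stored in the object.

```python
import sys

# Hypothetical stand-in for the question's rel_list.
rel_list = ["rel_a", "rel_b", "rel_c"]

# Same shape as the par_data tuple in the question; the other values are placeholders.
par_data = ((None, 50, 1000, rel_list), None)

count_before = sys.getrefcount(rel_list)

# Unpacking as objective_and_grad does: this binds a new name to the same object.
(params, d, len_voc, unpacked), data = par_data

count_after = sys.getrefcount(rel_list)

print(unpacked is rel_list)        # the very same object, no copy made by Python
print(count_after > count_before)  # but the reference count inside the object grew
```

The count lives in the object's own header, so even read-only use of a list writes to the memory that holds it.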

Any Python object the subprocess needs to access must have a reference count that is independent of the main process. As such, Python multiprocessing won't ever simply use the same memory region; a copy will always be required.
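One concrete place where a copy is made, separate from what the OS does with memory pages: arguments handed to pool.map are pickled in the parent and unpickled in the worker, which builds new objects there. The sketch below, in Python 3 syntax, only imitates that round trip with the pickle module and does not start a pool. The list contents and the placeholder parameters are assumptions for illustration.

```python
import pickle

# Hypothetical stand-ins for the question's rel_list and oparams.
rel_list = ["rel_a", "rel_b", "rel_c"]
oparams = [None, 50, 1000, rel_list]  # placeholders for params, d, len_voc

# Roughly what happens to a task argument on its way to a worker:
# serialized to bytes in the parent, rebuilt from those bytes in the child.
received = pickle.loads(pickle.dumps(oparams))
received_rel_list = received[3]

print(received_rel_list == rel_list)  # True: same contents
print(received_rel_list is rel_list)  # False: a distinct object with its own id()
```

This would be consistent with the output in the question, where every call to objective_and_grad reports a different id for rel_list even though the list is never modified.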

4 Comments

So because of the new references, the objects are being copied? (params, d, len_voc, rel_list),data = par_data So if, instead of this, I only access the values by index, that should fix it?
Does it create a shallow or a deep copy?
@EliKorvigo: this is not a copy from Python, the OS copies the whole memory region.
@Kameegaming: I don't think you can avoid this; the subprocess needs to be able to refer to the objects separately from the main process. As such, a copy always has to be made.
