
I've recently finished writing a genetic algorithm. The algorithm creates a list of 50 test cases, runs each trial through a simulation, and stores the result. I use a number of custom classes, and the stored results contain lists of class instances.

When I run this simulation, besides taking forever, it pushes my memory usage up to 90+%. Something in my code must be hogging memory, but I'm not sure how to find it. Here is the main part of my code, with the loops inside loops inside loops.....

    # ----------------------------------------------------------------------------------------------------
    for i in range(1,numberOfRunsToSolve,1): # begin genetic algo loop
        temp = trial_cases
        for ii, stored_trial in enumerate(temp): # run through stored trial cases
            new_trials = []
            for jj in range(1,numberOfTrialsPerRound):
                tc = randomTrials_GenAlgo_ARRAY(stored_trial, True) # create new trial set, based on indexed stored results
                new_trials.append(tc)
            print(new_trials)

            for j, trial in enumerate(new_trials):
                x = OneGenerationSimulation(trial) #returns [ObjArray, ErrorArray]
                rev = revenueAndLoss(x[0])
                DEBUG_ARRAY.append(str(rev) + ' trial: ' + str(trial))
                results.append((rev, trial, x[0], x[1]))


        results.sort(reverse=True)  # sort to bring best revenue to top
        results = results[0:numberOfResultsToKeepPerRound-1] 

        trial_cases = []
        for i, r in enumerate(results):
            trial_cases.append(r[1])

        # end of the genetic algo loop
    # ----------------------------------------------------------------------------------------------------

Any suggestions on how to track memory usage in my script and hunt down the culprits? I'm pretty new to Python, so feel free to state the obvious.
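
For example, would something like this (using tracemalloc from the standard library) be a reasonable way to see which lines are allocating the most new memory between generations? I'm not sure I'm using it correctly:

    import tracemalloc

    tracemalloc.start()
    baseline = tracemalloc.take_snapshot()

    # ... run one pass of the genetic algo loop here ...

    snapshot = tracemalloc.take_snapshot()
    # top 10 source lines that allocated the most new memory since the baseline
    for stat in snapshot.compare_to(baseline, 'lineno')[:10]:
        print(stat)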

EDIT: The process above is essentially doing this:

1) Create 50 trial runs.
2) Run a simulation on each trial. This simulation creates hundreds of custom objects, and runs a script on them, returning the results.
3) With all the results that come back, retrieve the best 5 outcomes.
4) With those 5 outcomes, create new trial sets, and repeat the process over again.

I'm worried that the mass creation of object instances, which then gets filtered down to the best 5 results, isn't being cleaned up properly in memory, and that all those objects are hiding in the background....
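
Is something like this a sane way to check whether those objects are actually being freed between rounds? (MyCustomClass below is just a stand-in for one of my own classes; I may be misreading what the gc module reports.)

    import gc

    gc.collect()  # force a collection at the end of each generation
    # rough count of simulation objects that are still alive
    # (MyCustomClass is just a placeholder for one of the classes the simulation creates)
    still_alive = sum(1 for obj in gc.get_objects() if isinstance(obj, MyCustomClass))
    print(still_alive, 'instances still tracked by the garbage collector')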

Thanks - KC.

  • first place I'd look is recursion, but if I understand correctly, then you've already figured out that these loops are definitely the problem; is that right? Commented Feb 28, 2017 at 19:25
  • yeah - definitely. The custom object stuff may not be the most efficient either - but there isn't much code besides what you see above (I'm doing this in Jupyter fwiw) Commented Feb 28, 2017 at 19:28
  • One thing I'd try is to see if those giant arrays of arrays are causing problems by trying smaller numbers (for numberOfTrialsPerRound and numberOfRunsToSolve) and seeing if that makes a big impact. I didn't see anything obviously wasteful yet though... Commented Feb 28, 2017 at 19:29
  • Where does "revenue" get set? Did you mean "rev"? I'm not sure how many results you are getting versus how many you are keeping. However, instead of getting all the results, sorting them all, and then throwing some away, what about just keeping the ones you need? I.e. do an "insertion sort" type thing (rough sketch after these comments). Can you get rid of DEBUG_ARRAY and see if it helps? If memory is a concern, you could see if garbage collecting helps. Also, there's no reason to use enumerate in your final loop; just have trial_cases = [ r[1] for r in results] Commented Feb 28, 2017 at 19:34
  • sorry - you're right. Revenue = rev. I was trying to shorten my variables in the above. Commented Feb 28, 2017 at 19:35
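
Here is my rough reading of the "keep only the ones you need" suggestion from the comment above, using heapq from the standard library instead of appending everything to results and sorting. Just a sketch, not tested against my real objects:

    import heapq
    import itertools

    counter = itertools.count()  # tie-breaker so heapq never tries to compare trial objects
    best = []                    # min-heap holding at most numberOfResultsToKeepPerRound entries

    for j, trial in enumerate(new_trials):
        x = OneGenerationSimulation(trial)  # returns [ObjArray, ErrorArray]
        rev = revenueAndLoss(x[0])
        entry = (rev, next(counter), trial, x[0], x[1])
        if len(best) < numberOfResultsToKeepPerRound:
            heapq.heappush(best, entry)
        elif rev > best[0][0]:
            heapq.heapreplace(best, entry)  # replace the current worst kept result

    # best revenue first, like results.sort(reverse=True) but only over the kept entries
    trial_cases = [e[2] for e in sorted(best, reverse=True)]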

1 Answer


Here is a quick and dirty insertion sort you could use. Rather than a function, you could inline this code to avoid function call overhead.

    def top_insert(my_list, item, keep_only=10):
        # my_list is assumed to already be sorted by item[0] (revenue), best first
        rev = item[0]
        # entries that are still better than the new item
        newlist = [i for i in my_list if i[0] > rev]
        if len(newlist) >= keep_only:
            # the new item doesn't make the cut
            return newlist[:keep_only]
        elif len(newlist) == (keep_only - 1):
            # the new item is the last one worth keeping
            return newlist + [item]
        else:
            # keep the better entries, then the new item, then pad with the
            # remaining old entries until we have keep_only in total
            return newlist + [item] + my_list[len(newlist):keep_only - 1]
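
For example, inside the inner loop in the question you could replace the results.append(...) call (and drop the later sort and slice) with something like this, assuming results starts out as an empty list so it stays sorted best-first:

    results = top_insert(results, (rev, trial, x[0], x[1]), keep_only=numberOfResultsToKeepPerRound)

That way results never holds more than keep_only entries, so the simulation output for discarded trials can be garbage collected straight away.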

2 Comments

Based on the edit, this is probably not the solution after all. Sorting 50 things wouldn't be a huge performance hit.
I think the issue is that each of those items that it's sorting contains a ton of info / uses a ton of memory. See my other question here: stackoverflow.com/questions/42517635/… Once I did a sorted insert, it kept the number of items I was storing way down
