I have been trying to couple my Python script with the CMG reservoir simulation software used in the oil and gas industry. The software appends to its output file at each timestep (e.g. 0.1 days, 0.2 days, etc.). I am including a function that reads the output data file, searches for and extracts certain numbers (carbon dioxide mole percentage), calculates the incremental value for each timestep, and returns those values. They will later be passed back to the software so that it performs certain commands based on the CO2 mole percentage at every timestep.

The issue I am experiencing is that the function reads the output file from beginning to end at each timestep. This is time-consuming because the output file grows at each timestep and can contain millions of lines of text. Is there a way to improve this code so that, once it extracts the correct data from a line, it starts searching from that line number at the next timestep instead of from the beginning of the output data file? Thanks so much for any advice. Here is the code of the function:

import os


def co2fromOUT(tStart):
    t_find = "TIME: " + str(tStart)                # from column 1 to 12 in CMG input data file
    str2find_1 = "Compositions (mole fractions)"   # from column 47 to 84 in CMG input data file
    str2find_2 = "CO2"

    # 2 locations - This file & python_GEM one
    path_to_file = 'C:/Users/name/Desktop/test 1/run2/2.OUT'
    file_name = "2" + "_" + "molCO2" + ".data"   # file to save moles of CO2

    init = str(0) + "\t" + str(0) + "\t" + str(0)

    # Reads the whole output file on every call (the problem described above)
    with open(path_to_file, 'r') as f:
        lines = f.readlines()

    # Create the CO2 file with an initial row if it is missing or empty
    with open(file_name, "a+") as co2store:
        if os.stat(file_name).st_size == 0:
            co2store.write(init)

    curr_molCO2 = 0
    incr_molCO2 = 0

    for i in range(0, len(lines)):
        if lines[i].find(t_find) == -1:   # t_find is not found in current line
            continue

        # Case for solubility: the header is expected 12 lines above the TIME line
        if lines[i - 12].find(str2find_1) == -1:
            print("NOT FOUND 12 lines behind current time - Compositions (mole fractions)")
            continue

        # The CO2 row is expected 7 lines above the TIME line
        if lines[i - 7].find(str2find_2) == -1:
            print("NOT FOUND 7 lines behind current time - CO2")
            continue

        # Walk to the last line of the CO2 file to get the previous value
        with open(file_name, 'r') as co2stores:
            for co2line in co2stores:
                pass
        prev_molCO2 = co2line.split("\t")

        curr_molCO2 = float(lines[i - 7][42:55])   # CO2 row is 5 lines below str2find_1
        incr_molCO2 = curr_molCO2 - float(prev_molCO2[1])

        molCO2store = str(tStart) + "\t" + str(curr_molCO2) + "\t" + str(incr_molCO2)
        with open(file_name, "a+") as co2store:
            co2store.write("\n" + molCO2store)

    return incr_molCO2, curr_molCO2

Part of the output file looks something like this:

 Stream          Cum Inj        Cum Prod
  -----          -------        --------
  Oil         0.00000E+00    0.00000E+00      bbl
  Gas         9.37050E+07    6.24700E+07     cuft
 Wet Gas      0.00000E+00    6.26270E+07     cuft
  Water       9.29089E+01    2.22459E+02      bbl

                 Inj Rate      Prod Rate
  Phase           moles/d        moles/d
  -----       -----------    -----------
  Oil         0.00000E+00    0.00000E+00
  Gas         1.80444E+05    1.19860E+05
 Wet Gas      0.00000E+00    1.19860E+05
  Water       0.00000E+00    8.61632E+03

 Compositions (mole fractions) of Total Field Produced and Injected Streams:

                        Oil                           Gas                          Wet Gas
 Component    Produced       Injected       Produced       Injected       Produced       Injected
 ---------    --------       --------       --------       --------       --------       --------
      CO2    0.00000E+00   0.00000E+00     1.43080E-05   1.00000E+00     1.43080E-05   0.00000E+00
       C1    0.00000E+00   0.00000E+00     9.99986E-01   0.00000E+00     9.99986E-01   0.00000E+00
       C2    0.00000E+00   0.00000E+00     0.00000E+00   0.00000E+00     0.00000E+00   0.00000E+00
       C3    0.00000E+00   0.00000E+00     0.00000E+00   0.00000E+00     0.00000E+00   0.00000E+00
    CO2_T    0.00000E+00   0.00000E+00     0.00000E+00   0.00000E+00     0.00000E+00   0.00000E+00
1
 ***********************************************************************************************************************************
 TIME: 624.7   days                              G E M   S E C T O R   S U M M A R Y                              DATE: 1981:09:16 

The rest of the output file contains lines of text with miscellaneous data that is not important in my case.

3
  • May not be a great idea to use f.readlines() if the file contains "...millions of lines of text" Commented Jun 17, 2024 at 7:36
  • A better solution altogether is to make the software create a new file and then process the old one from start to finish. Commented Jun 17, 2024 at 7:39
  • So... what you describe as the output file is actually the input file to your code but you don't explain what the output of your code should look like. Counting back from a line that starts with "TIME:" seems like a really bad idea. Why not look for lines that have 7 whitespace delimited tokens the first of which is "Component" and move on from there? Commented Jun 17, 2024 at 7:43
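
The two suggestions in these comments (avoid f.readlines(), and match the table row by its tokens rather than by a fixed line offset) could be combined roughly as below. This is only a sketch under assumptions: the function name iter_co2_by_time and the context parameter are made up for illustration, and it assumes every relevant TIME: line is preceded, within the last 12 lines, by a 7-token row whose first token is CO2, as in the excerpt from the question.

```python
from collections import deque


def iter_co2_by_time(lines, context=12):
    """Yield (time_in_days, co2_value) for each TIME: line in an iterable of lines.

    Only the last `context` lines are kept in memory, so a file object can be
    passed in directly instead of a list built with readlines().
    """
    recent = deque(maxlen=context)
    for line in lines:
        if line.lstrip().startswith("TIME:"):
            time_days = float(line.split()[1])
            # Search backwards through the most recent lines for the CO2 row
            for prev in reversed(recent):
                tokens = prev.split()
                if len(tokens) == 7 and tokens[0] == "CO2":
                    # tokens[3] is the "Gas Produced" column, the same field
                    # that the question slices out with [42:55]
                    yield time_days, float(tokens[3])
                    break
        recent.append(line)


# Small demonstration using rows copied from the excerpt in the question
sample = [
    " Component    Produced       Injected       Produced       Injected       Produced       Injected\n",
    "      CO2    0.00000E+00   0.00000E+00     1.43080E-05   1.00000E+00     1.43080E-05   0.00000E+00\n",
    "    CO2_T    0.00000E+00   0.00000E+00     0.00000E+00   0.00000E+00     0.00000E+00   0.00000E+00\n",
    " TIME: 624.7   days        G E M   S E C T O R   S U M M A R Y\n",
]
for time_days, co2 in iter_co2_by_time(sample):
    print(time_days, co2)
```

With a real file, the same generator can be fed the open file object (for example `with open(path_to_file) as f: ...`), so lines are read one at a time. If other sections of the .OUT file also contain TIME: lines without a nearby CO2 row, those are simply skipped.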

2 Answers

0

Maybe something like this:

def co2fromOUT(tStart, start_line=0):
    # stuff
    for i in range(start_line, len(lines)):
        # more stuff
    return incr_molCO2, curr_molCO2, i

At the end of the function, return i and pass it back as start_line on the next call.
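
Note that a line index alone still requires loading every line before the loop can skip ahead. A variation on the same idea is to remember a byte offset instead and seek straight to it, so only the newly appended text is read. The following is a minimal sketch, not a tested CMG integration: read_new_lines is a hypothetical helper name, and it assumes the .OUT file is only ever appended to and is plain ASCII.

```python
def read_new_lines(path, offset=0):
    """Return (new_lines, new_offset) for text appended to `path` since `offset`.

    `offset` is the byte position returned by the previous call (0 the first time).
    The file is opened in binary mode so the position is an exact byte count.
    """
    new_lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                # Incomplete last line (possibly still being written):
                # leave it unread so the next call picks it up whole.
                break
            offset += len(raw)
            new_lines.append(raw.decode("ascii", errors="replace"))
    return new_lines, offset
```

The caller would keep the returned offset between timesteps, e.g. `lines, offset = read_new_lines(path_to_file, offset)`, and run the existing search on `lines` only. One caveat: because the search looks back several lines from the TIME: line, a block that straddles two calls would be missed unless the last dozen lines of the previous chunk are kept and prepended to the new one.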

0

I believe it would be more viable to use Python with the CMG outboard, which allows you to extract specific data.

1 Comment

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
