I have a document.gca file containing information that I need to extract. Part of the text repeats the following structure:

#Sta/Elev= xx
(here goes pair numbers)
#Mann

This block repeats several times. My goal is to capture the number pairs in that interval, for every occurrence in the text. How can I extract them? Say I have this:

Sta/Elev= 259 
   0 2186.31      .3 2186.14      .9 2185.83     1.4 2185.56     2.5 2185.23
   3 2185.04     3.6 2184.83     4.7 2184.61     5.6  2184.4     6.4 2184.17
 6.9 2183.95     7.5 2183.69     7.6 2183.59       8 2183.35     8.6 2182.92
10.2 2181.47    10.8 2181.03    11.3 2180.63    11.9 2180.27    12.4 2179.97
  13 2179.72    13.6 2179.47    14.1  2179.3    14.3 2179.21    14.7 2179.11
15.7  2178.9    17.4 2178.74    17.9 2178.65    20.1 2178.17    20.4 2178.13
20.4 2178.12    21.5 2177.94    22.6 2177.81    22.6  2177.8    22.9 2177.79
24.1 2177.78    24.4 2177.75    24.6 2177.72    24.8 2177.68    25.2 2177.54
    Mann= 3 , 0 , 0 
           0      .2       0    26.9      .2       0    46.1      .2       0
    Bank Sta=26.9,46.1
    XS Rating Curve= 0 ,0
    XS HTab Starting El and Incr=2176.01,0.3, 56 
    XS HTab Horizontal Distribution= 0 , 0 , 0 
    Exp/Cntr(USF)=0,0
    Exp/Cntr=0.3,0.1

    Type RM Length L Ch R = 1 ,2655    ,11.2,11.1,10.5
    XS GIS Cut Line=4
    858341.2470677761196439.12427935858354.9998313071196457.53292637
    858369.2753539641196470.40256485858387.8228168661196497.81690065
    Node Last Edited Time=Aug/05/2019 11:42:02
    Sta/Elev= 245 
     0 2191.01      .8 2190.54     2.5  2189.4       5 2187.76     7.2  2186.4
     8.2 2185.73     9.5 2184.74    10.1 2184.22    10.3 2184.04    10.8 2183.55
    12.8 2180.84    13.1 2180.55    13.3 2180.29    13.9 2179.56    14.2 2179.25
    14.5 2179.03    15.8 2178.18    16.4 2177.81    16.7 2177.65      17 2177.54
    17.1 2177.51    17.2 2177.48    17.5 2177.43    17.6  2177.4    17.8 2177.39
    18.3 2177.37    18.8 2177.37    19.7 2177.44      20 2177.45    20.6 2177.45
    20.7 2177.45    20.8 2177.44      21 2177.42    21.3 2177.41    21.4  2177.4
    21.7 2177.32      22 2177.26    22.1 2177.21    22.2 2177.13    22.5 2176.94
    22.6 2176.79    22.9 2176.54    23.2 2176.19    23.5 2175.88    23.9 2175.68
    24.4 2175.55    24.6 2175.54    24.8 2175.53    24.9 2175.53    25.1 2175.54
    25.7 2175.63      26 2175.71    26.3 2175.78    26.4  2175.8    26.4 2175.82
#Mann= 3 , 0 , 0 
       0      .2       0    22.9      .2       0      43      .2       0
Bank Sta=22.9,43
XS Rating Curve= 0 ,0
XS HTab Starting El and Incr=2175.68,0.3, 51 
XS HTab Horizontal Distribution= 0 , 0 , 0 
Exp/Cntr(USF)=0,0
Exp/Cntr=0.3,0.1

I want to select the numbers between Sta/Elev and Mann and save them as a pair of vectors for each Sta/Elev block. Right now I have this:

import re

with open('a.g01','r') as file:
    file_contents = file.read()
    #print(file_contents)

try:
    found = re.search('#Sta/Elev(.+?)#Mann',file_contents).group(1)
except AttributeError:
    found = '' # apply your error handling

print(found)

found is empty, and I want to capture all the numbers in the interval between '#Sta/Elev' and '#Mann'.

  • Are your two boundaries always on the same line? Commented Sep 12, 2019 at 13:49
  • Not exactly. It changes depending on the geometry and other specific parameters. This is a HEC-RAS result, and I must extract the values that are between Sta/Elev and Mann. This pattern repeats several times because there are many elevations (I think that is the correct way to say it), so I need to extract the numbers in that interval. Commented Sep 12, 2019 at 13:52
  • For example, I have this:\n Sta/Elev=120 \n 0 2191.01 .8 2190.54 2.5 2189.4 5 2187.76 7.2 2186.4 \n #Mann \n (other text, then again) \n Sta/Elev=121 \n 8.2 2185.73 9.5 2184.74 10.1 2184.22 10.3 2184.04 10.8 2183.55 \n #Mann \n I need to extract those numbers. Commented Sep 12, 2019 at 13:53

1 Answer

The problem is in your regex. Try switching

found = re.search('#Sta/Elev(.+?)#Mann',file_contents).group(1)

to

found = re.search('Sta/Elev(.*)Mann',file_contents).group(1)

output:

>>> import re
>>> file_contents = 'Sta/ElevthisisatestMann'
>>> found = re.search('Sta/Elev(.*)Mann',file_contents).group(1)
>>> print(found)
thisisatest

Edit:

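For multiline matching, add the re.DOTALL flag so that . also matches newlines, and keep the quantifier non-greedy (.*?) so the match stops at the first Mann instead of the last one in the file:

found = re.search('Sta/Elev=(.*?)Mann',file_contents, re.DOTALL).group(1)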
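
As a quick check of what the flag changes, the sketch below compares a search without and with re.DOTALL. The sample string is hypothetical and only imitates the layout in the question. A non-greedy .*? is used so the match ends at the first Mann.

```python
import re

# Hypothetical sample: the header and the closing marker sit on different lines.
text = "Sta/Elev= 2\n0 2186.31 .3 2186.14\nMann= 3 , 0 , 0\n"

# Without flags, '.' does not match a newline, so nothing is found.
single_line = re.search(r"Sta/Elev(.*?)Mann", text)
print(single_line)

# With re.DOTALL, '.' also matches newlines and the block is captured.
multi_line = re.search(r"Sta/Elev(.*?)Mann", text, re.DOTALL)
print(repr(multi_line.group(1)))
```

This is why the original search raised AttributeError and left found empty: re.search returned None because the two markers are never on the same line.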

It was not clear to me what the separating strings are, since they differ between your examples, but you can adjust them in the regex accordingly.
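
To get from the matched text to the pair of vectors asked for in the question, one possible sketch is to loop over every block with re.finditer and split the captured text on whitespace. The sample string, the function name extract_sections, and the optional # before Mann are assumptions made for illustration. It also assumes the numbers are always separated by whitespace, which holds for the excerpt in the question but may not hold for every file.

```python
import re

# Hypothetical sample imitating the question's layout; a real file would be
# loaded with open(path).read() instead.
sample = """\
Sta/Elev= 3
   0 2186.31      .3 2186.14      .9 2185.83
#Mann= 3 , 0 , 0
       0      .2       0    26.9      .2       0
Bank Sta=26.9,46.1
Sta/Elev= 2
   0 2191.01      .8 2190.54
Mann= 3 , 0 , 0
"""

# Skip the rest of the header line, then capture lazily up to the first
# 'Mann' (with or without a leading '#'). DOTALL lets '.' cross newlines.
BLOCK = re.compile(r"Sta/Elev=[^\n]*\n(.*?)#?Mann", re.DOTALL)


def extract_sections(text):
    """Return one (stations, elevations) tuple per Sta/Elev block."""
    sections = []
    for match in BLOCK.finditer(text):
        # Whitespace-separated tokens alternate: station, elevation, ...
        values = [float(token) for token in match.group(1).split()]
        stations = values[0::2]
        elevations = values[1::2]
        sections.append((stations, elevations))
    return sections


sections = extract_sections(sample)
for stations, elevations in sections:
    print(stations, elevations)
```

Using finditer rather than search is what makes the extraction repeat for every block. The slices [0::2] and [1::2] separate the alternating station and elevation values into two lists.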

5 Comments

I appreciate your help, but it's not working. As I said in my earlier comment, I have this text: \n Sta/Elev=120 \n 0 2191.01 .8 2190.54 2.5 2189.4 5 2187.76 7.2 2186.4 \n #Mann \n (other text, then again) \n Sta/Elev=121 \n 8.2 2185.73 9.5 2184.74 10.1 2184.22 10.3 2184.04 10.8 2183.55 \n #Mann \n I need to extract those numbers.
After the file_contents argument, add the re.DOTALL flag. Also add the = after Sta/Elev; I thought it wasn't part of the matching text.
Excuse me, could you write out the code again? I don't yet understand what you did. I appreciate it.
Edited the answer.
Thanks for your answer, but I still don't have the specific values. I just want the values between #Sta/Elev and #Mann. Could you please check my edited question to see whether it makes clearer what I need? Thank you.
