Say I have a text file with the data/string:

Dataset #1: X/Y= 5, Z=7 has been calculated
Dataset #2: X/Y= 6, Z=8 has been calculated
Dataset #10: X/Y =7, Z=9 has been calculated 

I want the output to be on a csv file as:

X/Y, X/Y, X/Y

Which should display:

5, 6, 7

Here is my current approach. I am using string.find, but it feels like a difficult way to solve this problem:

data = open('TestData.txt').read()
#index of string
counter = 1

if (data.find('X/Y=')==1):      
#extracts segment out of string
    line = data[r+6:r+14]
    r = data.find('X/Y=')
    counter += 1 
    print line
else: 
    r = data.find('X/Y')
    line = data[r+6:r+14]
    for x in range(0,counter):
    print line


print counter

Error: For some reason, I'm only getting the value 5. When I set up a loop, I get infinite 5's.

2 Answers

If you want the numbers and your text file is formatted like the first two lines, i.e. X/Y= 6 and not X/Y =7:

import re
result=[]
with open("TestData.txt") as f:
    for line in f:
        # Match one or more digits that are immediately preceded by "Y=" and one whitespace character "\s".
        s = re.search(r'(?<=Y=\s)\d+',line)
        # re.search returns None when there is no match, so only add a successful match to the list.
        if s:
            result.append(s.group())
print result
['5', '6', '7']

To match the pattern in your comment, you should escape the period as \. or you will also match strings like 1.2+3, because an unescaped "." has a special meaning in re: it matches any character.

So re.search(r'(?<=Counting Numbers =\s)\d\.\d\.\d',s).group() will return only 1.2.3
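The difference between the escaped and unescaped period can be checked with a small sketch. The two sample strings are assumptions made up for illustration; only the "Counting Numbers =" prefix comes from the pattern above.

```python
import re

# Hypothetical sample strings, assumed for illustration only.
samples = ["Counting Numbers = 1.2.3", "Counting Numbers = 1.2+3"]

# Unescaped "." matches any character; escaped "\." matches a literal period only.
unescaped = re.compile(r'(?<=Counting Numbers =\s)\d.\d.\d')
escaped = re.compile(r'(?<=Counting Numbers =\s)\d\.\d\.\d')

def matches(pattern, strings):
    """Return the matched text for each string, or None when there is no match."""
    return [m.group() if m else None for m in (pattern.search(s) for s in strings)]

unescaped_hits = matches(unescaped, samples)
escaped_hits = matches(escaped, samples)
print(unescaped_hits)  # ['1.2.3', '1.2+3']
print(escaped_hits)    # ['1.2.3', None]
```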

If it makes it more explicit, you can use s=re.search(r'(?<=X/Y=\s)\d+',line) using the full X/Y=\s pattern.

Running it on both the original line from your comment and the updated line would return:

['5', '6', '7', '5', '5']
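To see why the fuller X/Y=\s pattern can matter, here is a minimal sketch using the line from the comments in which a separate Y= 2 (with a space) comes before X/Y= 5. It only illustrates the two lookbehinds on that one string and is not a claim about any other file layout.

```python
import re

# Line quoted in the comment thread below.
line = "Dataset #1: Y= 2, X/Y= 5 has been calculated"

# The shorter lookbehind is satisfied by the first "Y= " it finds.
short_match = re.search(r'(?<=Y=\s)\d+', line).group()

# The fuller lookbehind requires the whole "X/Y= " prefix.
full_match = re.search(r'(?<=X/Y=\s)\d+', line).group()

print(short_match)  # 2
print(full_match)   # 5
```

If the earlier entry is written without a space (Y=2), the shorter pattern skips it, because the lookbehind requires a whitespace character after the "=".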

The (?<=Y=\s) is called a positive lookbehind assertion.

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position

There are lots of nice examples in the re documentation. The text matched inside the lookbehind parentheses is not part of the returned match.
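One limitation worth knowing: Python's re module only accepts fixed-width lookbehinds, so a lookbehind cannot absorb a variable amount of whitespace such as the X/Y =7 spelling in the question. A possible workaround, sketched here as an alternative and not as part of the answer above, is a capturing group:

```python
import re

lines = [
    "Dataset #1: X/Y= 5, Z=7 has been calculated",
    "Dataset #10: X/Y =7, Z=9 has been calculated",
]

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<=X/Y\s*=\s*)\d+')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# A capturing group tolerates optional whitespace on either side of "=".
flexible = re.compile(r'X/Y\s*=\s*(\d+)')
values = [flexible.search(text).group(1) for text in lines]

print(lookbehind_ok)  # False
print(values)         # ['5', '7']
```

With a capturing group the digits are read from group(1), not group(), since group() would include the X/Y= prefix.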


10 Comments

So how would this work for a more complex text file? Say that I now have an extra Y in my text file: Dataset #1: Y=2, X/Y= 5, Z=7 has been calculated
I added the output, what comes before is irrelevant, all we are looking for is Y= 5, Yes does not match our pattern
Sorry, I just updated my comment, I get the arguments now. Thanks a lot for clearing that up with me. Would you be able to observe the updated comment? Say I have Dataset #1: Y= 2, X/Y= 5 has been calculated. But I want to only have the value of X/Y= 5
@user3685687, again it will work as the pattern has to have Y= 5.
Is there a way to make the Pattern X/Y= ?

Since it appears that the entities are all on a single line, I would recommend using readline in a loop to read the file line-by-line and then using a regex to parse out the components you're looking for from that line.

Edit re: OP's comment:

One regex pattern that could be used to capture the number given the specified format in this case would be: X/Y\s*=\s*(.+),
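Putting the two suggestions together, a rough sketch of the line-by-line approach that ends in a CSV row might look like the following. It is written for Python 3, uses the sample lines from the question in place of the file, and writes to an in-memory buffer; the helper name extract_values is made up for this example.

```python
import csv
import io
import re

# Pattern from the answer above; group(1) is the text between "=" and the comma.
PATTERN = re.compile(r'X/Y\s*=\s*(.+),')

def extract_values(lines):
    """Collect the captured X/Y value from every line that matches."""
    values = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            values.append(match.group(1).strip())
    return values

# Sample lines from the question, standing in for iterating over the file.
sample = [
    "Dataset #1: X/Y= 5, Z=7 has been calculated\n",
    "Dataset #2: X/Y= 6, Z=8 has been calculated\n",
    "Dataset #10: X/Y =7, Z=9 has been calculated \n",
]
values = extract_values(sample)

# Write all values as a single CSV row.
buffer = io.StringIO()
csv.writer(buffer).writerow(values)
csv_text = buffer.getvalue()

print(values)            # ['5', '6', '7']
print(csv_text.strip())  # 5,6,7
```

For a real run, the sample list could be replaced by the open file object for TestData.txt, and the buffer by a file opened with open("output.csv", "w", newline=""), where output.csv is a placeholder name. Note that (.+) is greedy, so on a line with more than one comma after the value it would capture everything up to the last comma; (\d+) is a stricter choice when the value is always an integer.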

2 Comments

When using regex to parse out the components, what exactly would you use for the regex arguments? r= re.compile("^X/Y=|,$")?
Got it! That helped clear up the question I had in the bottom solution. Thank you!
